Visualizing Public Data: All about Traditional Markets in Seoul

SumMarket
8 min read · Apr 29, 2021

0. Motivation

Do you know what convenience facilities a traditional market offers? You probably don’t know what a market’s main products are or whether there is a parking lot nearby. It is hard to find this information unless you are physically there. Unlike large retailers such as Costco, IKEA, or Wal-Mart, traditional markets don’t publish real-time price information on the internet.

To address this inconvenience, we visualize information that shows what the main products of each traditional market are. We used two kinds of public data: location (latitude and longitude) data and major-product data for traditional markets.

1. Collecting Datasets

First, collect the data needed for the visualization.

The latest location data for traditional markets can be found on the Seoul Open Data Platform (as of 2021.02.14, http://data.seoul.go.kr/dataList/OA-1176/S/1/datasetView.do;jsessionid=9C06B2D4D0F1EFDFFAF54999673D9EF8.new_portal-svr-21).

Data on the general status of traditional markets, their convenience facilities, merchant organizations, and main products is available on the Public Data Portal (as of 2020.02.27, https://www.data.go.kr/data/15052837/fileData.do).

import pandas as pd
# The source file is CP949-encoded (a common Korean Windows encoding)
df1 = pd.read_csv("data/서울시 전통시장 현황.csv", encoding='CP949')
df1.shape

2. Preprocessing Datasets — The Latitude data

Next, organize the data: extract the columns you will use and shape them so they can be combined with the convenience-facility data.

2.1 Load Datasets

Load the data with the CP949 encoding into a variable called df1 and check the number of rows and columns. You can see that there are 330 rows and 6 columns.
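As a minimal sketch of what the encoding argument does, the snippet below reads a tiny in-memory CSV that was encoded as CP949. The two-row table and the names raw and toy are made up for illustration; they are not the real dataset.

```python
import io

import pandas as pd

# Made-up two-row CSV, encoded the same way as the source file (CP949).
raw = "자치구명,전통시장명\n종로구,광장시장\n중구,남대문시장\n".encode("cp949")

# Passing the matching encoding lets pandas decode the Korean text correctly.
toy = pd.read_csv(io.BytesIO(raw), encoding="cp949")

# shape is (number of rows, number of columns).
print(toy.shape)
print(toy.head())
```

If the file were read with a different encoding, the Korean column names would either fail to decode or come out garbled, which is why the encoding is stated explicitly.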

2.2 Remove unused columns

First, we checked which columns the data contains and how many missing values each column has. There were no missing values.
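The check described above could look roughly like this. It is only a sketch on a made-up three-row table (toy), with one missing value added on purpose so the count is visible; the real data had none.

```python
import pandas as pd

# Made-up stand-in for the location table; values are for illustration only.
toy = pd.DataFrame({
    "전통시장명": ["A시장", "B시장", "C상가"],
    "경도": [126.98, None, 127.03],
    "위도": [37.57, 37.55, 37.50],
})

# Which columns does the table have?
print(list(toy.columns))

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole table.
total_missing = int(missing_per_column.sum())
print(total_missing)
```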

There are 6 columns in total, but we decided to keep only the columns we actually use.

columns1 = ['자치구명', '전통시장명', '주소명', '경도', '위도']
df1 = df1[columns1].copy()
df1.shape

We don’t use the 자치구 code, so we updated df1 to keep every column except that one, which reduced the number of columns to five.

The info() method also confirms that both the column count and the memory usage have decreased.

2.3 Organize Columns

df1.columns = ['구', '시장명', '주소명', '경도', '위도']
df1.head()

The column names are simplified to make the later merge easier: ‘자치구명’ becomes ‘구’ and ‘전통시장명’ becomes ‘시장명’.

*자치구명 means “autonomous district name”; 구 means “district”.

Checking the head, we could see that the ‘시장명’ column includes shopping areas that are not traditional markets.

Therefore, we decided to extract and use only the data with ‘시장’ in the name.

2.4. Import the Traditional Market Data Only

df_market = df1["시장명"].str.contains("시장").copy()
df_market.value_counts()

From df_market, we could see that 67 entries have names that do not contain ‘시장’.

df_mk = df1.loc[df1["시장명"].str.contains("시장")].copy()
df_mk.head(10)

We then stored only the rows whose names contain ‘시장’ in a variable called df_mk.

This is the head of df_mk, the final cleaned location data for traditional markets.
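To double-check which rows a name filter like this drops, one option is to invert the boolean mask and look at the excluded names. This is a sketch on made-up names (toy); the variable names is_market, excluded, and kept are ours, not from the original notebook.

```python
import pandas as pd

# Made-up names: three contain '시장', two do not.
toy = pd.DataFrame({
    "시장명": ["광장시장", "남대문시장", "테크노상가", "통인시장", "지하쇼핑센터"]
})

# True where the name contains '시장'.
is_market = toy["시장명"].str.contains("시장")

# How many rows match and how many do not.
print(is_market.value_counts())

# ~ inverts the mask, so this lists the rows that will be dropped.
excluded = toy.loc[~is_market, "시장명"].tolist()
print(excluded)

# Keep only the matching rows as an independent copy.
kept = toy.loc[is_market].copy()
```

Looking at the excluded names before dropping them is a cheap way to catch a real market whose name happens not to contain ‘시장’.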

3. Preprocessing Datasets — The Convenience Facilities data

3.1 Load Datasets

df = pd.read_csv('data/전통시장현황.csv', encoding='CP949')

Load the ‘전통시장현황’ (traditional market status) data from the Public Data Portal to find out which convenience facilities traditional markets in Seoul have.

3.2 Remove unused columns

df.columns

df.info() shows that there are 76 columns in total. To provide meaningful information to users, we want to select only columns that do not have many missing values. To do so, let’s find out which columns have a lot of missing values.

df.isnull().sum().plot.barh(figsize=(10,25))

This gives a visual overview of the missing values per column, as in the graph below. We set figsize=(10,25) so that every value is visible.

From these, we pick only the columns we will use and assign them back to df, using copy().

columns = ['시장명', '시군구', '시도', '보유갯수 - 16시장전용 고객주차장',
           '시장/상점가의 주력 상품 여부(1=있음, 2=없음)', '보유현황 - 10쇼핑카트(1=있음, 2=없음)',
           '시장/상점가의 주력 상품의 상품명']
df = df[columns].copy()

Keeping only the needed columns significantly reduces memory usage.

Check it with df.info(): memory usage was 861.1+ KB before this step and dropped to 79.4+ KB after it.
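The same before-and-after comparison can also be done in code with memory_usage(). The sketch below uses a made-up 76-column table of integers (wide), so the numbers it prints will not match the 861.1 KB and 79.4 KB reported above.

```python
import pandas as pd

# Made-up wide table with 76 columns, like the source CSV, filled with integers.
wide = pd.DataFrame({f"col{i}": range(1000) for i in range(76)})

# Total bytes used by all columns (deep=True also counts string contents).
before = wide.memory_usage(deep=True).sum()

# Keep only the columns we plan to use; copy() makes an independent frame.
narrow = wide[["col0", "col1", "col2"]].copy()
after = narrow.memory_usage(deep=True).sum()

print(f"before: {before / 1024:.1f} KB, after: {after / 1024:.1f} KB")
```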

3.3 Import the Traditional Market Data in Seoul Only

The data currently covers markets nationwide. We only need the traditional markets in Seoul, so we keep only the rows whose ‘시도’ (city/province) value is Seoul.

The result is stored in a variable called df_seoul.

df_seoul = df[df["시도"] == "서울"].copy()

Also, the market names include entries that are not traditional markets. We only need traditional markets, so we extract only the rows with ‘시장’ in the name.

df_seoul = df_seoul[df_seoul["시장명"].str.contains("시장")].copy()

In this way, we extracted only the Seoul traditional market data from the nationwide traditional market status data.
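The two filters in this section can also be combined into a single step with the & operator. The following is a sketch on a made-up four-row table (toy), not the real nationwide data.

```python
import pandas as pd

# Made-up rows: one market outside Seoul and one Seoul entry that is not a market.
toy = pd.DataFrame({
    "시장명": ["광장시장", "부평시장", "명동상점가", "통인시장"],
    "시도": ["서울", "인천", "서울", "서울"],
})

# Condition 1: the row belongs to Seoul.
in_seoul = toy["시도"] == "서울"
# Condition 2: the name contains '시장'.
is_market = toy["시장명"].str.contains("시장")

# & keeps only the rows where both conditions are True.
seoul_markets = toy[in_seoul & is_market].copy()
print(seoul_markets)
```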

4. Combining Datasets

4.1 Merge df_mk and df_seoul by market name (‘시장명’)

Let’s bring back the df_mk that was preprocessed earlier.

We keep the markets that appear in both df_seoul and df_mk and store the combined data in df. For example, if Gangnam Market is in both datasets, the merged row shows its location and its convenience information together.

df = pd.merge(df_mk, df_seoul, on="시장명").copy()
df.head()

Merge the two dataframes by the market name.
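To make the behavior of the merge concrete, here is a sketch on two made-up tables (locations and facilities, with invented coordinates and products). It also shows one way to list the markets that appear in only one table, using an outer merge with indicator=True; that audit step is our addition, not part of the original notebook.

```python
import pandas as pd

# Made-up location table (coordinates are illustrative, not real data).
locations = pd.DataFrame({
    "시장명": ["강남시장", "광장시장", "통인시장"],
    "위도": [37.51, 37.57, 37.58],
    "경도": [127.02, 127.00, 126.97],
})

# Made-up facility table; '주력상품' is a shortened, hypothetical column name.
facilities = pd.DataFrame({
    "시장명": ["강남시장", "광장시장", "망원시장"],
    "주력상품": ["과일", "빈대떡", "닭강정"],
})

# Inner merge (the default): only markets present in both tables survive.
merged = pd.merge(locations, facilities, on="시장명", how="inner")
print(merged)

# Outer merge with indicator=True adds a '_merge' column that says where
# each row came from: 'both', 'left_only', or 'right_only'.
audit = pd.merge(locations, facilities, on="시장명", how="outer", indicator=True)
unmatched = audit.loc[audit["_merge"] != "both", ["시장명", "_merge"]]
print(unmatched)
```

Because the match is on the exact name string, small spelling differences between the two sources would silently drop a market, so a quick look at the unmatched rows can be worthwhile.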

We’ll select the market name column to check that the merge worked.

df["시장명"]

If you check this, you can see that the market names were merged correctly. Now that we have the merged df, let’s put each market’s location on the map as a marker and show its convenience information when the cursor hovers over or clicks the marker.

5. Visualization

Finally, we used Folium, a Python visualization package, to visualize df.

5.1 Centering a map

long = df["경도"].mean()
lat = df["위도"].mean()

To set the center of the map, take the mean of the latitude and longitude columns of df.

5.2 Visualize with Marker

import folium

m = folium.Map([lat, long], zoom_start=12)

for i in df.index:
    sub_lat = df.loc[i, "위도"]
    sub_long = df.loc[i, "경도"]

    # Tooltip text: market name and address
    title = f'{df.loc[i, "시장명"]} ({df.loc[i, "주소명"]})'
    # Popup text: market name and main products
    info = f'{df.loc[i, "시장명"]} (주력 상품: {df.loc[i, "시장/상점가의 주력 상품의 상품명"]})'
    popup = folium.Popup(f'<i>{info}</i>', max_width=600, max_height=600)

    icon_color = "red"

    folium.Marker([sub_lat, sub_long],
                  icon=folium.Icon(color=icon_color),
                  tooltip=title,
                  popup=popup).add_to(m)

# Display the map after the for loop ends
m

To show every market in df, we iterate over all of its index values with a for loop.

To place each market on the map, we read its latitude and longitude, build title from the market name and address, build info from the market name and main products, and pass them to the Marker.

When we first ran the code, the Korean text in the pop-up was displayed vertically, so we added the max_width option to the pop-up so that the information appears on a single horizontal line.

The map shows an icon at each market’s coordinates. Hovering the cursor over an icon shows the market name and address in a tooltip, and clicking it opens a pop-up with the market name and main products.

We also tried the Stamen Toner tile theme and CircleMarker. Unlike Marker, CircleMarker lets you specify the radius of the marker.

5.3 Visualize with MarkerCluster

With plain Markers as above, too many icons appeared at once or overlapped, and the map could not convey how densely markets are concentrated in each area. Therefore, we chose MarkerCluster for the final version.

from folium.plugins import MarkerCluster

m = folium.Map([lat, long], zoom_start=12)
# Markers are added to this cluster layer instead of directly to the map
marker_cluster = MarkerCluster().add_to(m)

for i in df.index:
    sub_lat = df.loc[i, "위도"]
    sub_long = df.loc[i, "경도"]
    title = f'{df.loc[i, "시장명"]} ({df.loc[i, "주소명"]})'
    info = f'{df.loc[i, "시장명"]} (주력 상품: {df.loc[i, "시장/상점가의 주력 상품의 상품명"]}, 고객주차장 개수 : {df.loc[i, "보유갯수 - 16시장전용 고객주차장"]}, 쇼핑카트 유무(1=있음, 2=없음) : {df.loc[i, "보유현황 - 10쇼핑카트(1=있음, 2=없음)"]})'
    popup = folium.Popup(f'<i>{info}</i>', max_width=600, max_height=600)
    icon_color = "red"

    folium.Marker([sub_lat, sub_long],
                  icon=folium.Icon(color=icon_color),
                  popup=popup,
                  tooltip=title).add_to(marker_cluster)

# Display the map after the for loop ends
m

First, import MarkerCluster. The code is almost the same as for Marker; the difference is that you add each marker to marker_cluster instead of directly to the map.

The higher the market density in an area, the further the cluster color shifts, step by step, from green to yellow to orange.

When you zoom in on the map, the icons grouped in each cluster gradually become visible, and you can also click a cluster to zoom in on all of its icons at once.

The tooltip shows the market name and address, and the pop-up that opens when you click the marker shows the market’s main products, the number of customer parking lots, and whether shopping carts are available.

6. Conclusion

It was hard to find data that fit the purpose of this exercise. To get the data we wanted, it was essential to combine information from the public data portals and preprocess it to keep only what we needed.

This was also our first hands-on visualization of public data, and it was a meaningful experience to work directly with data that had existed only in CSV files. Next time, we would like to go beyond Seoul and cover traditional markets in popular tourist destinations such as Jeju Island (제주도) and Gangneung (강릉).
