Capstone Project - The Battle of Neighborhoods: Istanbul Edition

Volkan KUMBASAR
Dec 5, 2020

INTRODUCTION

Istanbul is Turkey’s most popular and most crowded city. Because it spans both Europe and Asia, it has a mixed culture that can also be sensed in its rich and varied cuisine. Across its 39 districts, one can find a wide range of food options, from Far Eastern sushi to Turkish kebab, and from Greek seafood to the Western burger.

In this project, we aim to help people find restaurants that suit their tastes or requests. We therefore try to answer the following questions: “Which district offers what type of restaurants?”, “Where to eat sushi or seafood?”, “Where can we find the most preferred meat and spice dishes?”, “Which districts offer the widest variety of cuisines?”, and “Which side of Istanbul represents the city better, the Asian or the European side?”

DATA

The following data sources will be used to meet the goal of this project:

  1. District List: The list of Istanbul’s districts can be obtained from Wikipedia’s “List of districts of Istanbul” page. It provides all 39 district names along with other information such as population, area, and density. For this project we will use only the district names.
  2. Location Data: Using the Geocoder API, we will obtain the location data of the districts. We will also assign each district to the Asian or European side according to its location.
  3. Restaurant List: Using the Foursquare API, we will fetch detailed information about the restaurants in Istanbul. We will limit the number of venues to 1000 and search within a radius of 5 km of each district center.

By merging these data sources into a single data set, we will obtain the distribution and density of the restaurant categories. After that, we will try to group the districts into clusters to see whether any patterns emerge.

METHODOLOGY

First, we pull and cleanse the district list from the Wikipedia page mentioned in the Data section. The table below shows a subset of the cleansed district data set.

Sample cleansed district list

Now, let’s add location data for these districts using the Geocoder API and, more importantly, classify each district as Asian or European according to the continent on which it is located.

District data set with location data
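The side assignment can be done in several ways. Here is a minimal sketch, assuming a hand-maintained lookup of Asian-side district names. The `ASIAN_SIDE` set and the `side_of` helper are hypothetical names, only a few districts are listed, and the coordinates are rough illustrative values rather than the notebook's data:

```python
# Hypothetical lookup: a few districts known to lie on the Asian side.
# A full version would list all 14 Asian-side districts.
ASIAN_SIDE = {"Kadıköy", "Üsküdar", "Kartal", "Pendik"}

def side_of(district):
    """Return 'Asian' or 'European' for a district name."""
    return "Asian" if district in ASIAN_SIDE else "European"

# Approximate coordinates, for illustration only.
districts = [
    {"district": "Kadıköy", "lat": 40.99, "lon": 29.03},
    {"district": "Üsküdar", "lat": 41.02, "lon": 29.02},
    {"district": "Beşiktaş", "lat": 41.04, "lon": 29.01},
    {"district": "Fatih", "lat": 41.02, "lon": 28.94},
]

# Add the side label to every district record.
for row in districts:
    row["side"] = side_of(row["district"])

for row in districts:
    print(row["district"], row["side"])
```

A name lookup is used instead of a longitude threshold because the Bosphorus does not follow a single meridian, so a simple cut-off could mislabel districts close to the strait.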

The next figure shows Istanbul’s districts on a map. Red spots are on the European side, whereas blue spots are on the Asian side. In total, 14 districts are on the Asian side and the other 25 are on the European side.

Istanbul map

Within a 5 km radius of each of the 39 district centers, we pull at most 1000 venues using Foursquare’s API and filter out all non-restaurant venues.
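The request for one district center might be assembled as follows. This is only a sketch: the article does not say which endpoint was called, so the legacy v2 `venues/explore` URL, the `build_explore_url` helper and the placeholder credentials are all assumptions. The service may also return fewer venues per request than the requested limit.

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare's legacy v2 "explore" search.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lon, client_id, client_secret,
                      radius_m=5000, limit=1000, version="20201205"):
    """Build the request URL for venues around one district center."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",  # latitude,longitude of the district center
        "radius": radius_m,    # search radius in meters (5 km)
        "limit": limit,        # requested maximum number of venues
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials and an approximate district center.
url = build_explore_url(41.04, 29.01, "YOUR_ID", "YOUR_SECRET")
print(url)
```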

According to the Foursquare API results, we get 3871 venues in total, of which 617 are listed in a restaurant category. We also found 34 unique restaurant categories among these venues.
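The filtering and counting step can be sketched as below. The rule "the category name contains the word restaurant" is an assumption about how non-restaurant venues were removed, and the venue records are toy data, not the project's results:

```python
# Toy venue records; the real data set has 3871 venues.
venues = [
    {"district": "Beşiktaş", "category": "Turkish Restaurant"},
    {"district": "Beşiktaş", "category": "Café"},
    {"district": "Kadıköy", "category": "Seafood Restaurant"},
    {"district": "Kadıköy", "category": "Restaurant"},
    {"district": "Fatih", "category": "Turkish Restaurant"},
    {"district": "Fatih", "category": "Park"},
]

def is_restaurant(category):
    """Assumed rule: keep categories whose name mentions 'restaurant'."""
    return "restaurant" in category.lower()

# Keep only restaurant venues, then collect the distinct category names.
restaurants = [v for v in venues if is_restaurant(v["category"])]
unique_categories = sorted({v["category"] for v in restaurants})

print(len(venues), "venues,", len(restaurants), "restaurants,",
      len(unique_categories), "unique restaurant categories")
```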

The next figure visualizes the top ten restaurant categories in Istanbul:

As you can see, if we ignore the generic “Restaurant” label, most of the restaurants are listed as “Turkish restaurant”, which is expected. The data also shows that Istanbul has a lot of seafood restaurants, which fits its location between the Sea of Marmara and the Black Sea.

Now, let’s take a closer look at both sides of Istanbul. The next figures show the top ten categories on the Asian and European sides, respectively.

Asian Side Top 10

Here we can observe that the European side has more seafood restaurants than the Asian side. In addition, the Mediterranean restaurant category appears in the European side’s top ten list. This might be because the European side is the city’s commercial center and attracts more tourists.

European side top 10
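A top-ten list per side like the ones above could be computed with a counter per side. This is a sketch on toy records, assuming each restaurant already carries a `side` field; `top_categories_by_side` is a hypothetical helper name:

```python
from collections import Counter, defaultdict

def top_categories_by_side(restaurants, n=10):
    """Count categories separately per side and keep the n most frequent."""
    counts = defaultdict(Counter)
    for r in restaurants:
        counts[r["side"]][r["category"]] += 1
    return {side: counter.most_common(n) for side, counter in counts.items()}

# Toy records, not the project's data.
sample = [
    {"side": "European", "category": "Turkish Restaurant"},
    {"side": "European", "category": "Turkish Restaurant"},
    {"side": "European", "category": "Seafood Restaurant"},
    {"side": "Asian", "category": "Turkish Restaurant"},
    {"side": "Asian", "category": "Turkish Restaurant"},
    {"side": "Asian", "category": "Kebab Restaurant"},
]

top = top_categories_by_side(sample)
for side, ranking in top.items():
    print(side, ranking)
```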

Now, to get more information, we apply k-means clustering to our district set. For this, we use all 34 restaurant categories and try to identify three clusters. Also, to simplify the cluster analysis, we list only the five most common restaurant categories for each district. The figure below shows a sample of the data we use to train the k-means algorithm.

Sample training set
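One way to build such a training set is to turn each district into a vector holding the share of each restaurant category, and then rank the categories per district. The sketch below assumes that representation (the article does not spell out the exact encoding) and uses toy data; `category_frequencies` and `most_common` are hypothetical helper names:

```python
from collections import Counter

def category_frequencies(restaurants):
    """Return (categories, {district: [share of each category]})."""
    categories = sorted({r["category"] for r in restaurants})
    per_district = {}
    for r in restaurants:
        per_district.setdefault(r["district"], Counter())[r["category"]] += 1
    vectors = {}
    for district, counts in per_district.items():
        total = sum(counts.values())
        vectors[district] = [counts[c] / total for c in categories]
    return categories, vectors

def most_common(categories, vector, n=5):
    """Names of the n categories with the largest non-zero share."""
    ranked = sorted(zip(categories, vector), key=lambda cv: -cv[1])
    return [c for c, share in ranked[:n] if share > 0]

# Toy records, not the project's data.
sample = [
    {"district": "A", "category": "Turkish Restaurant"},
    {"district": "A", "category": "Turkish Restaurant"},
    {"district": "A", "category": "Seafood Restaurant"},
    {"district": "B", "category": "Seafood Restaurant"},
]

categories, vectors = category_frequencies(sample)
for district, vector in vectors.items():
    print(district, most_common(categories, vector))
```

Using shares rather than raw counts keeps districts with many venues from dominating the distance computation in the clustering step.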

RESULTS

The table below shows a sample of the 3-means clustering result. It provides the cluster number (0, 1 or 2), the district, the side, and the five most common restaurant venues.

Clustered district set
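To illustrate how the cluster numbers come about, here is a dependency-free sketch of k-means (Lloyd's algorithm) with k = 3. It is not the notebook's code, which most likely uses a library implementation. The farthest-point seeding is a choice made here to keep the toy example deterministic, and the three-dimensional points stand in for the 34-dimensional category vectors:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Farthest-point seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k=3, max_iterations=100):
    """Return one cluster label (0..k-1) per point."""
    centroids = init_centroids(points, k)
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy vectors: shares of (Turkish, Seafood, other) restaurants per district.
points = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],
]
labels = kmeans(points, k=3)
print(labels)
```

The cluster numbers themselves carry no meaning; only the grouping does, which is why each cluster is interpreted afterwards by looking at its most common venues.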

The next map visualizes the three clusters. Each cluster is represented in a different colour.

Clusters

Let’s analyze these clusters in detail.

Cluster #0 — Red

The following table provides a sample of cluster #0. As can easily be seen, the most common venue is “Turkish restaurant”, so this is the “Turkish restaurant” cluster.

Cluster #0— Turkish food

Cluster #1— Purple

This cluster represents the districts where “Seafood” restaurants are the most common venue. We can also see on the map above that these districts are located near the sea, which makes sense.

Cluster #1— Seafood

Cluster #2— Green

Although “Restaurant” is the most common venue here, this cluster seems to be a generic one. It has categories such as “Vegan”, “Sushi” and “Kebab” as the second most common venue. So, we can say that it represents a mixed or “other” cluster.

Cluster #2 — Mixed or other

Additionally, if we repeat the 3-means analysis for the European side only, we get the cluster map below:

European Clusters

Here, the cluster pattern is similar to that of Istanbul overall. The red cluster represents the Turkish restaurants, whereas the green cluster represents the seafood restaurants. The purple cluster is the mixed one. The following tables show these clusters.

Cluster #0— European — Turkish food
Cluster #1— European — Mixed food
Cluster #2— European — Seafood food

Unfortunately, for the Asian side we could not find any of these cluster patterns. See the Jupyter notebook used for this analysis.

DISCUSSION

We analyzed the restaurant categories in clusters using the k-means algorithm. We obtained the data from Foursquare’s API and analyzed its output together with the coordinates of the districts. In further studies, we could also use other services (e.g. Google) and merge them into a single data source.

The analysis gave us an overview of the restaurants spread across Istanbul’s districts, their categories and, finally, the main restaurant clusters.

Since Istanbul is a megacity, the following limitations might have affected the accuracy of our analysis:

  • We took a maximum of 1000 venues within a radius of 5 km of each district center.
  • The venue data set is limited to what is available in Foursquare.
  • A significant number of restaurants in Foursquare are labeled as “Restaurant” instead of a specific restaurant category.

CONCLUSION

I believe we can say that we found some answers to the questions raised in the Introduction. We found that Istanbul’s districts can be divided into three categories: seafood, Turkish, and mixed/other restaurants. We also found that the European side of Istanbul shows the same clustering pattern as Istanbul as a whole. In addition, districts near the sea have the most seafood restaurants, as expected.

By the way, you can find the Jupyter notebook here.
