We are where we eat

We are shaped by what we eat. But we are also shaped by where we eat. Are chefs ready to trade high-quality, fresh ingredients for cheaper stale ones? Are there places and times where food conservation has a subjective meaning? Are large-scale franchises such as McDonald’s and Starbucks under the same level of scrutiny as normal restaurants despite the economic interests at play?

We will do a deep dive into Chicago’s food inspection scene and everything happening behind the curtain.

To achieve this, we will pore over Chicago's Food Inspections dataset, and answer five big questions: what is the impact, on a food-related business, of... its business type? Its neighborhood? Its size or scale? Of the time of the year? And of its social media reviews?

What is the impact of the business type?

Chicago is full of food-related businesses. These range from restaurants and bars to grocery stores, school cafeterias, hospitals, and many others. However, they are not all equally common.

At first glance, it is impossible to be certain whether the differences in how often each facility type is inspected reflect some bias or simply the food business landscape in Chicago. The donut chart below shows the number of inspections per business type for the 9 most common types, plus a tenth slice, "other types", which groups the remaining 34 categories.

To ascertain whether these inspections favor a particular type of business, we can examine the ratios of their inspection results. An inspection results in one of seven possible outcomes: Pass, Pass with conditions, Fail, Not ready, No entry, Out of business, Business not located. You can explore these ratios yourself in the plot below.

The plot has 3 bars: the top one shows the global average across all entries in the data, the second shows the average across one category, and the third shows the average results for any one of the 43 specific business types.
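The ratios behind such a plot can be sketched in a few lines. This is only an illustration, not the code used for the plot: the sample rows and the function names `result_ratios` and `global_ratios` are hypothetical, and the real dataset has many more rows and columns.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (business type, inspection result).
inspections = [
    ("Restaurant", "Pass"),
    ("Restaurant", "Fail"),
    ("Restaurant", "Pass"),
    ("Restaurant", "Pass with conditions"),
    ("Grocery Store", "Pass"),
    ("Grocery Store", "Out of business"),
    ("Mobile Food Vendor", "Fail"),
    ("Mobile Food Vendor", "Pass"),
]


def result_ratios(rows):
    """Return {business type: {result: share of that type's inspections}}."""
    counts = defaultdict(Counter)
    for business_type, result in rows:
        counts[business_type][result] += 1
    return {
        business_type: {r: n / sum(c.values()) for r, n in c.items()}
        for business_type, c in counts.items()
    }


def global_ratios(rows):
    """Return {result: share of all inspections}, i.e. the global average bar."""
    counts = Counter(result for _, result in rows)
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}


ratios = result_ratios(inspections)
overall = global_ratios(inspections)
```

Comparing `ratios["Mobile Food Vendor"]` with `overall` gives the same kind of reading as comparing the third bar with the first one.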

As we can observe, the probability of passing or failing an inspection is affected by the business type. However, this may simply reflect underlying factors. A mobile food vendor, for example, is likely to have fewer financial resources than a daycare or school owner (the main entities in the infant care category). As such, the latter can probably invest more in ensuring proper kitchen and serving conditions.

Overall, it appears that the inspections do reflect the food business landscape in Chicago: there is no strong correlation between the facility types that pass or fail most often and those that are visited the least or the most. As such, the results and inspection frequencies seem to be well balanced!

What is the impact of the business reviews?

Can online reviews tell us whether a business has failed or passed its inspections in Chicago? In this section, we analyse whether there is a relationship between ratings on Google Local (Google Maps) and the corresponding inspection results.

We merged the Health Inspection dataset with Google’s Local dataset, extracted from the Recommender Systems Dataset by Julian McAuley, professor at the University of California, San Diego. After keeping only the reviews of businesses in Chicago and merging them with our dataset, we obtain around 1600 unique businesses for which we have reviews.

We then look for relationships between a business's average review rating and its probability of passing an inspection (calculated from the Health Inspection dataset), as well as its number of reviews.

To analyse the possible relationships, we study several correlation measures. None of them reveals a meaningful relationship between the variables considered. The following table shows the highest correlation (Spearman) obtained:

In addition, we also inspect the variables visually. As expected, they do not seem to be significantly correlated. The following plot illustrates this point.

We conclude that no connection can be found between Google Local reviews and the probability of passing or failing an inspection. Neither the visual inspection nor the numeric values gave us any insight into this relationship.
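For readers unfamiliar with the Spearman coefficient mentioned above, here is a minimal sketch of how it can be computed: rank both variables, then take the Pearson correlation of the ranks. The two lists of values are hypothetical and are not taken from our data, and the function names are our own for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-business values: average rating and share of passed inspections.
avg_rating = [4.5, 3.0, 4.0, 2.5, 5.0]
pass_rate = [0.6, 0.8, 0.5, 0.7, 0.9]
rho = spearman(avg_rating, pass_rate)
```

A value of `rho` close to 0 indicates no monotonic relationship, while values close to 1 or -1 indicate a strong one.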

Additionally, the number of ratings a business has on Google does not seem to be correlated with its number of failed or passed inspections either. Does this mean that users do not really care about the health conditions of a business when reviewing it? Or does it merely indicate that they do not have the chance to go behind the scenes and discover what the true sanitary conditions of the business are?

In this analysis, we do need to highlight that the number of businesses present in both the Google dataset and the Health Inspection dataset is small (~1600), with a heavy-tailed distribution of the number of reviews. A review dataset with more up-to-date comments and ratings, combined with a larger overlap between the two datasets, could therefore yield different results.

What is the impact of the neighborhood?

As a multicultural city, Chicago is well known for the diversity of its neighbourhoods and the harmony reigning among its citizens. It is thus only natural that we investigate the relationship between the dominant ethnicity of each neighbourhood and the food establishments spread throughout Chicago.

But are they uniformly distributed over the whole city?
On the map below, the radius of each bubble is directly proportional to the number of restaurants in that zip code area, normalised by the area of the zip code. Most restaurants are concentrated in the city centre, near the lakeshore. Coincidentally, this also corresponds to the neighbourhoods where the population is primarily white. Because we broke the city down into zip code areas, which are fairly large, only three predominant ethnicities emerged: "White Alone Population", "Black and African American Population" and "Hispanic or Latino Population".
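The normalisation used for the bubbles can be sketched as follows. The zip labels, counts, areas and the maximum radius are hypothetical placeholders, and scaling to the densest area is our own assumption about how the radii might be set.

```python
# Hypothetical zip code areas with restaurant counts and surface areas.
zip_stats = {
    "zip_A": {"restaurants": 120, "area_km2": 1.0},
    "zip_B": {"restaurants": 300, "area_km2": 8.0},
    "zip_C": {"restaurants": 90, "area_km2": 18.0},
}

MAX_RADIUS_PX = 40.0  # assumed radius of the densest bubble


def bubble_radii(stats, max_radius=MAX_RADIUS_PX):
    """Radius proportional to restaurants per km2, scaled to the densest area."""
    density = {z: s["restaurants"] / s["area_km2"] for z, s in stats.items()}
    top = max(density.values())
    return {z: max_radius * d / top for z, d in density.items()}


radii = bubble_radii(zip_stats)
```

Note that `zip_B` has the most restaurants but not the largest bubble, because the count is divided by the area before scaling.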

Inspections and Restaurants - Fair Distribution or Bias?

It is not surprising to find a majority of the restaurants located in white neighbourhoods, as most areas of Chicago have a predominantly white population. Out of the 67 parts of Chicago, 37 are primarily white, 19 are mostly Black and African American, and 11 are principally Latino or Hispanic. The number of inspections each establishment receives is also fairly even across the neighbourhoods dominated by different ethnicities.

However, if we look at the Google reviews and compare the number of food establishments that have not been inspected with the total number of restaurants per neighbourhood, we observe that this ratio is larger in white neighbourhoods than in neighbourhoods that are not predominantly white. This can mean one of two things: either restaurants located in white neighbourhoods that have not been inspected are more likely to be found on Google reviews than restaurants located in other areas, or restaurants in Black and African American or Hispanic or Latino neighbourhoods are more likely to be inspected.

Nonetheless, even if it were the case that you have a larger chance of being inspected when not based in a predominantly white neighbourhood, your odds of success are not impacted. Indeed, the chances of passing, failing or going out of business do not change much with the predominant ethnicity of your vicinity.

Settling in Predominantly Non-White Neighbourhoods

From the data we have collected so far, it seems that moving to a neighbourhood that is not primarily white will either increase your chances of being inspected or lower your chances of being found on Google reviews.
So why should you settle into those areas?
Easy answer!
The distance one has to cover before finding a restaurant varies from one neighbourhood to another. All else being equal, a business based in a neighbourhood with few food establishments is likely to attract more clients than a similarly attractive one located in a foodie area.

This might not be true in the case where a sparse area in terms of restaurants is also sparse in terms of inhabitants. However, this is not the case in Chicago. Food businesses located in Black and African American neighbourhoods might have on average more clients than if they were based in another neighbourhood.

Violations per Neighbourhood

For mere curiosity, or deeper insight, the following plots display the distribution of violations for each of the ethnicities present in our dataset.

What is the impact of the business size?

When deciding which restaurant to go to, whether or not we already know it can be a deciding factor, especially if we are abroad. And brands play a big role here. How different is eating in a multinational chain from eating in an independent restaurant? Let’s dig in.

First of all, we need to know which are the top chain businesses in our dataset of inspections in Chicago, by number of establishments. The following table shows the number of distinct establishments of the top chains in our dataset. If the inspections sample businesses uniformly from all over Chicago, this can give us a good picture of Chicago’s chain business scene.

From the table we can see that Subway is the company with the largest number of establishments in Chicago, followed by 7-Eleven and Starbucks.

McDonald's vs Burger King

Let’s take on the battlefield of hamburgers. The 2 best-known chains in this industry are McDonald’s and Burger King. Everyone has their preference when it comes to them. But what happens when we analyze them from the point of view of their cleanliness? It is often claimed that these businesses are not the best at keeping their establishments properly sanitized. Will there be a difference between the two?

The plot above shows the rate of Pass/Pass with Conditions/Fail inspections for all the restaurants in Chicago that belong to these 2 chains. In the plot we observe that the level of cleanliness for McDonald’s and Burger King is very similar. There is no significant difference between them. They both have the same percentage of failed inspections, while McDonald’s has 7 percentage points more passed inspections, exactly the margin by which Burger King leads McDonald’s in “Pass with Conditions” inspections. In general terms we can say they are very similar.

Thus, we can say that in Chicago there is not a big difference between going to McDonald’s or Burger King when considering their level of cleanliness.

Top Chains Inspection Rates

We can do a similar analysis of the cleanliness of our previously computed top-10 chains in Chicago, which are also familiar to many of us around the world. Here, we will check the percentage of “Passed” and “Failed” inspections for the top 10 chains in Chicago.

In the plot above, we show the percentage of failed and passed inspections for the top chains in Chicago. The plot is sorted by number of failed inspections. We observe that “Citgo” and “Family Dollar”, a chain of gas station shops and a variety store chain respectively, have the largest share of failed inspections. In fact, their combined share of “Fail” and “Pass w/ Conditions” results is alarmingly close to 50% of their inspected stores.

Independent Businesses vs Chains

When we are in a new city, we need to decide whether to venture into the unexplored wilderness of a restaurant we do not know or to visit a chain restaurant that is familiar to us. One could wonder whether there is a big difference in inspection pass rates between independent businesses and big chains.

Let’s look at the difference in inspection pass rates between independent businesses and chain businesses in Chicago. For this analysis, we need to establish what a chain business is. The definition is unclear. Can we consider a business a chain when it has 2 establishments? 3? We decided to set this threshold at 5 establishments in order to consider a business a chain. Then we compare the Pass/Fail inspection rates between the group of independent businesses and the group of chains.
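The grouping described above can be sketched as follows. The rows are hypothetical, and we assume here that "a threshold of 5" means at least 5 distinct establishments sharing the same business name; the function names are our own.

```python
from collections import Counter, defaultdict

CHAIN_THRESHOLD = 5  # assumption: a chain has at least 5 distinct establishments

# Hypothetical rows: (business name, establishment id, inspection result).
rows = [("Chain A", i, "Fail" if i == 0 else "Pass") for i in range(5)] + [
    ("Corner Diner", 100, "Fail"),
    ("Corner Diner", 100, "Pass"),
    ("Twin Cafe", 200, "Pass"),
    ("Twin Cafe", 201, "Pass"),
]


def chain_names(rows, threshold=CHAIN_THRESHOLD):
    """Names of businesses with at least `threshold` distinct establishments."""
    establishments = defaultdict(set)
    for name, establishment_id, _ in rows:
        establishments[name].add(establishment_id)
    return {name for name, ids in establishments.items() if len(ids) >= threshold}


def fail_rate_by_group(rows, threshold=CHAIN_THRESHOLD):
    """Share of failed inspections for chains and for independent businesses."""
    chains = chain_names(rows, threshold)
    totals, fails = Counter(), Counter()
    for name, _, result in rows:
        group = "chain" if name in chains else "independent"
        totals[group] += 1
        if result == "Fail":
            fails[group] += 1
    return {group: fails[group] / totals[group] for group in totals}


rates = fail_rate_by_group(rows)
```

Changing `CHAIN_THRESHOLD` to 2 would move "Twin Cafe" into the chain group, which shows how sensitive the comparison is to this choice.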

There is a slight difference in the number of “Failed” inspections: independent businesses failed 4% more inspections than chain businesses. However, there does not seem to be a big difference between the two types. Therefore, we cannot conclude that it is significantly safer to go to a chain business than to an independent one.

Hurrah! We do not have to worry about the cleanliness of independent businesses when we go to Chicago. Independent businesses and big chains have very similar conditions.

What is the impact of the time of the year?

People’s lifestyles, needs and wants change throughout the year, so we will investigate whether the performance of catering businesses also changes as the seasons go by.

As we can see, the temperature and the time of the year do not significantly affect the percentage of failed inspections.
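One simple way to check this kind of seasonal effect is to compute the share of failed inspections per month. The sketch below uses hypothetical dates and results and only groups by calendar month; joining actual temperature readings would need an additional weather dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (inspection date, inspection result).
inspections = [
    (date(2018, 1, 15), "Pass"),
    (date(2018, 1, 20), "Fail"),
    (date(2018, 7, 3), "Fail"),
    (date(2018, 7, 9), "Pass"),
    (date(2018, 7, 21), "Pass"),
    (date(2018, 7, 30), "Pass"),
]


def fail_share_by_month(rows):
    """Return {month number: share of that month's inspections that failed}."""
    totals, fails = Counter(), Counter()
    for day, result in rows:
        totals[day.month] += 1
        if result == "Fail":
            fails[day.month] += 1
    return {month: fails[month] / totals[month] for month in sorted(totals)}


monthly = fail_share_by_month(inspections)
```

If the values in `monthly` stay roughly flat across the year, the time of the year has little effect on the overall failure rate.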

However, a further look into the data shows that several violations, especially those with codes 2 and 3, have a highly skewed distribution, with significantly more occurrences at high temperatures. This presents a potential risk, as both are critical violations. Both codes are related to proper food storage, and looking at some examples of the violations, we find that a broken-down refrigerator is repeatedly reported. This is a common problem: high temperatures sometimes cause refrigerators to break down if regular maintenance is not done. An example is shown in the histogram below.

As such, since the overall level of failures remains the same throughout the year, one can conclude that every season and temperature comes with its own share of heightened problems! So you can continue eating out during the summer!