With massive amounts of data being created on the web every day, the internet has become a huge repository of information. This volume of data also brings with it some interesting possibilities that were not thought of previously. Being able to look something up and get a quick answer is only a minor benefit of the web. With newer technologies for web data mining, it is possible to aggregate enormous volumes of data from the web in order to reveal actionable insights.
Data science and artificial intelligence can together do wonders with the big data that keeps growing on the web. This is exactly what happened when Sina Sajadmanesh and his colleagues at the Sharif University of Technology aggregated huge data sets of recipes from around the web, classified them by cuisine and ran different analytics on them to identify the correlation between cuisines and health measures in different parts of the world.
We are what we eat
A lot of research has gone into identifying the relationship between health and food. The 19th-century gastronome Jean Anthelme Brillat-Savarin famously said, “Tell me what you eat and I will tell you what you are.” This statement, that you are what you eat, was well received by people all around the world and has been rephrased into the titles of countless health guides and cookbooks. It is even a way of life for many. If the quote by Brillat-Savarin is true, it must have serious implications for health issues like obesity. How the world’s cuisines relate to the health of the people who eat them is still an open question. This is exactly what Sina and his team tried to shed some light on through this data mining study.
How they did it
Sina and his team started out by extracting data sets from the recipe recommendation app Yummly and several other sources on the web. They fetched data that included 15,000 recipes belonging to 200 different cuisines. These recipes used about 3,000 ingredients. The recipes were then categorised on the basis of the cuisines they belonged to. The team went on to determine the nutritional values of these recipes, such as the amount of protein, fat and carbohydrate that each of them contained.
Now, to compare this with the health measures in the parts of the world where these cuisines come from, they aggregated country-level statistics like the percentage of GDP, obesity rates, immigration rates and the expenditure on healthcare. Once all the essential data points were collected, they applied different machine learning and analytics techniques to them to derive insights.
Their work has indeed generated some interesting insights into how different cuisines are related by their common ingredients, what defines a cuisine and, more importantly, how different cuisines relate to our health.
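To make the categorisation step more concrete, here is a minimal sketch of grouping recipes by cuisine and averaging their nutritional values. The records and field names (cuisine, protein_g, fat_g, carbs_g) are made up for illustration and are not the study's actual data or schema.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are assumptions.
recipes = [
    {"cuisine": "Italian", "protein_g": 20.0, "fat_g": 10.0, "carbs_g": 60.0},
    {"cuisine": "Italian", "protein_g": 10.0, "fat_g": 20.0, "carbs_g": 40.0},
    {"cuisine": "Indian", "protein_g": 15.0, "fat_g": 12.0, "carbs_g": 50.0},
]

NUTRIENTS = ("protein_g", "fat_g", "carbs_g")


def average_nutrition_by_cuisine(recipes):
    """Group recipes by cuisine and average each nutrient within the group."""
    grouped = defaultdict(list)
    for recipe in recipes:
        grouped[recipe["cuisine"]].append(recipe)
    return {
        cuisine: {n: sum(r[n] for r in group) / len(group) for n in NUTRIENTS}
        for cuisine, group in grouped.items()
    }


averages = average_nutrition_by_cuisine(recipes)
for cuisine, profile in averages.items():
    print(cuisine, profile)
```

The result is one nutritional profile per cuisine, which is the kind of table that can later be compared against country-level statistics.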
Diversity in ingredients
Diversity of ingredients was one thing Sina and his team were looking for in the cuisines. They counted the number of different ingredients in every cuisine and measured how much the ingredients varied between recipes within a cuisine to get a deeper understanding of the diversity. They found that countries with large immigrant populations, such as the USA and Australia, had the greatest diversity when it comes to dishes and ingredients. This could be because immigrants bring their home culture along with them, which makes their host country rich in cuisines.
The complexity of dishes was also measured using the number of ingredients used in them. More than half the recipes from Laos were found to have 15 or more ingredients. In the case of Russia, the figure was fewer than seven. You could say that Laotian cuisine is more complex than Russian cuisine.
Sajadmanesh says that countries with a higher number of ingredients tend to have dishes that are more complex. Indian and Chinese cuisines are both exceptions though. Although both countries have significantly fewer ingredients to choose from, the complexity of their dishes is high. The team believes that this could be because these countries have good chefs who can come up with complex dishes from limited ingredients, or because they have older cultures whose cuisines had enough time to evolve.
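The two diversity measures described above can be sketched roughly as follows. The article does not say how variation between recipes was computed, so the mean Jaccard distance used here is an assumption chosen for illustration, and the cuisines and recipes are invented.

```python
from itertools import combinations

# Hypothetical toy data: each cuisine maps to a list of recipes,
# and each recipe is a set of ingredient names.
cuisines = {
    "A": [{"rice", "egg", "soy sauce"}, {"rice", "chicken", "ginger"}],
    "B": [{"potato", "butter"}, {"potato", "butter"}],
}


def distinct_ingredients(recipes):
    """Number of different ingredients used anywhere in the cuisine."""
    return len(set().union(*recipes))


def mean_jaccard_distance(recipes):
    """Average dissimilarity over all recipe pairs (0.0 means identical).

    Assumed measure, not necessarily the one used in the study.
    Expects non-empty ingredient sets.
    """
    pairs = list(combinations(recipes, 2))
    if not pairs:
        return 0.0
    return sum(1 - len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


for name, recipes in cuisines.items():
    print(name, distinct_ingredients(recipes), mean_jaccard_distance(recipes))
```

In this toy data, cuisine A uses more distinct ingredients and its recipes overlap less, so it scores as more diverse on both measures.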
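As a rough illustration of this complexity measure, the sketch below summarises a cuisine by its median ingredient count and by the share of recipes with 15 or more ingredients. The function name and the ingredient counts are hypothetical, not figures from the study.

```python
from statistics import median


def complexity_summary(ingredient_counts, threshold=15):
    """Summarise a cuisine from the ingredient count of each recipe.

    Returns the median count and the share of recipes whose count is
    at or above the threshold.
    """
    share = sum(c >= threshold for c in ingredient_counts) / len(ingredient_counts)
    return {"median": median(ingredient_counts), "share_at_or_above": share}


# Made-up ingredient counts for two imaginary cuisines.
complex_cuisine = [16, 18, 15, 9, 20]
simple_cuisine = [5, 6, 4, 8, 6]

print(complexity_summary(complex_cuisine))
print(complexity_summary(simple_cuisine))
```

A cuisine where more than half the recipes meet the threshold necessarily has a median of at least 15 ingredients, which is one way to read the statement about Laos.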
A heat map showing the diversity of ingredients in cuisines around the world. Darker red indicates higher diversity.
They also looked at the relationship between various cuisines with respect to the ingredients they use. This revealed that certain ingredients define cuisines. For example, the ground spice mix garam masala is found only in Indian cuisine, while mozzarella cheese is a defining ingredient of Italian cuisine.
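One simple way to look for defining ingredients is to find those that occur in exactly one cuisine. The sketch below takes that approach as an assumption; the study may well have used a more nuanced measure, and the ingredient sets are invented.

```python
from collections import defaultdict

# Hypothetical toy data: the set of ingredients seen in each cuisine.
cuisine_ingredients = {
    "Indian": {"garam masala", "onion", "rice"},
    "Italian": {"mozzarella", "onion", "tomato"},
    "Thai": {"rice", "lemongrass", "tomato"},
}


def exclusive_ingredients(cuisine_ingredients):
    """Map each cuisine to the ingredients that appear in no other cuisine."""
    seen_in = defaultdict(set)
    for cuisine, ingredients in cuisine_ingredients.items():
        for ingredient in ingredients:
            seen_in[ingredient].add(cuisine)

    result = {cuisine: set() for cuisine in cuisine_ingredients}
    for ingredient, found in seen_in.items():
        if len(found) == 1:
            result[next(iter(found))].add(ingredient)
    return result


exclusive = exclusive_ingredients(cuisine_ingredients)
for cuisine, ingredients in exclusive.items():
    print(cuisine, sorted(ingredients))
```

On real data a strict "exactly one cuisine" rule would be brittle, so a relative-frequency score would probably be a better fit; the strict version is used here only because it is easy to follow.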
Cuisines and health
The team finally looked at the interrelationship between different cuisines, their nutritional qualities and the health of the populations that consume them. This showed that there is a definite correlation between cuisines rich in carbohydrate and sugar and health problems like obesity. Also, health issues were significantly lower in populations that followed a protein-rich cuisine.
It is interesting how web data mining could be used to shed light on a complicated matter like the correlation between cuisines and health. Insights of this kind have the power to help us identify issues, find solutions and improve the present scenario. It’s safe to assume that web crawling technologies can fuel deeper research on this and countless other areas where we currently lack clarity.
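To show what checking for such a correlation could look like, here is a minimal sketch that computes the Pearson correlation coefficient between a nutritional measure and an obesity rate. Pearson correlation is an assumed choice, since the article does not name the statistic used, and the numbers are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, one entry per imaginary country:
# average carbohydrate share of the local cuisine (%) and obesity rate (%).
carb_share = [40.0, 50.0, 60.0, 70.0]
obesity_rate = [10.0, 14.0, 19.0, 25.0]

r = pearson(carb_share, obesity_rate)
print(f"correlation: {r:.3f}")
```

A value close to 1 indicates that the two measures rise together, and a value close to -1 that one falls as the other rises. Neither says anything about cause and effect.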