Last weekend I participated in the Data Dive organized by Data Science for Social Good Berlin. It was an amazing experience, full of researchers, data scientists, and other folks eager to develop solutions, visualizations, and insights from the data of the participating organizations: Jambo Bukoba, Street Football World, and DataLook.
Did you know that you can apply machine learning algorithms to big data very easily? Spark and its machine learning library MLlib make it simple, and it gets even simpler with the Python API, PySpark.
To see how it's done, take a look at this notebook:
In this small project, a very rough estimate of the mood of the world is produced through a daily analysis of the headlines of several news websites. In total, 36 websites are scanned, divided into 8 arbitrary regions: Africa, Asia, Australia, Europe, Latin America, Middle East, Russia, and USA. A very basic sentiment analysis is done by comparing the extracted text against a sentiment lexicon of rated words.
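The lexicon-comparison step can be sketched in a few lines. The tiny lexicon below is a made-up stand-in for a real rated-word list such as AFINN; the scoring rule (average rating of the matched words) is one simple choice among several.

```python
# A minimal sketch of lexicon-based sentiment scoring: look up each word
# of a headline in a lexicon of rated words and average the ratings.
import re

# Made-up miniature lexicon; a real one rates thousands of words.
lexicon = {"good": 3, "great": 3, "win": 2, "crisis": -3, "war": -3, "bad": -2}

def headline_score(headline):
    """Average rating of the headline's words found in the lexicon."""
    words = re.findall(r"[a-z']+", headline.lower())
    ratings = [lexicon[w] for w in words if w in lexicon]
    return sum(ratings) / len(ratings) if ratings else 0.0

print(headline_score("Great win for the team"))  # → 2.5
print(headline_score("War deepens the crisis"))  # → -3.0
```

Averaging many such headline scores per region per day gives the rough daily mood signal plotted below.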
This is how the mood evolved in the different regions of the world from August to October 2014:
Have you ever wondered how neural networks work? The best way to understand them is to program one yourself. Mine is written in Python and inspired by Andrew Ng's Machine Learning course on Coursera.
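The core idea fits in a short script: a forward pass through a hidden layer, then backpropagation of the error to update the weights. This is a minimal sketch in the spirit of the course exercises, not my actual implementation; the XOR toy task, network sizes, and learning rate are choices for illustration.

```python
# A one-hidden-layer neural network trained with backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))
lr = 1.0

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)     # hidden activations, shape (4, 4)
    out = sigmoid(h @ W2 + b2)   # predictions, shape (4, 1)
    # Backward pass: squared-error gradient, sigmoid derivative a*(1-a).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

loss = float(np.mean((out - y) ** 2))
print("predictions:", np.round(out.ravel(), 2), "loss:", round(loss, 4))
```

Everything else in a full implementation (more layers, regularization, better optimizers) is an elaboration of this same forward/backward loop.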
With the Python library Bokeh, it's easy to plot your data interactively on top of Google Maps. Here I am plotting the area and population of the different boroughs of Berlin:
And here I plot the population density of the boroughs of Berlin on top of a satellite image:
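The basic recipe looks like the sketch below, using Bokeh's `gmap` API. Note the assumptions: `"YOUR_API_KEY"` is a placeholder for a real Google Maps API key, and the coordinates and marker sizes are rough, illustrative values rather than the actual borough data.

```python
# A minimal sketch of plotting points on Google Maps with Bokeh.
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.plotting import gmap

# map_type="satellite" gives the satellite view; "roadmap" is the default.
map_options = GMapOptions(lat=52.52, lng=13.40, map_type="satellite", zoom=10)
p = gmap("YOUR_API_KEY", map_options, title="Berlin boroughs")

# Illustrative data: one point per borough, marker size as the value to show.
source = ColumnDataSource(data=dict(
    lat=[52.52, 52.49, 52.57],
    lon=[13.40, 13.30, 13.40],
    size=[15, 12, 10],
))
p.scatter(x="lon", y="lat", size="size", fill_alpha=0.6, source=source)

# show(p) would open the interactive map in a browser.
```

Because the output is a regular Bokeh figure, pan, zoom, and hover tools all work on top of the map tiles.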