Jordi Casanellas
  • Data Science Blog
  • Dataguda
  • Astrophysics
    • Research
    • Teaching
    • Videos
    • Press
  • Contact

Intro to Machine Learning with Spark

24/9/2015

8 Comments

 
Did you know that you can apply machine learning algorithms to big data very easily? Spark and its machine learning library, MLlib, make it simple, and it gets even simpler with PySpark, the Python API.

To see how it is done, take a look at this notebook:
              Spark_MLlib_Classification


Tracking the mood of the world

14/9/2015

0 Comments

 
In this small project, the mood of the world is estimated very roughly by analyzing the headlines of several news websites every day. In total, 36 websites are scanned, divided into 8 arbitrary regions: Africa, Asia, Australia, Europe, Latin America, Middle East, Russia and USA. A very basic sentiment analysis is done by comparing the extracted text with a sentiment lexicon of rated words.
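
As a rough illustration of the lexicon comparison described above, here is a minimal sketch. It is not the project's actual code: the tiny lexicon, its ratings, the sample headlines and the function names are all made up for the example, and averaging the headline scores per region is an assumption about how the daily value could be aggregated.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (word -> rating); a real lexicon holds thousands of rated words.
LEXICON = {"good": 2, "win": 3, "peace": 2, "crisis": -3, "war": -3, "dead": -3}


def headline_score(headline, lexicon=LEXICON):
    """Sum the ratings of all lexicon words found in one headline."""
    words = re.findall(r"[a-z']+", headline.lower())
    return sum(lexicon.get(word, 0) for word in words)


def region_mood(headlines_by_region, lexicon=LEXICON):
    """Average headline score per region; regions without headlines are skipped."""
    return {
        region: mean(headline_score(h, lexicon) for h in headlines)
        for region, headlines in headlines_by_region.items()
        if headlines
    }


# Invented headlines, only to show the shape of the input.
sample = {
    "Europe": ["Peace talks win support", "Debt crisis deepens"],
    "Asia": ["Good harvest expected"],
}
moods = region_mood(sample)
print(moods)
```

Words that are missing from the lexicon simply count as neutral, which keeps the estimate crude but cheap to run every day.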

This is how the mood evolved in different regions of the world from August to October 2014:


A simple Neural Network in Python  

14/9/2015

0 Comments

 
Have you ever wondered how neural networks work? The best way to understand them is to program one yourself. Mine is written in Python and is inspired by Andrew Ng's Machine Learning course on Coursera.
The architecture of the network is flexible (number of layers, input units and output units). The neural network classifies using regularized logistic regression. The gradients are computed with backpropagation and are checked numerically. The network is optimized with the SciPy nonlinear conjugate gradient algorithm. When several regularization parameters are used, the optimization is parallelized. Finally, learning curves are computed to evaluate the performance of the neural network.
Picture
Beautiful picture to encourage you to go deep into the algorithms behind neural networks.
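
To make the numerical gradient check mentioned above more concrete, here is a minimal sketch. It is deliberately simplified and is not the code from the post: it uses a single logistic unit instead of a multi-layer network with backpropagation, and the data, the parameter values and the function names are invented for the example. The idea carries over unchanged: compare the analytic gradient of the regularized cost with a central finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y, lam):
    """Regularized logistic-regression cost; theta[0] is the bias and is not regularized."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    reg = lam / (2 * m) * sum(t * t for t in theta[1:])
    return total / m + reg


def gradient(theta, X, y, lam):
    """Analytic gradient of the cost above."""
    m = len(y)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(t * x for t, x in zip(theta, xi))) - yi
        for j, x in enumerate(xi):
            grad[j] += err * x / m
    for j in range(1, len(theta)):
        grad[j] += lam / m * theta[j]
    return grad


def numerical_gradient(theta, X, y, lam, eps=1e-5):
    """Central finite-difference estimate of the gradient, one parameter at a time."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += eps
        minus[j] -= eps
        grad.append((cost(plus, X, y, lam) - cost(minus, X, y, lam)) / (2 * eps))
    return grad


# Made-up data: the first column of X is the constant bias input.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1], [1.0, -1.0, -0.4]]
y = [1, 0, 1, 0]
theta = [0.1, -0.2, 0.3]

analytic = gradient(theta, X, y, lam=1.0)
numeric = numerical_gradient(theta, X, y, lam=1.0)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)
```

If the two gradients disagree by more than a tiny amount, the analytic gradient most likely contains a bug, which is exactly what the check is meant to catch before the optimizer is run.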


Plotting on top of Google Maps with Bokeh

14/9/2015

6 Comments

 
With the Python library Bokeh it's easy to plot your data interactively on top of Google Maps. Here I am plotting the area and population of the different boroughs of Berlin:
And here I plot the density of the boroughs of Berlin on top of the satellite image:


    Jordi

    Data Scientist.
    Here you'll find some examples of data analysis, visualizations, machine learning and related topics.
