Jordi Casanellas
  • Data Science Blog
  • Dataguda
  • Astrophysics
    • Research
    • Teaching
    • Videos
    • Press
  • Contact

Missing data: to impute or not to impute? + R examples

11/7/2016

9 Comments

 
Very often, the data we want to analyse and use for predictions is full of black holes of missing values. What should we do about them? Would you remove the entries (rows) with missing data? Would you remove the variables (predictors, columns) with missing values? Or would you try to impute the missing values (that is, to "guess" them)?

The strategy to follow depends on your (missing) data. The missing values may be distributed at random, or not...
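
The three options above can be sketched on a toy table. The post itself uses R; this is a minimal Python sketch using only the standard library, and the dataset, column names, and function names are all hypothetical. Mean imputation is just one simple way to "guess" a value, chosen here for brevity; it is not necessarily the method the post recommends.

```python
from statistics import mean

# Hypothetical toy dataset: each row is a dict, None marks a missing value.
rows = [
    {"age": 25, "income": 30000.0, "score": 0.7},
    {"age": None, "income": 42000.0, "score": 0.9},
    {"age": 31, "income": 36000.0, "score": 0.4},
    {"age": 40, "income": 51000.0, "score": None},
]
columns = ["age", "income", "score"]


def drop_incomplete_rows(rows):
    """Keep only the rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_incomplete_columns(rows, columns):
    """Keep only the columns that have no missing value in any row."""
    complete = [c for c in columns if all(r[c] is not None for r in rows)]
    return [{c: r[c] for c in complete} for r in rows]


def impute_with_mean(rows, columns):
    """Replace each missing value with the mean of the observed values in its column."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: (r[c] if r[c] is not None else means[c]) for c in columns} for r in rows]


rows_dropped = drop_incomplete_rows(rows)            # strategy 1: remove rows
cols_dropped = drop_incomplete_columns(rows, columns)  # strategy 2: remove columns
imputed = impute_with_mean(rows, columns)             # strategy 3: impute
```

Even on four rows the trade-off is visible: dropping rows discards half of the entries, dropping columns leaves a single variable, and imputing keeps everything at the price of filling in values that were never observed.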



Intro to Machine Learning with Spark

24/9/2015

8 Comments

 
Did you know that you can apply machine learning algorithms to big data very easily? Spark and its machine learning library, MLlib, make it simple, and PySpark, the Python API, makes it simpler still.

To see how this works in practice, take a look at this notebook: Spark_MLlib_Classification


A simple Neural Network in Python  

14/9/2015

0 Comments

 
Have you ever wondered how neural networks work? The best way to understand them is to program one yourself. Mine is written in Python and is inspired by Andrew Ng's Machine Learning course on Coursera.
The architecture of the network is flexible (number of layers, input and output units). The network classifies using regularized logistic regression. The gradients are computed with backpropagation and checked numerically. The network is optimized with SciPy's nonlinear conjugate gradient algorithm. When several regularization parameters are used, the optimization is parallelized. Finally, learning curves are computed to evaluate the performance of the network.
Picture: a beautiful image to encourage you to go deep into the algorithms behind neural networks.
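
To make the "computed with backpropagation and checked numerically" step concrete, here is a minimal pure-Python sketch. It is not the author's code. It assumes a single hidden layer, sigmoid units, a regularized cross-entropy cost, and a hypothetical XOR toy dataset, and every function name is made up for illustration. The SciPy optimization, the parallelization, and the learning curves are left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Forward pass; each weight row stores the bias first, then the input weights."""
    a1 = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in W1]
    a2 = [sigmoid(w[0] + sum(wi * ai for wi, ai in zip(w[1:], a1))) for w in W2]
    return a1, a2


def cost(W1, W2, X, Y, lam):
    """Regularized cross-entropy cost (biases are not regularized)."""
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        _, out = forward(W1, W2, x)
        total -= sum(yk * math.log(ok) + (1 - yk) * math.log(1 - ok)
                     for yk, ok in zip(y, out))
    reg = sum(w ** 2 for W in (W1, W2) for row in W for w in row[1:])
    return total / m + lam / (2 * m) * reg


def backprop(W1, W2, X, Y, lam):
    """Gradients of the cost with respect to W1 and W2 via backpropagation."""
    m = len(X)
    G1 = [[0.0] * len(row) for row in W1]
    G2 = [[0.0] * len(row) for row in W2]
    for x, y in zip(X, Y):
        a1, out = forward(W1, W2, x)
        d2 = [ok - yk for ok, yk in zip(out, y)]  # output-layer error
        d1 = [sum(d2[k] * W2[k][j + 1] for k in range(len(W2))) * a1[j] * (1 - a1[j])
              for j in range(len(W1))]  # hidden-layer error
        for G, d, inp in ((G2, d2, a1), (G1, d1, x)):
            for k, dk in enumerate(d):
                G[k][0] += dk  # bias gradient
                for j, v in enumerate(inp):
                    G[k][j + 1] += dk * v
    for G, W in ((G1, W1), (G2, W2)):
        for k, row in enumerate(G):
            for j in range(len(row)):
                row[j] = row[j] / m + (lam / m * W[k][j] if j > 0 else 0.0)
    return G1, G2


def numerical_gradient(W1, W2, X, Y, lam, eps=1e-5):
    """Central-difference estimate of the same gradients, used only as a check."""
    grads = []
    for W in (W1, W2):
        G = [[0.0] * len(row) for row in W]
        for k, row in enumerate(W):
            for j in range(len(row)):
                old = row[j]
                row[j] = old + eps
                plus = cost(W1, W2, X, Y, lam)
                row[j] = old - eps
                minus = cost(W1, W2, X, Y, lam)
                row[j] = old  # restore the weight before moving on
                G[k][j] = (plus - minus) / (2 * eps)
        grads.append(G)
    return grads


random.seed(0)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # hypothetical XOR inputs
Y = [[0.0], [1.0], [1.0], [0.0]]
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 2 inputs -> 3 hidden
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(1)]  # 3 hidden -> 1 output

analytic = backprop(W1, W2, X, Y, lam=0.1)
numeric = numerical_gradient(W1, W2, X, Y, lam=0.1)
max_diff = max(abs(a - n)
               for A, N in zip(analytic, numeric)
               for ra, rn in zip(A, N)
               for a, n in zip(ra, rn))
```

If the backpropagation is correct, `max_diff` should be tiny compared with the gradients themselves. The numerical estimate needs two cost evaluations per weight, so it is only practical as a one-off check on a small network, not as a replacement for backpropagation during training.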


    Jordi

    Data Scientist.
    Here you'll find some examples of data analysis, visualizations, machine learning and related topics.
