Duration: 32 mins
May 1, 2017

A billion stars in the Jupyter Notebook

This talk will show what is possible with the huge datasets that are becoming more prevalent in the era of big data. I will demonstrate this, together with 3d visualization, in the Jupyter notebook, by now the almost standard environment of (data) scientists.

With large astronomical catalogues containing more than a billion stars becoming common, we are preparing methods to visualize and explore these large datasets. Data volumes of this size require different visualization techniques, since scatter plots become too slow and meaningless due to overplotting. We solve the performance and visualization issues using binned statistics, e.g. histograms, density maps, and volume rendering in 3d. The calculation of statistics on N-dimensional grids is handled by a Python library called Vaex, which I will introduce. It can process at least a billion samples per second, to produce, for instance, the mean of a quantity on a regular grid. These statistics can be calculated for any mathematical expression on the data (numpy style), either on the full dataset or on subsets specified by queries/selections.
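As a rough illustration of the workflow described above, here is a minimal sketch of computing a binned statistic with vaex. The example dataset, column names, expression, and grid settings are illustrative assumptions, not taken from the talk.

```python
import vaex

# Load a small bundled example dataset (a simulated stellar catalogue);
# for a real catalogue one would use e.g. vaex.open("catalogue.hdf5").
df = vaex.example()

# Mean of a derived quantity (numpy-style expression) on a regular 2d grid.
mean_vr = df.mean("(x*vx + y*vy)/sqrt(x**2 + y**2)",
                  binby=["x", "y"],
                  limits=[[-10, 10], [-10, 10]],
                  shape=(128, 128))

# The same statistic restricted to a subset, specified as a query/selection.
df.select("E < -100000")  # illustrative cut on the example's energy column
mean_vr_subset = df.mean("(x*vx + y*vy)/sqrt(x**2 + y**2)",
                         binby=["x", "y"],
                         limits=[[-10, 10], [-10, 10]],
                         shape=(128, 128),
                         selection=True)

print(mean_vr.shape)  # a (128, 128) grid of means, ready for plotting
```

The resulting grids can be displayed as density maps or fed into 3d volume rendering inside the notebook.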

Presented by:

    • Maarten Breddels
      Founder of vaex.io