A new approach to DataFrames and pipelines
We show how to deal with massive datasets using small resources using the Python Vaex DataFrame library. Using computational graphs, efficient algorithms and storage (Apache Arrow / HDF5) Vaex can easily handle up to a billion rows, even on your laptop. As a bonus, Vaex can automatically generate a Machine Learning pipeline using the graph structure build-up internally in the DataFrame.