Vaex: Lazy Out-of-Core DataFrames for Python.

Visualize and explore big tabular datasets. A billion rows per second on a single computer.




Why use vaex?

  • Visualize and explore big tabular data interactively
  • Process more than a billion objects per second on a single computer.
  • Transform the data (lazily) on the fly using regular numpy, without using extra memory.
  • Filter the dataset using visual queries and boolean expressions, to visualize subsets of the data or to do data cleansing.
  • Vaex has a graphical interface for the most common use cases.
  • Vaex integrates well in the Jupyter/IPython notebook/lab ecosystem.
  • Client/server architecture: Delegate computations to a remote server. (in development)
  • Use a cluster to visualize and explore even larger datasets (10-100 billion). (in development)
  • With a focus on astronomy and astrophysics, but widely applicable.
  • Can visualize the whole Gaia catalogue in one second.


How does it work?

Vaex does this by:

  • Binning or aggregating the data on a grid, using simple optimized algorithms
  • Using virtual columns, which behave like regular columns but are computed in chunks only when needed, so that no memory is wasted.
  • Storing data in columns, which avoids reading unneeded data and gets maximum performance out of hard drives.
  • Memory mapping files, which avoids unneeded reading and copying of data. A terabyte-sized file opens in milliseconds.
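
The four points above can be illustrated with plain numpy. The sketch below is not vaex's implementation, only a minimal version of the idea under stated assumptions: each column is stored as a raw float64 file (a made-up layout for this example), the files are opened with numpy.memmap, and a hypothetical helper binned_counts evaluates a "virtual column" one chunk at a time while accumulating counts on a 1D grid.

```python
import os
import tempfile

import numpy as np


def binned_counts(x, y, expression, limits, shape=64, chunk_size=100_000):
    """Count rows on a 1D grid of expression(x, y), one chunk at a time.

    Hypothetical helper for illustration; not part of vaex.
    """
    lo, hi = limits
    counts = np.zeros(shape, dtype=np.int64)
    for start in range(0, len(x), chunk_size):
        stop = start + chunk_size
        # The "virtual column" only ever exists for the current chunk.
        values = expression(x[start:stop], y[start:stop])
        # Map each value to a bin index on the half-open interval [lo, hi).
        index = np.floor((values - lo) / (hi - lo) * shape).astype(np.int64)
        inside = (index >= 0) & (index < shape)
        counts += np.bincount(index[inside], minlength=shape)
    return counts


# Write two columns to disk, one raw float64 file per column (assumed layout).
rng = np.random.default_rng(42)
n_rows = 1_000_000
tmpdir = tempfile.mkdtemp()
paths = {}
for name in ("x", "y"):
    paths[name] = os.path.join(tmpdir, name + ".f64")
    rng.normal(size=n_rows).tofile(paths[name])

# Memory map the columns: opening is cheap, pages are read only when touched.
x = np.memmap(paths["x"], dtype=np.float64, mode="r")
y = np.memmap(paths["y"], dtype=np.float64, mode="r")


def radius(x, y):
    # A derived quantity, defined with regular numpy and evaluated lazily.
    return np.sqrt(x**2 + y**2)


counts = binned_counts(x, y, radius, limits=(0.0, 5.0), shape=50)
print(counts.sum(), "of", n_rows, "rows fall inside the grid")
```

Only one chunk of the derived column is in memory at any time, and the result is a small fixed-size grid regardless of the number of rows, which is what makes the aggregated data cheap to plot.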


What is vaex?

  • A Python library/package for (data) scientists:
    • Is pip and conda installable.
    • Make custom plots and statistics.
    • Calculate statistics on an N-dimensional grid and visualize them.
    • Create interactive Jupyter/IPython notebooks.
    • Publication quality plots with matplotlib.
    • Interactive plots with bqplot or bokeh.
    • Combine the notebook with the graphical interface in one kernel
  • A standalone program/GUI that:
    • Requires no programming knowledge.
    • Visualizes 1D histograms, 2D density plots, averaged quantities, and 3D volume renderings.
    • Allows interactive navigation and selection.
    • Overlays vector and tensor quantities in 2D and 3D.
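
As an illustration of what "statistics on a grid" and "averaged quantities" mean, the sketch below computes the mean of a quantity z in every cell of a 2D grid using numpy.histogram2d. It uses synthetic data and plain numpy, not the vaex API; the variable names and the 32x32 grid are assumptions made for this example.

```python
import numpy as np

# Synthetic data: positions (x, y) and a quantity z to average per grid cell.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
z = x + y

grid_range = [[-1.0, 1.0], [-1.0, 1.0]]

# Number of points per cell.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=32, range=grid_range)
# Sum of z per cell, obtained by using z as the weight of each point.
sums, _, _ = np.histogram2d(x, y, bins=32, range=grid_range, weights=z)

# Mean of z per cell; cells without points become NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    mean_z = sums / counts

print(mean_z.shape)
```

The first axis of mean_z follows x and the second follows y, so an image display such as matplotlib's imshow would typically be given mean_z.T with origin="lower".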

Installation

Desktop user? Download the standalone OSX or Linux version. *

For programming? Install the Python package:
$ pip install --user --pre vaex

Or for Anaconda users:
$ conda install -c conda-forge vaex

Latest from git:
$ pip install git+https://github.com/maartenbreddels/vaex/

Or see more detailed instructions.

*Not possible to combine with the IPython/Jupyter notebook.


Live demo. Yellow taxi pickup locations in New York City.

The demo on the right shows 140 million points, rendered in real time. Zoom or pan, and the plot is updated on the fly.



Demo movies.

Fast visualization

Coming soon

Selections and linked views

Coming soon

Notebook integration

Coming soon

Gaia data.


Interactive demo showing 100 million points (10%) of the Gaia DR1 data, rendered in real time. Zoom or pan, and the plot is updated on the fly.


Acknowledgements.

Vaex is funded by:



Requests/Issues/Contact.

Vaex is open source; the source code and issues live on GitHub. Please use GitHub to report issues. Contributions are welcome as pull requests.
