Fast visualization of big data.
Plot 1 billion points in ~1 second, with interactive navigation on a single computer.
Lasso selection, with instant redraw.
Jupyter notebook integration.
Interactive navigation and selections are possible
Vaex: Lazy Out-of-Core DataFrames for Python.
Visualize and explore big tabular datasets. A billion rows per second on a single computer.
Why use vaex?
- Visualize and explore big tabular data interactively
- Process more than a billion objects per second on a single computer.
- Transform the data lazily, on the fly, using regular numpy expressions, without using extra memory.
- Filter the dataset using visual queries and boolean expressions, to visualize subsets of the data or to do data cleansing.
- Vaex has a graphical interface for the most common use cases.
- Vaex integrates well in the Jupyter/IPython notebook/lab ecosystem.
- Client/server architecture: Delegate computations to a remote server. (in development)
- Use a cluster to visualize and explore even larger datasets (10-100 billion). (in development)
- With a focus on astronomy and astrophysics, but widely applicable.
- Can visualize the whole Gaia catalogue in one second.
How does it work?
Vaex does this by:
- Binning or aggregating the data on a grid, using simple, optimized algorithms.
- Using virtual columns, which behave like regular columns but are only computed in chunks when needed, so no memory is wasted.
- Storing data in columns, which avoids reading unneeded data and gets maximum performance out of hard drives.
- Memory-mapping files, which avoids unneeded reading and copying of data. Open a terabyte-sized file in milliseconds.
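The ideas in the list above can be sketched with plain numpy. This is an illustration only, not vaex's actual implementation: the function name `binned_counts`, the raw binary file layout, and the chunk size are all assumptions made for the example. It memory-maps a column-wise file, computes a "virtual column" one chunk at a time, and accumulates counts on a 2d grid.

```python
import os
import tempfile

import numpy as np


def binned_counts(path, n_rows, limits, shape=64, chunk_size=100_000):
    """Count rows on a 2d grid of (x, r), with r = sqrt(x**2 + y**2).

    Hypothetical helper for illustration; the file is assumed to hold two
    float64 columns stored one after the other (column-wise layout).
    """
    # Memory-map the file: no data is read until a chunk is actually touched.
    data = np.memmap(path, dtype=np.float64, mode="r", shape=(2, n_rows))
    x_col, y_col = data[0], data[1]  # each column is contiguous on disk

    grid = np.zeros((shape, shape), dtype=np.int64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        x = np.asarray(x_col[start:stop])
        y = np.asarray(y_col[start:stop])
        # "Virtual column": r exists only for the current chunk.
        r = np.sqrt(x**2 + y**2)
        counts, _, _ = np.histogram2d(x, r, bins=shape, range=limits)
        grid += counts.astype(np.int64)
    return grid


# Small synthetic dataset standing in for a large file on disk.
rng = np.random.default_rng(0)
n_rows = 250_000
columns = rng.normal(size=(2, n_rows))
limits = [[-3, 3], [0, 4]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "columns.bin")
    columns.tofile(path)  # row-major write: all of x, then all of y
    grid = binned_counts(path, n_rows, limits)

print(grid.shape, grid.sum(), "rows fell inside the grid")
```

Only the small grid and one chunk of each column are held in memory at a time, so the same loop works when the file is much larger than RAM. The grid, not the raw points, is what gets drawn.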
What is vaex?
- A Python library/package for (data) scientists that:
  - Is pip and conda installable.
  - Makes custom plots and statistics.
  - Calculates statistics on an N-dimensional grid and visualizes them.
  - Creates interactive Jupyter/IPython notebooks.
  - Produces publication-quality plots with matplotlib.
  - Produces interactive plots with bqplot or bokeh.
  - Combines the notebook with the graphical interface in one kernel.
- A standalone program/GUI that:
  - Requires no programming knowledge.
  - Visualizes 1d histograms, 2d density plots, averaged quantities, and 3d volume renderings.
  - Allows interactive navigation and selection.
  - Overlays vector and tensor quantities in 2d and 3d.
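As a rough illustration of "statistics on a grid" combined with a boolean selection, the sketch below computes a per-cell mean with plain numpy. The helper `mean_on_grid` and its parameters are hypothetical names chosen for this example; it is not vaex's API.

```python
import numpy as np


def mean_on_grid(x, y, values, limits, shape=2, selection=None):
    """Mean of `values` in each (x, y) grid cell; NaN where a cell is empty."""
    if selection is not None:
        # A boolean expression evaluated to a mask; only matching rows are binned.
        x, y, values = x[selection], y[selection], values[selection]
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=limits)
    sums, _, _ = np.histogram2d(x, y, bins=shape, range=limits, weights=values)
    # Empty cells give 0/0; silence the warning and keep the NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


x = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([0.1, 0.3, 0.7, 0.9])
v = np.array([1.0, 3.0, 10.0, 20.0])
limits = [[0, 1], [0, 1]]

means = mean_on_grid(x, y, v, limits)
selected = mean_on_grid(x, y, v, limits, selection=v < 15)
print(means)
print(selected)
```

Because the mean is built from two grids (a sum and a count), it can be accumulated chunk by chunk in the same way as a plain histogram.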
Desktop user? Download the standalone OSX or Linux version. *
For programming? Install the Python package:
$ pip install --user --pre vaex
Or, for Anaconda users:
$ conda install -c conda-forge vaex
Latest from git:
$ pip install git+https://github.com/maartenbreddels/vaex/
Or see more detailed instructions.
* The standalone version cannot be combined with the IPython/Jupyter notebook.
Live demo: yellow taxi pickup locations in New York City.
The demo shows 140 million points, rendered in real time. Zoom or pan and the plot is updated on the fly.
To try it yourself, start Python and paste:
import vaex as vx
ds = vx.datasets.helmi_de_zeeuw.fetch()  # downloads the example dataset; may take a bit
ds.plot(ds.Lz, ds.E, f="log1p", show=True)
See the next example for a larger dataset.
From the IPython/Jupyter notebook, run:
import vaex as vx
ds = vx.datasets.nyctaxi_yellow_201x.fetch() # may take a bit
ds.plot_widget(ds.pickup_longitude, ds.pickup_latitude, f="log1p")
The plot is interactive: you can zoom in and out, and the plot will be updated.
You will need roughly 15 GB of free memory for proper performance; otherwise, use a subset of the data.
Single HDF5 file, copy of the full Gaia DR1 catalogue in random row order: direct download (351 GB).
Random 10% of the catalogue, useful on a laptop: direct link (35 GB).
All rows, fewer columns (ra, dec, l, b, g magnitude, etc.): direct download (43 GB).
Single HDF5 file, copy of the full TGAS catalogue: tgas-hdf5 (0.6 GB).
Interactive demo showing 100 million points (10%) of the Gaia DR1 data, rendered in real time. Zoom or pan and the plot is updated on the fly.
Vaex is funded by: