Advanced plotting examples¶

If you want to try out this notebook with a live Python kernel, use mybinder:

Vaex uses matplotlib for creating plots, which allows for great flexibility. To avoid repetative “boilerplate” code, Vaex tries to cover many use-cases where you want to plot one or more panels using a simple declarative style.

The following examples will make use of the example dataset, which is a the results of a numerical simulation of how a galaxy like our own Milky Way was formed (source). The data contains the 3D position, velocity, angular momentum, energy and iron content for each start particle in the simulation.

Let us start by loading the data:

[1]:

import vaex
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

[2]:

df = vaex.example()
df.head()

[2]:

#	id	x	y	z	vx	vy	vz	E	L	Lz	FeH
0	0	1.23187	-0.396929	-0.598058	301.155	174.059	27.4275	-149431	407.389	333.956	-1.00539
1	23	-0.163701	3.65422	-0.254906	-195	170.472	142.53	-124248	890.241	684.668	-1.70867
2	32	-2.12026	3.32605	1.70784	-48.6342	171.647	-2.07944	-138501	372.241	-202.176	-1.83361
3	8	4.71559	4.58525	2.25154	-232.421	-294.851	62.8586	-60037	1297.63	-324.688	-1.47869
4	16	7.21719	11.9947	-1.06456	-1.68917	181.329	-11.3336	-83206.8	1332.8	1328.95	-1.85705
5	16	-7.78437	5.98977	-0.682695	86.7009	-238.778	-2.31309	-86497.6	1353.25	1339.42	-1.91944
6	12	8.08373	-3.27348	5.54687	-57.4544	120.117	5.37438	-101867	1100.8	782.915	-1.93517
7	26	-3.55719	5.41363	0.0917156	-67.0511	-145.933	39.6374	-127682	921.008	882.101	-1.79423
8	25	3.9848	5.40691	2.57724	-38.7449	-152.407	-92.9073	-113632	493.316	-397.824	-1.18076
9	8	-20.8139	-3.29468	13.4866	99.4067	28.6749	-115.079	-55825.3	1088.46	-269.324	-1.28892

A single plot¶

The simplest case is a single heatmap created by two axes, specified by the first two arguments:

[3]:

df.viz.heatmap('x', 'y', title='Face on galaxy', limits='99%')

../_images/guides_advanced_plotting_5_0.png

Multiple plots of the same type¶

The first argument can be a list of axes pairs. This produces multiple plots:

[4]:

df.viz.heatmap([["x", "y"], ["x", "z"]], title="Face on and edge on", figsize=(10, 4), limits='99%');

../_images/guides_advanced_plotting_7_0.png

Multiple plots, same axes, different statistics¶

If the what argument is a list, it will by default create multiple subplots:

[5]:

df.viz.heatmap("x", "y", what=["count(*)", "mean(vx)", "correlation(vy,vz)"],
               title="Different statistics",
               figsize=(10, 5), limits='99%');

../_images/guides_advanced_plotting_9_0.png

Multiple plots, different axes, different statistics¶

One can specify multiple axes pairs as tje first argument, as well as a list of what arguments. The resulting figure with have a number of subplots where the different axes combinations will form the rows, and the different what statistics will form the columns:

[6]:

df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               title="Different statistics and plots",
               figsize=(14,12),
               limits='99%');

../_images/guides_advanced_plotting_11_0.png

One can also specify the layout of the figure via the visual argument, which can be used to swap the row and column ordering of the subplots:

[7]:

df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               visual=dict(row="what", column="subspace"),
               title="Different statistics and plots",
               figsize=(14,12),
               limits='99%');

../_images/guides_advanced_plotting_13_0.png

Slices in a 3rd dimension¶

If a 3rd axis (z) is given, you can “slice” through the data, displaying the z slices as rows. Note that here the rows are wrapped, which can be changed with the wrap_columns argument:

[8]:

df.viz.heatmap("Lz", "E", z="FeH:-3,-1,8",
               visual=dict(row="z"),
               figsize=(12, 8),
               f="log",
               wrap_columns=3,
               limits='99%');

../_images/guides_advanced_plotting_15_0.png

Many plots with wrapping¶

If one attempt to create a figure with many subplots, they will be nice wrapped. Where we create heatmaps of all combinations of columns in the example dataset, sorted by their mutual information:

[9]:

# Get all column pars
pairs = df.combinations(exclude=['id'])
# Calculate the mutual information for each pair, sorted by the largest value
mi, pairs_sorted = df.mutual_information(pairs, sort=True)

# Create the figure
df.viz.heatmap(pairs_sorted, f='log', colorbar=False, figsize=(14, 20), limits='99%', wrap_columns=5);

../_images/guides_advanced_plotting_17_0.png

Plotting selections¶

If the selection argument is used, than only the selection is plotted:

[10]:

df.viz.heatmap("x", "y", selection="sqrt(x**2+y**2) < 5", limits=[-10, 10]);

../_images/guides_advanced_plotting_19_0.png

If a list of selections is specified (False or None indicates no selection), than every selection by default forms a different “layer” of the figure produced:

[11]:

df.viz.heatmap("x", "y",
               selection=[None, "sqrt(x**2+y**2) < 5", "(sqrt(x**2+y**2) < 7) & (x < 0)"],
               limits=[-10, 10]);

../_images/guides_advanced_plotting_21_0.png

Overplotting a vector field on a heatmap¶

Astronomers argue that galaxies such as our own Milky Way were formed from many pre-galactic clumps that have merged and mixed together. One way to try and find the original pre-galactic fragments is to inspect the 2-dimensinoal distribution of their energy (𝐸) and angular momentum (𝐿𝑧). So let us make such a plot:

[12]:

df.viz.heatmap('Lz', 'E', f='log', figsize=(9, 6));

../_images/guides_advanced_plotting_23_0.png

Now, to show that the stars in each clump on the figure above are indeed moving coherently in space, we can overplot their velocity vectors on a positional heatmap.

First, let’s select the stars that belong to one the clusters:

[13]:

# specify ranges of angular momentum (Lz) and energy (E)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)

# Use the rectangle selection method
df.select_rectangle("Lz", "E", limits_Lz_E_clump, name="stream")

# Check how many stars we have selected
print(f'Selection contains {df.count(selection="stream")} "stars".')

Selection contains 9556 "stars".

We can also overplot the selected region, to convince ourselves that we have chosen a good region:

[14]:

df.viz.heatmap("Lz", "E", selection=[None, "stream"], f="log", figsize=(9, 6));

../_images/guides_advanced_plotting_27_0.png

Now let us plot the 𝑣𝑦 and 𝑣𝑧 velocity vectors on top of 𝑦−𝑧 plot. To start, lets compute a grid of mean 𝑣𝑦 and 𝑣𝑧 velocities. Notice that we are limiting the range of the 𝑣𝑦 and 𝑣𝑧 values to go between -20 and 20, and the grid resolution is 32x32 bins:

[15]:

limits = [-20, 20]
shape_vector = 32
mean_vy = df.mean("vy", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mean_vz = df.mean("vz", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')

Next, let us create a meshgrid to hold the centres of the bins:

[16]:

# create a 2d array with holds the center of the bins
centers = np.linspace(*limits, shape_vector, endpoint=False) + (limits[1] - limits[0])/2./shape_vector
z, y = np.meshgrid(centers, centers)

To keep the plot “clean”, we also do not want visualize the velocity of the bins with low number counts:

[17]:

# we don't want to show bins with low number of counts
counts = df.count(binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mask = counts.flatten( ) > 10

Finally we can plot a background density map of \(v_y\) vs \(v_z\), and then use plt.quiver to overplot the velocity vectors:

[18]:

df.viz.heatmap("y", "z", limits=limits, f="log1p", figsize=(10, 9), selection=[None, "stream"], shape=128)

# overplot the mean velocity vectors
plt.quiver(y.flatten()[mask],
           z.flatten()[mask],
           mean_vy.flatten()[mask],
           mean_vz.flatten()[mask],
           color="white",
           alpha=0.75);

../_images/guides_advanced_plotting_35_0.png

We indeed see that the stars we selected move together, and form a stream!

Plotting a healpix map¶

Healpix is made available via the healpy package. Vaex does not need special support for healpix, but some helper functions are introduced to make working with healpix easier.

Make sure you have healpy installed. If you do not, you can install it with one of these commands:

!pip install healpy  # if you prefer pip
!conda install -c conda-forge healpy if you are using a conda package manager

To understand this better, we will start from the beginning. If we want to make a density sky plot, we would like to pass to healpy a 1d numpy array where each value represents the density at a location of the sphere, where the location is determined by the array size (the healpix level) and the offset (the location).

This example uses a simulated Gaia dataset. The Gaia data includes the healpix index encoded in the source_id column. By diving source_id by 34359738368 you get a healpix index level 12, and diving it further will take you to lower levels.

Let us start by fetching the dataset (Note: the dataset is ~700MB on disk).

[19]:

import healpy as hp

[20]:

df = vaex.datasets.tgas(full=True)
df.head()

[20]:

#	astrometric_delta_q	astrometric_excess_noise	astrometric_excess_noise_sig	astrometric_n_bad_obs_ac	astrometric_n_bad_obs_al	astrometric_n_good_obs_ac	astrometric_n_good_obs_al	astrometric_n_obs_ac	astrometric_n_obs_al	astrometric_primary_flag	astrometric_priors_used	astrometric_relegation_factor	astrometric_weight_ac	astrometric_weight_al	b	dec	dec_error	dec_parallax_corr	dec_pmdec_corr	dec_pmra_corr	duplicated_source	ecl_lat	ecl_lon	hip	l	matched_observations	parallax	parallax_error	parallax_pmdec_corr	parallax_pmra_corr	phot_g_mean_flux	phot_g_mean_flux_error	phot_g_mean_mag	phot_g_n_obs	phot_variable_flag	pmdec	pmdec_error	pmra	pmra_error	pmra_pmdec_corr	ra	ra_dec_corr	ra_error	ra_parallax_corr	ra_pmdec_corr	ra_pmra_corr	random_index	ref_epoch	scan_direction_mean_k1	scan_direction_mean_k2	scan_direction_mean_k3	scan_direction_mean_k4	scan_direction_strength_k1	scan_direction_strength_k2	scan_direction_strength_k3	scan_direction_strength_k4	solution_id	source_id	tycho2_id
0	1.91906	0.717101	412.606	1	0	78	79	79	79	84	3	2.9361	1.26696e-05	1.81816	-48.7144	0.235392	0.218802	-0.407338	0.0606588	-0.0994513	70	-16.1211	42.6418	13989	176.74	9	6.35295	0.30791	-0.101957	-0.00157679	1.03123e+07	10577.4	7.99138	77	b'NOT_AVAILABLE'	-7.64199	0.0874018	43.7523	0.0705422	0.214677	45.0343	-0.414972	0.305989	0.179966	-0.0857597	0.159207	243619	2015	-113.76	21.3929	-41.6784	26.2018	0.382348	0.538266	0.392379	0.916306	1635378410781933568	7627862074752	b''
1	nan	0.253463	47.3163	2	0	55	57	57	57	84	5	2.65231	3.16002e-05	12.8616	-48.645	0.200068	1.19779	0.837626	-0.975644	0.972577	70	-16.193	42.7612	-2147483648	176.916	8	3.90033	0.323488	-0.853779	0.839739	949565	1140.17	10.581	62	b'NOT_AVAILABLE'	-55.1092	2.52293	10.0363	4.61141	-0.996399	45.165	-0.995923	2.58388	-0.860911	0.97348	-0.972417	487238	2015	-156.433	22.7661	-36.2397	22.8906	0.711003	0.96597	0.646115	0.86716	1635378410781933568	9277129363072	b'55-28-1'
2	nan	0.398901	221.185	4	1	57	60	61	61	84	5	3.9934	2.56339e-05	5.76753	-48.6678	0.248825	0.180326	-0.391891	-0.193256	0.0894205	70	-16.1234	42.6975	-2147483648	176.78	7	3.15531	0.273484	-0.118552	-0.0418587	817838	1827.38	10.7431	60	b'NOT_AVAILABLE'	-1.60287	1.03526	2.93228	1.90864	-0.914271	45.0862	-0.177443	0.213836	0.307722	-0.184817	0.0468668	1948952	2015	-117.008	19.7722	-43.1082	26.7157	0.482528	0.428758	0.524153	0.903062	1635378410781933568	13297218905216	b'55-1191-1'
3	nan	0.422492	179.982	1	0	51	52	52	52	84	5	4.21516	2.86726e-05	5.36086	-48.6824	0.248211	0.200958	-0.337217	-0.223501	0.131811	70	-16.1182	42.6778	-2147483648	176.76	7	2.29237	0.280972	-0.109202	-0.0494409	602053	905.877	11.0757	61	b'NOT_AVAILABLE'	-18.4149	1.12985	3.66198	2.06505	-0.926177	45.0665	-0.365707	0.276039	0.202878	-0.0589288	-0.0509089	102321	2015	-132.421	22.5693	-38.9545	25.8786	0.494655	0.638456	0.509074	0.898918	1635378410781933568	13469017597184	b'55-624-1'
4	nan	0.3175	119.748	2	3	85	84	87	87	84	5	3.23564	2.22788e-05	8.08078	-48.572	0.335044	0.17013	-0.438708	-0.279349	0.121792	70	-16.0555	42.7734	-2147483648	176.739	11	1.58208	0.261539	-0.329196	0.100312	1.38812e+06	2826.43	10.1687	96	b'NOT_AVAILABLE'	-2.37939	0.710632	0.340802	1.22048	-0.833604	45.136	-0.0490526	0.170697	0.471425	-0.156392	-0.152076	409284	2015	-106.86	4.4521	-47.8954	26.7555	0.520654	0.23931	0.653377	0.863385	1635378410781933568	15736760328576	b'55-849-1'
5	nan	0.303723	64.6868	2	1	68	69	70	70	84	5	3.10892	2.22511e-05	9.65279	-48.5511	0.359618	0.179848	-0.437142	-0.376402	0.257906	70	-16.0335	42.7861	-2147483648	176.718	9	8.66308	0.255867	-0.297309	0.0791063	1.66384e+06	1381.58	9.97199	76	b'NOT_AVAILABLE'	-72.7114	0.720852	-52.8493	1.26429	-0.852784	45.1414	-0.264588	0.205008	0.39493	0.102073	-0.36853	204642	2015	-127.824	16.3828	-44.2417	25.1631	0.522809	0.479366	0.621515	0.847412	1635378410781933568	16527034310784	b'55-182-1'
6	nan	0.340405	118.911	2	1	76	77	78	78	84	5	3.44745	2.19728e-05	7.91894	-48.5242	0.386343	0.17188	-0.341053	-0.34408	0.1516	70	-16.0114	42.8058	-2147483648	176.701	9	5.6982	0.263677	-0.367848	0.0846782	1.821e+06	2755.91	9.874	77	b'NOT_AVAILABLE'	-3.35036	0.707184	24.5272	1.17738	-0.800098	45.153	-0.0412512	0.189524	0.488929	-0.163855	-0.195289	540954	2015	-114.478	11.0431	-46.4507	26.2651	0.512088	0.322961	0.637399	0.856398	1635378410781933568	16733192740608	b'55-867-1'
7	nan	0.253709	88.6261	3	0	76	79	79	79	84	5	2.65453	2.57372e-05	13.709	-48.5569	0.380844	0.150943	-0.139315	-0.358996	0.238914	70	-16.0049	42.7641	-2147483648	176.665	10	2.09081	0.222206	-0.277202	0.093748	967144	601.802	10.561	87	b'NOT_AVAILABLE'	-11.6616	0.982994	-1.57293	1.73319	-0.904223	45.1128	-0.187136	0.206981	0.412381	0.0994892	-0.284353	1081909	2015	-88.3027	14.7861	-47.9744	27.0228	0.39079	0.333692	0.400387	0.90071	1635378410781933568	16870631694208	b'55-72-1'
8	nan	0.401473	226.044	3	1	69	71	72	72	84	5	4.01755	2.45771e-05	5.41389	-48.6511	0.351099	0.169345	-0.276625	-0.175754	0.101633	70	-16.0034	42.6531	-2147483648	176.589	9	6.20249	0.247253	-0.139338	0.0669677	1.66582e+06	1233.43	9.9707	79	b'NOT_AVAILABLE'	9.19541	1.02832	26.308	2.03485	-0.905496	45.0103	-0.321544	0.243576	0.263603	-0.143727	0.107397	589318	2015	-106.23	19.3449	-44.7095	25.5226	0.335982	0.520842	0.35827	0.90504	1635378410781933568	26834955821312	b'55-912-1'
9	nan	0.235866	49.3216	2	0	51	53	53	53	84	5	2.49518	2.42543e-05	15.7304	-48.5912	0.473472	0.163531	-0.0605532	-0.242013	0.14566	70	-15.8759	42.6549	-2147483648	176.419	8	1.67767	0.222067	-0.18584	0.0668122	1.96682e+06	1184.17	9.79036	62	b'NOT_AVAILABLE'	-24.5264	1.1319	9.10421	2.20939	-0.92529	44.9747	-0.407078	0.267911	0.236157	-0.0912424	0.0305957	1178636	2015	-99.9696	19.5819	-46.0718	24.0416	0.217998	0.655547	0.219464	0.892649	1635378410781933568	33260226885120	b'48-1139-1'

Let’s plot a healpix figure of level 2. We can start by counting the number of stars in each healpix region:

[21]:

level = 2
factor = 34359738368 * (4**(12-level))
nmax = hp.nside2npix(2**level)
counts = df.count(binby="source_id/" + str(factor), limits=[0, nmax], shape=nmax)
counts

[21]:

array([ 4021,  6171,  5318,  7114,  5755, 13420, 12711, 10193,  7782,
       14187, 12578, 22038, 17313, 13064, 17298, 11887,  3859,  3488,
        9036,  5533,  4007,  3899,  4884,  5664, 10741,  7678, 12092,
       10182,  6652,  6793, 10117,  9614,  3727,  5849,  4028,  5505,
        8462, 10059,  6581,  8282,  4757,  5116,  4578,  5452,  6023,
        8340,  6440,  8623,  7308,  6197, 21271, 23176, 12975, 17138,
       26783, 30575, 31931, 29697, 17986, 16987, 19802, 15632, 14273,
       10594,  4807,  4551,  4028,  4357,  4067,  4206,  3505,  4137,
        3311,  3582,  3586,  4218,  4529,  4360,  6767,  7579, 14462,
       24291, 10638, 11250, 29619,  9678, 23322, 18205,  7625,  9891,
        5423,  5808, 14438, 17251,  7833, 15226,  7123,  3708,  6135,
        4110,  3587,  3222,  3074,  3941,  3846,  3402,  3564,  3425,
        4125,  4026,  3689,  4084, 16617, 13577,  6911,  4837, 13553,
       10074,  9534, 20824,  4976,  6707,  5396,  8366, 13494, 19766,
       11012, 16130,  8521,  8245,  6871,  5977,  8789, 10016,  6517,
        8019,  6122,  5465,  5414,  4934,  5788,  6139,  4310,  4144,
       11437, 30731, 13741, 27285, 40227, 16320, 23039, 10812, 14686,
       27690, 15155, 32701, 18780,  5895, 23348,  6081, 17050, 28498,
       35232, 26223, 22341, 15867, 17688,  8580, 24895, 13027, 11223,
        7880,  8386,  6988,  5815,  4717,  9088,  8283, 12059,  9161,
        6952,  4914,  6652,  4666, 12014, 10703, 16518, 10270,  6724,
        4553,  9282,  4981])

Using the healpy package, we can plot this in a molleweide projection

[22]:

hp.mollview(counts, nest=True);

../_images/guides_advanced_plotting_44_0.png

To avoid tying the above code all over again, we can use the df.healpix_count method instead:

[23]:

counts = df.healpix_count(healpix_level=6)
hp.mollview(counts, nest=True)

../_images/guides_advanced_plotting_46_0.png

Instead of using healpy, we can use vaex’ df.viz.healpix_plot method:

[24]:

df.viz.healpix_heatmap(f="log1p", healpix_level=6, figsize=(10,8), healpix_output="ecliptic")

../_images/guides_advanced_plotting_48_0.png

Guides Arrow

vaex 4.17.0 documentation