{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "**Warning:** This notebook needs a running kernel to be fully interactive, please run it locally or on [mybinder](https://mybinder.org/v2/gh/vaexio/vaex/master?filepath=docs%2Fsource%2Ftutorial_jupyter.ipynb).\n", "\n", "
\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Ftutorial_jupyter.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Jupyter integration: interactivity\n", "\n", "Vaex can process about 1 billion rows per second, and in combination with the Jupyter notebook, this allows for interactive exporation of large datasets.\n", "\n", "## Introduction\n", "The `vaex-jupyter` package contains the building blocks to interactively define an N-dimensional grid, which is then used for visualizations.\n", "\n", "We start by defining the building blocks (`vaex.jupyter.model.Axis`, `vaex.jupyter.model.DataArray` and `vaex.jupyter.view.DataArray`) used to define and visualize our N-dimensional grid.\n", "\n", "Let us first import the relevant packages, and open the example DataFrame:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:43.143048Z", "start_time": "2020-05-17T14:54:40.868508Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
" ], "text/plain": [ "# id x y z vx vy vz E L Lz FeH\n", "0 0 1.2318683862686157 -0.39692866802215576 -0.598057746887207 301.1552734375 174.05947875976562 27.42754554748535 -149431.40625 407.38897705078125 333.9555358886719 -1.0053852796554565\n", "1 23 -0.16370061039924622 3.654221296310425 -0.25490644574165344 -195.00022888183594 170.47216796875 142.5302276611328 -124247.953125 890.2411499023438 684.6676025390625 -1.7086670398712158\n", "2 32 -2.120255947113037 3.326052665710449 1.7078403234481812 -48.63423156738281 171.6472930908203 -2.079437255859375 -138500.546875 372.2410888671875 -202.17617797851562 -1.8336141109466553\n", "3 8 4.7155890464782715 4.5852508544921875 2.2515437602996826 -232.42083740234375 -294.850830078125 62.85865020751953 -60037.0390625 1297.63037109375 -324.6875 -1.4786882400512695\n", "4 16 7.21718692779541 11.99471664428711 -1.064562201499939 -1.6891745328903198 181.329345703125 -11.333610534667969 -83206.84375 1332.7989501953125 1328.948974609375 -1.8570483922958374\n", "... ... ... ... ... ... ... ... ... ... ... 
...\n", "329,995 21 1.9938701391220093 0.789276123046875 0.22205990552902222 -216.92990112304688 16.124420166015625 -211.244384765625 -146457.4375 457.72247314453125 203.36758422851562 -1.7451677322387695\n", "329,996 25 3.7180912494659424 0.721337616443634 1.6415337324142456 -185.92160034179688 -117.25082397460938 -105.4986572265625 -126627.109375 335.0025634765625 -301.8370056152344 -0.9822322130203247\n", "329,997 14 0.3688507676124573 13.029608726501465 -3.633934736251831 -53.677146911621094 -145.15771484375 76.70909881591797 -84912.2578125 817.1375732421875 645.8507080078125 -1.7645612955093384\n", "329,998 18 -0.11259264498949051 1.4529125690460205 2.168952703475952 179.30865478515625 205.79710388183594 -68.75872802734375 -133498.46875 724.000244140625 -283.6910400390625 -1.8808952569961548\n", "329,999 4 20.796220779418945 -3.331387758255005 12.18841552734375 42.69000244140625 69.20479583740234 29.54275131225586 -65519.328125 1843.07470703125 1581.4151611328125 -1.1231083869934082" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import vaex\n", "import vaex.jupyter.model as vjm\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "df = vaex.example()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to build a 2 dimensinoal grid with the number counts in each bin. 
To do this, we first define two axis objects:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:43.154668Z", "start_time": "2020-05-17T14:54:43.145831Z" } }, "outputs": [ { "data": { "text/plain": [ "Axis(bin_centers=None, exception=None, expression=Lz, max=None, min=None, shape=100, shape_default=64, slice=None, status=Status.NO_LIMITS)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E_axis = vjm.Axis(df=df, expression=df.E, shape=140)\n", "Lz_axis = vjm.Axis(df=df, expression=df.Lz, shape=100)\n", "Lz_axis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we inspect the `Lz_axis` object we see that `min`, `max`, and `bin_centers` are all `None`. This is because Vaex calculates them in the background, so the kernel stays interactive and you can continue working in the notebook. We can ask Vaex to wait until all background calculations are done. Note that for billions of rows, this can take over a second. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:43.822027Z", "start_time": "2020-05-17T14:54:43.156785Z" } }, "outputs": [ { "data": { "text/plain": [ "Axis(bin_centers=[-2877.11808899 -2830.27174744 -2783.42540588 -2736.57906433\n", " -2689.73272278 -2642.88638123 -2596.04003967 -2549.19369812\n", " -2502.34735657 -2455.50101501 -2408.65467346 -2361.80833191\n", " -2314.96199036 -2268.1156488 -2221.26930725 -2174.4229657\n", " -2127.57662415 -2080.73028259 -2033.88394104 -1987.03759949\n", " -1940.19125793 -1893.34491638 -1846.49857483 -1799.65223328\n", " -1752.80589172 -1705.95955017 -1659.11320862 -1612.26686707\n", " -1565.42052551 -1518.57418396 -1471.72784241 -1424.88150085\n", " -1378.0351593 -1331.18881775 -1284.3424762 -1237.49613464\n", " -1190.64979309 -1143.80345154 -1096.95710999 -1050.11076843\n", " -1003.26442688 -956.41808533 -909.57174377 -862.72540222\n", " -815.87906067 -769.03271912 -722.18637756 -675.34003601\n", " -628.49369446 -581.64735291 -534.80101135 -487.9546698\n", " -441.10832825 -394.26198669 -347.41564514 -300.56930359\n", " -253.72296204 -206.87662048 -160.03027893 -113.18393738\n", " -66.33759583 -19.49125427 27.35508728 74.20142883\n", " 121.04777039 167.89411194 214.74045349 261.58679504\n", " 308.4331366 355.27947815 402.1258197 448.97216125\n", " 495.81850281 542.66484436 589.51118591 636.35752747\n", " 683.20386902 730.05021057 776.89655212 823.74289368\n", " 870.58923523 917.43557678 964.28191833 1011.12825989\n", " 1057.97460144 1104.82094299 1151.66728455 1198.5136261\n", " 1245.35996765 1292.2063092 1339.05265076 1385.89899231\n", " 1432.74533386 1479.59167542 1526.43801697 1573.28435852\n", " 1620.13070007 1666.97704163 1713.82338318 1760.66972473], exception=None, expression=Lz, max=1784.0928955078125, min=-2900.541259765625, shape=100, shape_default=64, slice=None, status=Status.READY)" ] }, "execution_count": 3, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "await vaex.jupyter.gather() # wait until Vaex is done with all background computation\n", "Lz_axis # now min and max are computed, and bin_centers is set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the Axis is a [traitlets HasTrait object](https://traitlets.readthedocs.io), similar to all ipywidget objects. This means that we can link all of its properties to an ipywidget and thus creating interactivity. We can also use [observe](https://traitlets.readthedocs.io/en/stable/using_traitlets.html#observe) to listen to any changes to our model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## An interactive xarray DataArray display\n", "\n", "Now that we have defined our two axes, we can create a [vaex.jupyter.model.DataArray](api.html#vaex.jupyter.model.DataArray) (model) together with a [vaex.jupyter.view.DataArray](api.html#vaex.jupyter.view.DataArray) (view).\n", "\n", "A convenient way to do this, is to use the [widget accessor](api.html#vaex.jupyter.DataFrameAccessorWidget) `data_array` method, which creates both, links them together and will return a view for us.\n", "\n", "The returned view is an ipywidget object, which becomes a visual element in the Jupyter notebook when displayed." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.027604Z", "start_time": "2020-05-17T14:54:43.824470Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1e2a3fe8d49746e980e022f9a835177c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data_array_widget = df.widget.data_array(axes=[Lz_axis, E_axis], selection=[None, 'default'])\n", "data_array_widget # being the last expression in the cell, Jupyter will 'display' the widget" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: If you see this notebook on readthedocs, you will see the selection coordinate already has `[None, 'default']`, because cells below have already been executed and have updated this widget. If you run this notebook yourself (say on mybinder), you will see after executing the above cell, the selection will have `[None]` as its only value.*\n", "\n", "From the specification of the axes and the selections, Vaex computes a 3d histogram, the first dimension being the selections. Interally this is simply a numpy array, but we wrap it in an [xarray](http://xarray.pydata.org/) [DataArray](http://xarray.pydata.org/en/stable/data-structures.html#dataarray) object. An xarray DataArray object can be seen as a labeled Nd array, i.e. a numpy array with extra metadata to make it fully self-describing.\n", "\n", "Notice that in the above code cell, we specified the `selection` argument with a list containing two elements in this case, `None` and `'default'`. The `None` selection simply shows all the data, while the `default` refers to any selection made without explicitly naming it. 
Even though the latter has not been defined at this point, we can still pre-emptively include it, in case we want to modify it later.\n", "\n", "The most important properties of the `data_array` are printed out below:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.663121Z", "start_time": "2020-05-17T14:54:44.029548Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "type: \n", "dims: ('selection', 'Lz', 'E')\n", "data: [[[0 0 0 ... 0 0 0]\n", " [0 0 0 ... 0 0 0]\n", " [0 0 0 ... 0 0 0]\n", " ...\n", " [0 0 0 ... 0 0 0]\n", " [0 0 0 ... 0 0 0]\n", " [0 0 0 ... 0 0 0]]]\n", "coords: Coordinates:\n", " * selection (selection) object None\n", " * Lz (Lz) float64 -2.877e+03 -2.83e+03 ... 1.714e+03 1.761e+03\n", " * E (E) float64 -2.414e+05 -2.394e+05 ... 3.296e+04 3.495e+04\n", "Lz's data: [-2877.11808899 -2830.27174744 -2783.42540588 -2736.57906433\n", " -2689.73272278 -2642.88638123 -2596.04003967 -2549.19369812\n", " -2502.34735657 -2455.50101501 -2408.65467346 -2361.80833191\n", " -2314.96199036 -2268.1156488 -2221.26930725 -2174.4229657\n", " -2127.57662415 -2080.73028259 -2033.88394104 -1987.03759949\n", " -1940.19125793 -1893.34491638 -1846.49857483 -1799.65223328\n", " -1752.80589172 -1705.95955017 -1659.11320862 -1612.26686707\n", " -1565.42052551 -1518.57418396 -1471.72784241 -1424.88150085\n", " -1378.0351593 -1331.18881775 -1284.3424762 -1237.49613464\n", " -1190.64979309 -1143.80345154 -1096.95710999 -1050.11076843\n", " -1003.26442688 -956.41808533 -909.57174377 -862.72540222\n", " -815.87906067 -769.03271912 -722.18637756 -675.34003601\n", " -628.49369446 -581.64735291 -534.80101135 -487.9546698\n", " -441.10832825 -394.26198669 -347.41564514 -300.56930359\n", " -253.72296204 -206.87662048 -160.03027893 -113.18393738\n", " -66.33759583 -19.49125427 27.35508728 74.20142883\n", " 121.04777039 167.89411194 214.74045349 261.58679504\n", " 308.4331366 355.27947815 
402.1258197 448.97216125\n", " 495.81850281 542.66484436 589.51118591 636.35752747\n", " 683.20386902 730.05021057 776.89655212 823.74289368\n", " 870.58923523 917.43557678 964.28191833 1011.12825989\n", " 1057.97460144 1104.82094299 1151.66728455 1198.5136261\n", " 1245.35996765 1292.2063092 1339.05265076 1385.89899231\n", " 1432.74533386 1479.59167542 1526.43801697 1573.28435852\n", " 1620.13070007 1666.97704163 1713.82338318 1760.66972473]\n", "Lz's attrs: {'min': -2900.541259765625, 'max': 1784.0928955078125}\n", "And displaying the xarray DataArray:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "Show/Hide data repr\n", "\n", "\n", "\n", "\n", "\n", "Show/Hide attributes\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
xarray.DataArray
  • selection: 1
  • Lz: 100
  • E: 140
  • 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    array([[[0, 0, 0, ..., 0, 0, 0],\n",
           "        [0, 0, 0, ..., 0, 0, 0],\n",
           "        [0, 0, 0, ..., 0, 0, 0],\n",
           "        ...,\n",
           "        [0, 0, 0, ..., 0, 0, 0],\n",
           "        [0, 0, 0, ..., 0, 0, 0],\n",
           "        [0, 0, 0, ..., 0, 0, 0]]])
    • selection
      (selection)
      object
      None
      array([None], dtype=object)
    • Lz
      (Lz)
      float64
      -2.877e+03 -2.83e+03 ... 1.761e+03
      min :
      -2900.541259765625
      max :
      1784.0928955078125
      array([-2877.118089, -2830.271747, -2783.425406, -2736.579064, -2689.732723,\n",
             "       -2642.886381, -2596.04004 , -2549.193698, -2502.347357, -2455.501015,\n",
             "       -2408.654673, -2361.808332, -2314.96199 , -2268.115649, -2221.269307,\n",
             "       -2174.422966, -2127.576624, -2080.730283, -2033.883941, -1987.037599,\n",
             "       -1940.191258, -1893.344916, -1846.498575, -1799.652233, -1752.805892,\n",
             "       -1705.95955 , -1659.113209, -1612.266867, -1565.420526, -1518.574184,\n",
             "       -1471.727842, -1424.881501, -1378.035159, -1331.188818, -1284.342476,\n",
             "       -1237.496135, -1190.649793, -1143.803452, -1096.95711 , -1050.110768,\n",
             "       -1003.264427,  -956.418085,  -909.571744,  -862.725402,  -815.879061,\n",
             "        -769.032719,  -722.186378,  -675.340036,  -628.493694,  -581.647353,\n",
             "        -534.801011,  -487.95467 ,  -441.108328,  -394.261987,  -347.415645,\n",
             "        -300.569304,  -253.722962,  -206.87662 ,  -160.030279,  -113.183937,\n",
             "         -66.337596,   -19.491254,    27.355087,    74.201429,   121.04777 ,\n",
             "         167.894112,   214.740453,   261.586795,   308.433137,   355.279478,\n",
             "         402.12582 ,   448.972161,   495.818503,   542.664844,   589.511186,\n",
             "         636.357527,   683.203869,   730.050211,   776.896552,   823.742894,\n",
             "         870.589235,   917.435577,   964.281918,  1011.12826 ,  1057.974601,\n",
             "        1104.820943,  1151.667285,  1198.513626,  1245.359968,  1292.206309,\n",
             "        1339.052651,  1385.898992,  1432.745334,  1479.591675,  1526.438017,\n",
             "        1573.284359,  1620.1307  ,  1666.977042,  1713.823383,  1760.669725])
    • E
      (E)
      float64
      -2.414e+05 -2.394e+05 ... 3.495e+04
      min :
      -242407.5
      max :
      35941.86328125
      array([-241413.395131, -239425.185393, -237436.975656, -235448.765918,\n",
             "       -233460.55618 , -231472.346443, -229484.136705, -227495.926967,\n",
             "       -225507.717229, -223519.507492, -221531.297754, -219543.088016,\n",
             "       -217554.878278, -215566.668541, -213578.458803, -211590.249065,\n",
             "       -209602.039328, -207613.82959 , -205625.619852, -203637.410114,\n",
             "       -201649.200377, -199660.990639, -197672.780901, -195684.571164,\n",
             "       -193696.361426, -191708.151688, -189719.94195 , -187731.732213,\n",
             "       -185743.522475, -183755.312737, -181767.102999, -179778.893262,\n",
             "       -177790.683524, -175802.473786, -173814.264049, -171826.054311,\n",
             "       -169837.844573, -167849.634835, -165861.425098, -163873.21536 ,\n",
             "       -161885.005622, -159896.795884, -157908.586147, -155920.376409,\n",
             "       -153932.166671, -151943.956934, -149955.747196, -147967.537458,\n",
             "       -145979.32772 , -143991.117983, -142002.908245, -140014.698507,\n",
             "       -138026.48877 , -136038.279032, -134050.069294, -132061.859556,\n",
             "       -130073.649819, -128085.440081, -126097.230343, -124109.020605,\n",
             "       -122120.810868, -120132.60113 , -118144.391392, -116156.181655,\n",
             "       -114167.971917, -112179.762179, -110191.552441, -108203.342704,\n",
             "       -106215.132966, -104226.923228, -102238.713491, -100250.503753,\n",
             "        -98262.294015,  -96274.084277,  -94285.87454 ,  -92297.664802,\n",
             "        -90309.455064,  -88321.245326,  -86333.035589,  -84344.825851,\n",
             "        -82356.616113,  -80368.406376,  -78380.196638,  -76391.9869  ,\n",
             "        -74403.777162,  -72415.567425,  -70427.357687,  -68439.147949,\n",
             "        -66450.938211,  -64462.728474,  -62474.518736,  -60486.308998,\n",
             "        -58498.099261,  -56509.889523,  -54521.679785,  -52533.470047,\n",
             "        -50545.26031 ,  -48557.050572,  -46568.840834,  -44580.631097,\n",
             "        -42592.421359,  -40604.211621,  -38616.001883,  -36627.792146,\n",
             "        -34639.582408,  -32651.37267 ,  -30663.162932,  -28674.953195,\n",
             "        -26686.743457,  -24698.533719,  -22710.323982,  -20722.114244,\n",
             "        -18733.904506,  -16745.694768,  -14757.485031,  -12769.275293,\n",
             "        -10781.065555,   -8792.855818,   -6804.64608 ,   -4816.436342,\n",
             "         -2828.226604,    -840.016867,    1148.192871,    3136.402609,\n",
             "          5124.612347,    7112.822084,    9101.031822,   11089.24156 ,\n",
             "         13077.451297,   15065.661035,   17053.870773,   19042.080511,\n",
             "         21030.290248,   23018.499986,   25006.709724,   26994.919461,\n",
             "         28983.129199,   30971.338937,   32959.548675,   34947.758412])
" ], "text/plain": [ "\n", "array([[[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ...,\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]]])\n", "Coordinates:\n", " * selection (selection) object None\n", " * Lz (Lz) float64 -2.877e+03 -2.83e+03 ... 1.714e+03 1.761e+03\n", " * E (E) float64 -2.414e+05 -2.394e+05 ... 3.296e+04 3.495e+04" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# NOTE: since the computations are done in the background, data_array_widget.model.grid is initially None.\n", "# We can ask vaex-jupyter to wait till all executions are done using:\n", "await vaex.jupyter.gather()\n", "# get a reference to the xarray DataArray object\n", "data_array = data_array_widget.model.grid\n", "print(f\"type:\", type(data_array))\n", "print(\"dims:\", data_array.dims)\n", "print(\"data:\", data_array.data)\n", "print(\"coords:\", data_array.coords)\n", "print(\"Lz's data:\", data_array.coords['Lz'].data)\n", "print(\"Lz's attrs:\", data_array.coords['Lz'].attrs)\n", "print(\"And displaying the xarray DataArray:\")\n", "display(data_array) # this is what the vaex.jupyter.view.DataArray uses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `data_array.coords['Lz'].data` is the same as `Lz_axis.bin_centers` and `data_array.coords['Lz'].attrs` contains the same `min/max` as the `Lz_axis`.\n", "\n", "Also, we see that displaying the xarray.DataArray object (`data_array_view.model.grid`) gives us the same output as the `data_array_view` above. There is a big difference however. 
If we change a selection:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.671194Z", "start_time": "2020-05-17T14:54:44.665221Z" } }, "outputs": [], "source": [ "df.select(df.x > 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and scroll back we see that the `data_array_widget` widget has updated itself, and now contains two selections! This is a very powerful feature that allows us to make interactive visualizations.\n", "\n", "\n", "## Interactive plots\n", "\n", "To make interactive plots we can pass a custom `display_function` to `df.widget.data_array`. This will override the default notebook behaviour which is a call to `display(data_array_widget)`. In the following example we create a function that displays a matplotlib figure:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.736577Z", "start_time": "2020-05-17T14:54:44.673205Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2a9e4e3e794b4688b3f21674eb6f0471", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# NOTE: da is short for 'data array'\n", "def plot2d(da):\n", "    plt.figure(figsize=(8, 8))\n", "    ar = da.data[1] # take the numpy data and pick index 1 along the selection axis\n", "    print(f'imshow of a numpy array of shape: {ar.shape}')\n", "    plt.imshow(np.log1p(ar.T), origin='lower')\n", "\n", "df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d, selection=[None, True])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above figure, we choose index 1 along the selection axis, which refers to the `'default'` selection. 
Choosing an index of 0 would correspond to the `None` selection, and all the data would be displayed. If we now change the selection, the figure will update itself:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.744045Z", "start_time": "2020-05-17T14:54:44.739054Z" } }, "outputs": [], "source": [ "df.select(df.id < 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As xarray's DataArray is fully self-describing, we can improve the plot by using the dimension names for labeling, and setting the extent of the figure's axes.\n", "\n", "Note that we don't need any information from the Axis objects created above, and in fact, we should not use them, since they may not be in sync with the xarray DataArray object. Later on, we will create a widget that will edit the Axis' expression.\n", "\n", "Our improved visualization with proper axes and labeling:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.822741Z", "start_time": "2020-05-17T14:54:44.746184Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "297b46f5ea794c16bed3c3e322153d53", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot2d_with_labels(da):\n", "    plt.figure(figsize=(8, 8))\n", "    grid = da.data # take the numpy data\n", "    dim_x = da.dims[0]\n", "    dim_y = da.dims[1]\n", "    plt.title(f'{dim_y} vs {dim_x} - shape: {grid.shape}')\n", "    extent = [\n", "        da.coords[dim_x].attrs['min'], da.coords[dim_x].attrs['max'],\n", "        da.coords[dim_y].attrs['min'], da.coords[dim_y].attrs['max']\n", "    ]\n", "    plt.imshow(np.log1p(grid.T), origin='lower', extent=extent, aspect='auto')\n", "    plt.xlabel(da.dims[0])\n", "    plt.ylabel(da.dims[1])\n", "\n", 
"da_plot_view_nicer = df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d_with_labels)\n", "da_plot_view_nicer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create more sophisticated plots, for example one where we show all of the selections. Note that we can pre-emptively expect a selection and define it later:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.912611Z", "start_time": "2020-05-17T14:54:44.825210Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "081d7d92dfc44bdca8e0d0ef65e09111", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot2d_with_selections(da):\n", " grid = da.data\n", " # Create 1 row and #selections of columns of matplotlib axes\n", " fig, axgrid = plt.subplots(1, grid.shape[0], sharey=True, squeeze=False)\n", " for selection_index, ax in enumerate(axgrid[0]):\n", " ax.imshow(np.log1p(grid[selection_index].T), origin='lower')\n", "\n", "df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d_with_selections,\n", " selection=[None, 'default', 'rest'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modifying a selection will update the figure." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:44.922954Z", "start_time": "2020-05-17T14:54:44.915835Z" } }, "outputs": [], "source": [ "df.select(df.id < 10) # select 10 objects\n", "df.select(df.id >= 10, name='rest') # and the rest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another advantage of using xarray is its excellent plotting capabilities. 
It handles a lot of the boring stuff like axis labeling, and also provides a nice interface for slicing the data even more.\n", "\n", "Let us introduce another axis, FeH (fun fact: FeH is a property of stars that tells us how much iron relative to hydrogen is contained in them, an indicator of their origin):" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:45.004826Z", "start_time": "2020-05-17T14:54:44.925080Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "92a520c6b371423ab80d4d28ba4ca8fb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "FeH_axis = vjm.Axis(df=df, expression='FeH', min=-3, max=1, shape=5)\n", "da_view = df.widget.data_array(axes=[E_axis, Lz_axis, FeH_axis], selection=[None, 'default'])\n", "da_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that we now have a 4-dimensional grid, which we would like to visualize.\n", "\n", "And [xarray's plot](http://xarray.pydata.org/en/stable/plotting.html#two-dimensions) makes our life much easier:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:45.092144Z", "start_time": "2020-05-17T14:54:45.006801Z" }, "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e4a3816e53ae4e0398868b921df7ee37", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot_with_xarray(da):\n", "    da_log = np.log1p(da) # Note that an xarray DataArray is like a numpy array\n", "    da_log.plot(x='Lz', y='E', col='FeH', 
row='selection', cmap='viridis')\n", "\n", "plot_view = df.widget.data_array([E_axis, Lz_axis, FeH_axis], display_function=plot_with_xarray,\n", " selection=[None, 'default', 'rest'])\n", "plot_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only have to tell xarray which axis it should map to which 'aesthetic', speaking in Grammar of Graphics terms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selection widgets\n", "Although we can change the selection in the notebook (e.g. `df.select(df.id > 20)`), if we create a dashboard ([using Voila](https://voila.readthedocs.io/en/stable/)) we cannot execute arbitrary code. Vaex-jupyter also comes with many widgets, and one of them is a `selection_expression` widget:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:45.107350Z", "start_time": "2020-05-17T14:54:45.094036Z" }, "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "83a4c8102f3349f28d097e62bb510f58", "version_major": 2, "version_minor": 0 }, "text/plain": [ "ExpressionSelectionTextArea(label='Filter by custom expression', placeholder='Enter a custom (boolean) express…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "selection_widget = df.widget.selection_expression()\n", "selection_widget" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `counter_selection` creates a widget which keeps track of the number of rows in a selection. In this case we ask it to be 'lazy', which means that it will not cause extra passes over the data, but will ride along if some user action triggers a calculation." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:50.325540Z", "start_time": "2020-05-17T14:54:45.109571Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "faba0e9b76f2474997a18f96ad36c1bf", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Counter(characters=[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '9', '9', …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "await vaex.jupyter.gather()\n", "w = df.widget.counter_selection('default', lazy=True)\n", "w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Axis control widgets\n", "\n", "\n", "Let us create new axis objects using the same expressions as before, but give them more general names (x_axis and y_axis), because we want to change the expressions interactively." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:50.413172Z", "start_time": "2020-05-17T14:54:50.328549Z" }, "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c834de5879c14b9f9890335111b5b169", "version_major": 2, "version_minor": 0 }, "text/plain": [ "DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x_axis = vjm.Axis(df=df, expression=df.Lz)\n", "y_axis = vjm.Axis(df=df, expression=df.E)\n", "\n", "da_xy_view = df.widget.data_array(axes=[x_axis, y_axis], display_function=plot2d_with_labels, shape=180)\n", "da_xy_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we can change the expressions of the axes programmatically:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.908299Z", "start_time": "2020-05-17T14:54:50.415406Z" } }, "outputs": [], "source": [ "# wait for the 
previous plot to finish\n", "await vaex.jupyter.gather()\n", "# Change both the x and y axes\n", "x_axis.expression = np.log(df.x**2)\n", "y_axis.expression = df.y\n", "# Note that both assignments together trigger a single computation in the background (minimal number of passes over the data)\n", "await vaex.jupyter.gather()\n", "# vaex computed the new min/max, and the xarray DataArray\n", "# x_axis.min, x_axis.max, da_xy_view.model.grid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But, if we want to create a dashboard with Voila, we need to have a widget that controls them:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.927309Z", "start_time": "2020-05-17T14:54:53.910550Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d219c8174c7a4ac0982ae8953b8283df", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Expression(label='X axis', placeholder='Enter a custom expression', prepend_icon='functions', success_messages…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x_widget = df.widget.expression(x_axis.expression, label='X axis')\n", "x_widget" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This widget will allow us to edit an expression, which will be validated by Vaex. How do we 'link' the value of the widget to the axis expression? 
Because both the Axis and the `x_widget` are [HasTraits objects](https://traitlets.readthedocs.io/en/stable/using_traitlets.html), we can link their traits together: " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.942106Z", "start_time": "2020-05-17T14:54:53.929934Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from ipywidgets import link\n", "link((x_widget, 'value'), (x_axis, 'expression'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since this operation is so common, we can also directly pass the Axis object, and Vaex will set up the linking for us:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.969053Z", "start_time": "2020-05-17T14:54:53.943972Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ba60f577a35f45569e3cc2817b4dd002", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Expression(label='Y axis', placeholder='Enter a custom expression', prepend_icon='functions', success_messages…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y_widget = df.widget.expression(y_axis, label='Y axis')\n", "# vaex now does this for us, much shorter\n", "# link((y_widget, 'value'), (y_axis, 'expression'))\n", "y_widget" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.978462Z", "start_time": "2020-05-17T14:54:53.973241Z" } }, "outputs": [], "source": [ "await vaex.jupyter.gather() # let's wait again until all calculations are finished" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A nice container\n", "\n", "If you are familiar with the [ipyvuetify](https://github.com/mariobuikhuizen/ipyvuetify/) components, you can combine them to create very pretty widgets. 
Vaex-jupyter comes with a nice container:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:53.999633Z", "start_time": "2020-05-17T14:54:53.981064Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e17aff36f888456a943fd1075827de50", "version_major": 2, "version_minor": 0 }, "text/plain": [ "ContainerCard(controls=[Expression(label='X axis', placeholder='Enter a custom expression', prepend_icon='func…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from vaex.jupyter.widgets import ContainerCard\n", "\n", "ContainerCard(title='My plot',\n", " subtitle=\"using vaex-jupyter\",\n", " main=da_xy_view,\n", " controls=[x_widget, y_widget], show_controls=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can directly assign a Vaex expression to the `x_axis.expression`, or to `x_widget.value` since they are linked." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:54.010645Z", "start_time": "2020-05-17T14:54:54.001846Z" } }, "outputs": [], "source": [ "y_axis.expression = df.vx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive plots\n", "\n", "So far we have been using interactive widgets to control the axes in the view. 
The figure itself, however, was not interactive: we could not pan or zoom, for example.\n", "\n", "Vaex has a few built-in visualizations, most notably a heatmap and a histogram using [bqplot](https://github.com/bqplot/bqplot/):" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2020-05-17T14:54:54.329424Z", "start_time": "2020-05-17T14:54:54.013805Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cec9ef04d0a34ce9ad551a690e5d0056", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Heatmap(children=[ToolsToolbar(interact_value=None, supports_normalize=False, template='