{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Caching\n", "\n", "Vaex can cache task results, such as aggregations or the internal hashmaps used for `groupby` operations, to make recurring calculations much faster, at the cost of calculating cache keys and storing/retrieving the cached values.\n", "\n", "Internally, Vaex calculates fingerprints (e.g. hashes of data, or file paths and mtimes) to create cache keys that are stable across processes, so that a restarted process will most likely produce the same cache keys.\n", "\n", "[See configuration of the cache.](../conf.md#cache)\n", "\n", "The cache can be turned on globally like this:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import vaex\n", "df = vaex.datasets.titanic()\n", "vaex.cache.memory();  # turn the cache on globally" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can verify that the cache is turned on via:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vaex.cache.is_on()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cache can be turned off globally again:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vaex.cache.off()\n", "vaex.cache.is_on()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cache can also be turned on with a context manager, after which it will be turned off again. Here we use a disk cache. A disk cache is shared among processes, and is ideal for processes that restart, or when using Vaex in a web service with multiple workers. 
Consider the following example:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29.8811345124283\n" ] } ], "source": [ "with vaex.cache.disk(clear=True):\n", "    print(df.age.mean())  # The very first time, the mean is computed" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Outside of the context manager, the cache is off again\n", "vaex.cache.is_on()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29.8811345124283\n" ] } ], "source": [ "with vaex.cache.disk():\n", "    print(df.age.mean())  # The second time, the result is read from the cache" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vaex.cache.is_on()" ] } ], "metadata": { "interpreter": { "hash": "2b337e1aa502f5cea9a92c761ad75d3ab5045107ee3446fdbe7f873d4f1936e7" }, "kernelspec": { "display_name": "Python 3.8.5 64-bit ('base': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }