Configuration#

All settings in Vaex are configured in a uniform way, based on Pydantic. From a running Python process, settings can be changed via the vaex.settings module:

import vaex
vaex.settings.main.thread_count = 10
vaex.settings.display.max_columns = 50

Via environmental variables:

$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 python myservice.py

Otherwise, values are read (using dotenv) from a .env file in the current working directory.

VAEX_NUM_THREADS=22
VAEX_CHUNK_SIZE_MIN=2048

Lastly, a global YAML file, $VAEX_PATH_HOME/.vaex/main.yaml, is loaded (with the lowest priority).

thread_count: 33
display:
  max_columns: 44
  max_rows: 20

If we now run vaex settings yaml, we see the effective settings as yaml output:

$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 vaex settings yaml
...
chunk:
  size: null
  size_min: 2048
  size_max: 1048576
display:
  max_columns: 50
  max_rows: 20
thread_count: 10
...
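
The precedence described above (environment variables first, then the .env file, then the global YAML file) can be illustrated with a small standalone sketch. This is not how Vaex implements it; the flat dotted key names and the three dictionaries are assumptions made for illustration, filled with the example values used on this page.

```python
from collections import ChainMap

# Example values from this page; the flat "section.name" keys are illustrative only.
env_vars = {"thread_count": 10, "display.max_columns": 50}
dotenv_file = {"thread_count": 22, "chunk.size_min": 2048}
yaml_file = {"thread_count": 33, "display.max_columns": 44, "display.max_rows": 20}

# ChainMap searches its mappings from left to right, so earlier sources win.
effective = ChainMap(env_vars, dotenv_file, yaml_file)

for key in sorted(effective):
    print(key, "=", effective[key])
```

With these inputs the environment variables supply thread_count and display.max_columns, the .env file supplies chunk.size_min, and the YAML file only contributes display.max_rows, which matches the effective settings shown above.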

Developers#

When updating vaex/settings.py, run vaex settings watch to regenerate the documentation below automatically each time the file is saved.

Schema#

A JSON schema can be generated using

$ vaex settings schema > vaex-settings.schema.json

Settings#

General settings for vaex

aliases#

Aliases to be used for vaex.open

Environmental variable: VAEX_ALIASES

Python settings vaex.settings.main.aliases

async#

How to run async code in the local executor

Environmental variable: VAEX_ASYNC

Example use:

$ VAEX_ASYNC=nest python myscript.py

Python settings vaex.settings.main.async_

Example use: vaex.settings.main.async_ = 'nest'

home#

Home directory for vaex, which defaults to $HOME/.vaex. If neither $VAEX_HOME nor $HOME is defined, the current working directory is used. (Note that this setting cannot be configured from the vaex home directory itself.)

Environmental variable: VAEX_HOME

Example use:

$ VAEX_HOME=/home/docs/.vaex python myscript.py

Python settings vaex.settings.main.home

Example use: vaex.settings.main.home = '/home/docs/.vaex'
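
The fallback order for the home directory can be sketched as follows. The helper resolve_vaex_home is hypothetical and only mirrors the description above; it is not part of the Vaex API.

```python
import os

def resolve_vaex_home(environ, cwd):
    """Hypothetical helper: pick the home directory following the documented fallbacks."""
    if environ.get("VAEX_HOME"):
        # An explicit VAEX_HOME always wins.
        return environ["VAEX_HOME"]
    if environ.get("HOME"):
        # Otherwise fall back to the .vaex directory under $HOME.
        return os.path.join(environ["HOME"], ".vaex")
    # Neither variable is defined: use the current working directory.
    return cwd

print(resolve_vaex_home(dict(os.environ), os.getcwd()))
```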

mmap#

Turning this off is experimental. When set to False, Vaex avoids using memory mapping.

Environmental variable: VAEX_MMAP

Example use:

$ VAEX_MMAP=True python myscript.py

Python settings vaex.settings.main.mmap

Example use: vaex.settings.main.mmap = True

process_count#

Number of processes to use for multiprocessing (e.g. apply), defaults to the thread_count setting

Environmental variable: VAEX_PROCESS_COUNT

Example use:

$ VAEX_PROCESS_COUNT=2 python myscript.py

Python settings vaex.settings.main.process_count

Example use: vaex.settings.main.process_count = 2

thread_count#

Number of threads to use for computations, defaults to multiprocessing.cpu_count()

Environmental variable: VAEX_NUM_THREADS

Example use:

$ VAEX_NUM_THREADS=2 python myscript.py

Python settings vaex.settings.main.thread_count

Example use: vaex.settings.main.thread_count = 2
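
How the defaults for thread_count and process_count relate can be sketched like this. The function default_counts is a hypothetical illustration of the rules stated above, not Vaex code.

```python
import multiprocessing

def default_counts(thread_count=None, process_count=None):
    """Hypothetical helper: fill in unset values using the documented defaults."""
    # thread_count defaults to the number of CPUs reported by the OS.
    threads = thread_count if thread_count is not None else multiprocessing.cpu_count()
    # process_count defaults to whatever thread_count resolved to.
    processes = process_count if process_count is not None else threads
    return threads, processes

print(default_counts())                 # both derived from the CPU count
print(default_counts(thread_count=2))   # process_count follows thread_count
```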

thread_count_io#

Number of threads to use for IO, defaults to thread_count + 1

Environmental variable: VAEX_NUM_THREADS_IO

Example use:

$ VAEX_NUM_THREADS_IO=2 python myscript.py

Python settings vaex.settings.main.thread_count_io

Example use: vaex.settings.main.thread_count_io = 2

path_lock#

Directory to store lock files for vaex, which defaults to ${VAEX_HOME}/lock/. Due to possible race conditions, lock files cannot be removed while processes using Vaex are running (on Unix systems).

Environmental variable: VAEX_LOCK

Example use:

$ VAEX_LOCK=/home/docs/.vaex/lock python myscript.py

Python settings vaex.settings.main.path_lock

Example use: vaex.settings.main.path_lock = '/home/docs/.vaex/lock'

Cache#

Settings for caching of computation or task results, see the API for more details.

type#

Type of cache, e.g. ‘memory_infinite’, ‘memory’, ‘disk’, ‘redis’, or a multilevel cache, e.g. ‘memory,disk’

Environmental variable: VAEX_CACHE

Python settings vaex.settings.cache.type

disk_size_limit#

Maximum size for cache on disk, e.g. 10GB, 500MB

Environmental variable: VAEX_CACHE_DISK_SIZE_LIMIT

Example use:

$ VAEX_CACHE_DISK_SIZE_LIMIT=10GB python myscript.py

Python settings vaex.settings.cache.disk_size_limit

Example use: vaex.settings.cache.disk_size_limit = '10GB'

memory_size_limit#

Maximum size for cache in memory, e.g. 1GB, 500MB

Environmental variable: VAEX_CACHE_MEMORY_SIZE_LIMIT

Example use:

$ VAEX_CACHE_MEMORY_SIZE_LIMIT=1GB python myscript.py

Python settings vaex.settings.cache.memory_size_limit

Example use: vaex.settings.cache.memory_size_limit = '1GB'
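
Size limits such as 10GB, 1GB or 500MB are given as strings. The sketch below shows one way such a string could be turned into a byte count. The parse_size helper is hypothetical, and the use of decimal multiples (1 GB = 10^9 bytes) is an assumption; Vaex may interpret the units differently.

```python
import re

# Assumption: decimal (SI) multiples; binary multiples would use powers of 1024.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_size(text):
    """Hypothetical helper: convert a string such as '10GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.group(1), match.group(2).upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(number) * _UNITS[unit])

print(parse_size("10GB"))
print(parse_size("500MB"))
```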

path#

Storage location for cache results. Defaults to ${VAEX_HOME}/cache

Environmental variable: VAEX_CACHE_PATH

Example use:

$ VAEX_CACHE_PATH=/home/docs/.vaex/cache python myscript.py

Python settings vaex.settings.cache.path

Example use: vaex.settings.cache.path = '/home/docs/.vaex/cache'

Chunk#

Configure how a dataset is broken down in smaller chunks. The executor dynamically adjusts the chunk size based on size_min and size_max and the number of threads when size is not set.
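
As a rough illustration of that behaviour, the sketch below clamps a per-thread share of the rows between size_min and size_max unless a fixed size is given. The formula and the pick_chunk_size helper are assumptions made for illustration; the actual executor logic may differ. The default bounds are the example values used in this section.

```python
import math

def pick_chunk_size(row_count, thread_count, size_min=1024, size_max=1048576, size=None):
    """Hypothetical helper: choose a chunk size for a dataset of row_count rows."""
    if size is not None:
        # A fixed size disables the dynamic adjustment entirely.
        return size
    # Assumed heuristic: give each thread an equal share of the rows...
    per_thread = math.ceil(row_count / thread_count)
    # ...but never go below size_min or above size_max.
    return max(size_min, min(size_max, per_thread))

print(pick_chunk_size(10_000_000, thread_count=4))  # capped by size_max
print(pick_chunk_size(2_000, thread_count=4))       # raised to size_min
```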

size#

When set, fixes the chunk size, i.e. it is not dynamically adjusted between size_min and size_max

Environmental variable: VAEX_CHUNK_SIZE

Python settings vaex.settings.main.chunk.size

size_min#

Minimum chunk size

Environmental variable: VAEX_CHUNK_SIZE_MIN

Example use:

$ VAEX_CHUNK_SIZE_MIN=1024 python myscript.py

Python settings vaex.settings.main.chunk.size_min

Example use: vaex.settings.main.chunk.size_min = 1024

size_max#

Maximum chunk size

Environmental variable: VAEX_CHUNK_SIZE_MAX

Example use:

$ VAEX_CHUNK_SIZE_MAX=1048576 python myscript.py

Python settings vaex.settings.main.chunk.size_max

Example use: vaex.settings.main.chunk.size_max = 1048576

Data#

Data configuration

path#

Storage location for data files, like vaex.example(). Defaults to ${VAEX_HOME}/data/

Environmental variable: VAEX_DATA_PATH

Example use:

$ VAEX_DATA_PATH=/home/docs/.vaex/data python myscript.py

Python settings vaex.settings.data.path

Example use: vaex.settings.data.path = '/home/docs/.vaex/data'

Display#

How a dataframe displays

max_columns#

How many columns to display when printing out a dataframe

Environmental variable: VAEX_DISPLAY_MAX_COLUMNS

Example use:

$ VAEX_DISPLAY_MAX_COLUMNS=200 python myscript.py

Python settings vaex.settings.display.max_columns

Example use: vaex.settings.display.max_columns = 200

max_rows#

Maximum number of rows to print in full; for larger dataframes only the first and last rows are shown

Environmental variable: VAEX_DISPLAY_MAX_ROWS

Example use:

$ VAEX_DISPLAY_MAX_ROWS=10 python myscript.py

Python settings vaex.settings.display.max_rows

Example use: vaex.settings.display.max_rows = 10

FileSystem#

Filesystem configuration

path#

Storage location for caching files from remote file systems. Defaults to ${VAEX_HOME}/file-cache/

Environmental variable: VAEX_FS_PATH

Example use:

$ VAEX_FS_PATH=/home/docs/.vaex/file-cache python myscript.py

Python settings vaex.settings.fs.path

Example use: vaex.settings.fs.path = '/home/docs/.vaex/file-cache'

MemoryTracker#

Memory tracking/protection when using vaex in a service

type#

Which memory tracker to use when executing tasks

Environmental variable: VAEX_MEMORY_TRACKER

Example use:

$ VAEX_MEMORY_TRACKER=default python myscript.py

Python settings vaex.settings.main.memory_tracker.type

Example use: vaex.settings.main.memory_tracker.type = 'default'

max#

Maximum amount of memory the executor can use (only used for type=’limit’)

Environmental variable: VAEX_MEMORY_TRACKER_MAX

Python settings vaex.settings.main.memory_tracker.max

TaskTracker#

Task tracking/protection when using vaex in a service

type#

Comma-separated string of trackers to run while executing tasks

Environmental variable: VAEX_TASK_TRACKER

Example use:

$ VAEX_TASK_TRACKER= python myscript.py

Python settings vaex.settings.main.task_tracker.type

Logging#

Configure logging for Vaex. By default Vaex sets up logging, which is useful when running a script. When Vaex is used in applications or services that already configure logging, set the environment variable VAEX_LOGGING_SETUP to false.

See the API docs for more details.

Note that setting vaex.settings.main.logging.info etc. at runtime has no direct effect, since logging is already configured. When needed, call vaex.logging.reset() and vaex.logging.setup() to reconfigure logging.

setup#

Setup logging for Vaex at import time.

Environmental variable: VAEX_LOGGING_SETUP

Example use:

$ VAEX_LOGGING_SETUP=True python myscript.py

Python settings vaex.settings.main.logging.setup

Example use: vaex.settings.main.logging.setup = True

rich#

Use rich logger (colored fancy output).

Environmental variable: VAEX_LOGGING_RICH

Example use:

$ VAEX_LOGGING_RICH=True python myscript.py

Python settings vaex.settings.main.logging.rich

Example use: vaex.settings.main.logging.rich = True

debug#

Comma-separated list of loggers to set to the debug level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_DEBUG

Example use:

$ VAEX_LOGGING_DEBUG= python myscript.py

Python settings vaex.settings.main.logging.debug

info#

Comma-separated list of loggers to set to the info level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_INFO

Example use:

$ VAEX_LOGGING_INFO= python myscript.py

Python settings vaex.settings.main.logging.info

warning#

Comma-separated list of loggers to set to the warning level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_WARNING

Example use:

$ VAEX_LOGGING_WARNING=vaex python myscript.py

Python settings vaex.settings.main.logging.warning

Example use: vaex.settings.main.logging.warning = 'vaex'

error#

Comma-separated list of loggers to set to the error level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_ERROR

Example use:

$ VAEX_LOGGING_ERROR= python myscript.py

Python settings vaex.settings.main.logging.error
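
The debug, info, warning and error settings share the same value format: a comma-separated list of logger names, or ‘1’ for the root ‘vaex’ logger. The sketch below shows how such a value could be applied with the standard logging module. The apply_logger_levels helper is hypothetical and is not how Vaex itself processes these settings.

```python
import logging

def apply_logger_levels(spec, level, root_name="vaex"):
    """Hypothetical helper: set `level` on every logger named in `spec`.

    `spec` is a comma-separated list of logger names, or '1' for the root logger.
    Returns the list of logger names that were configured.
    """
    if not spec or not spec.strip():
        # An empty value configures nothing.
        return []
    if spec.strip() == "1":
        names = [root_name]
    else:
        names = [name.strip() for name in spec.split(",") if name.strip()]
    for name in names:
        logging.getLogger(name).setLevel(level)
    return names

print(apply_logger_levels("vaex.settings,vaex.cache", logging.DEBUG))
print(apply_logger_levels("1", logging.INFO))
```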

Progress#

Progress bar configuration

type#

Default progress bar to show: ‘simple’, ‘rich’ or ‘widget’

Environmental variable: VAEX_PROGRESS_TYPE

Example use:

$ VAEX_PROGRESS_TYPE=simple python myscript.py

Python settings vaex.settings.main.progress.type

Example use: vaex.settings.main.progress.type = 'simple'

force#

Force showing a progress bar of this type, even when no progress bar was requested from user code

Environmental variable: VAEX_PROGRESS

Python settings vaex.settings.main.progress.force

Settings#

Configuration options for the FastAPI server

add_example#

Add example dataset

Environmental variable: VAEX_SERVER_ADD_EXAMPLE

Example use:

$ VAEX_SERVER_ADD_EXAMPLE=True python myscript.py

Python settings vaex.settings.server.add_example

Example use: vaex.settings.server.add_example = True

graphql#

Add graphql endpoint

Environmental variable: VAEX_SERVER_GRAPHQL

Example use:

$ VAEX_SERVER_GRAPHQL=False python myscript.py

Python settings vaex.settings.server.graphql

Example use: vaex.settings.server.graphql = False

files#

Mapping of name to path

Environmental variable: VAEX_SERVER_FILES

Python settings vaex.settings.server.files