Configuration

All settings in Vaex can be configured in a uniform way, based on Pydantic. From a Python runtime, settings can be changed via the vaex.settings module.

import vaex
vaex.settings.main.thread_count = 10
vaex.settings.display.max_columns = 50

Via environmental variables:

$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 python myservice.py

Otherwise, values are read with dotenv from a .env file in the current working directory.

VAEX_NUM_THREADS=22
VAEX_CHUNK_SIZE_MIN=2048

Lastly, a global yaml file, $VAEX_HOME/main.yaml (by default $HOME/.vaex/main.yaml), is loaded with the lowest priority.

thread_count: 33
display:
  max_columns: 44
  max_rows: 20

If we now run vaex settings yaml, we see the effective settings as yaml output:

$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 vaex settings yaml
...
chunk:
  size: null
  size_min: 2048
  size_max: 1048576
display:
  max_columns: 50
  max_rows: 20
thread_count: 10
...
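
The examples above imply a lookup order: environment variables first, then the .env file, then the global yaml file. The sketch below is not how Vaex implements this (Vaex delegates loading to Pydantic); it is a minimal pure-Python illustration of that order. The helper name resolve and the flat dotted keys are assumptions made for the example.

```python
# Illustrative only: a hypothetical resolver that mimics the lookup order
# described above (environment > .env file > global yaml file).

def resolve(name, env, dotenv, yaml_values, default=None):
    """Return the first value found for `name`, highest priority first."""
    for source in (env, dotenv, yaml_values):
        if name in source:
            return source[name]
    return default

# Values taken from the examples in this section, keyed by setting name.
# The mapping from VAEX_* variable names to setting names is done by hand here.
env = {"thread_count": 10, "display.max_columns": 50}
dotenv = {"thread_count": 22, "chunk.size_min": 2048}
yaml_values = {"thread_count": 33, "display.max_columns": 44, "display.max_rows": 20}

names = ("thread_count", "chunk.size_min", "display.max_columns", "display.max_rows")
effective = {name: resolve(name, env, dotenv, yaml_values) for name in names}
print(effective)
```

With these inputs the environment wins for thread_count and display.max_columns, the .env file supplies chunk.size_min, and the yaml file supplies display.max_rows, which matches the yaml output shown above.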

Developers

When updating vaex/settings.py, run vaex settings watch to regenerate the documentation below automatically each time the file is saved.

Schema

A JSON schema can be generated using

$ vaex settings schema > vaex-settings.schema.json

Settings

General settings for vaex

aliases

Aliases to be used for vaex.open

Environmental variable: VAEX_ALIASES

Python settings vaex.settings.main.aliases

async

How to run async code in the local executor

Environmental variable: VAEX_ASYNC

Example use:

$ VAEX_ASYNC=nest python myscript.py

Python settings vaex.settings.main.async_

Example use: vaex.settings.main.async_ = 'nest'

home

Home directory for vaex, which defaults to $HOME/.vaex. If neither $VAEX_HOME nor $HOME is defined, the current working directory is used. (Note that this setting cannot be configured from the vaex home directory itself.)

Environmental variable: VAEX_HOME

Example use:

$ VAEX_HOME=/home/docs/.vaex python myscript.py

Python settings vaex.settings.main.home

Example use: vaex.settings.main.home = '/home/docs/.vaex'
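
The fallback order for the home directory can be sketched in a few lines. This is a hedged illustration of the rule as described above, not Vaex's own code; the function name resolve_home is hypothetical, and the environment is passed in as a plain dict so the example does not depend on the real process environment.

```python
import os

def resolve_home(environ, cwd):
    """Hypothetical helper mirroring the documented fallback order."""
    if "VAEX_HOME" in environ:
        return environ["VAEX_HOME"]                     # explicit override
    if "HOME" in environ:
        return os.path.join(environ["HOME"], ".vaex")   # default location
    return cwd                                          # neither variable is defined

print(resolve_home({"VAEX_HOME": "/data/vaex", "HOME": "/home/docs"}, "/tmp/work"))
print(resolve_home({"HOME": "/home/docs"}, "/tmp/work"))
print(resolve_home({}, "/tmp/work"))
```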

mmap

Turning this off is experimental; when set to False, Vaex avoids using memory mapping

Environmental variable: VAEX_MMAP

Example use:

$ VAEX_MMAP=True python myscript.py

Python settings vaex.settings.main.mmap

Example use: vaex.settings.main.mmap = True

process_count

Number of processes to use for multiprocessing (e.g. apply), defaults to thread_count setting

Environmental variable: VAEX_PROCESS_COUNT

Example use:

$ VAEX_PROCESS_COUNT=2 python myscript.py

Python settings vaex.settings.main.process_count

Example use: vaex.settings.main.process_count = 2

thread_count

Number of threads to use for computations, defaults to multiprocessing.cpu_count()

Environmental variable: VAEX_NUM_THREADS

Example use:

$ VAEX_NUM_THREADS=2 python myscript.py

Python settings vaex.settings.main.thread_count

Example use: vaex.settings.main.thread_count = 2

thread_count_io

Number of threads to use for IO, defaults to thread_count + 1

Environmental variable: VAEX_NUM_THREADS_IO

Example use:

$ VAEX_NUM_THREADS_IO=2 python myscript.py

Python settings vaex.settings.main.thread_count_io

Example use: vaex.settings.main.thread_count_io = 2
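
The three count settings depend on each other through their defaults. The sketch below shows one way those defaults could be filled in. It is an illustration rather than Vaex's implementation: the helper effective_counts is hypothetical, and it reads the IO default as thread_count + 1, which is an assumption.

```python
import multiprocessing

def effective_counts(thread_count=None, thread_count_io=None, process_count=None):
    """Hypothetical helper: fill unset values with the documented defaults."""
    if thread_count is None:
        thread_count = multiprocessing.cpu_count()
    if thread_count_io is None:
        thread_count_io = thread_count + 1  # assumption: one more than thread_count
    if process_count is None:
        process_count = thread_count        # follows the thread_count setting
    return {
        "thread_count": thread_count,
        "thread_count_io": thread_count_io,
        "process_count": process_count,
    }

print(effective_counts(thread_count=4))
print(effective_counts(thread_count=4, process_count=2))
```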

path_lock

Directory to store lock files for vaex, which defaults to ${VAEX_HOME}/lock/. Due to possible race conditions, lock files cannot be removed while processes using Vaex are running (on Unix systems).

Environmental variable: VAEX_LOCK

Example use:

$ VAEX_LOCK=/home/docs/.vaex/lock python myscript.py

Python settings vaex.settings.main.path_lock

Example use: vaex.settings.main.path_lock = '/home/docs/.vaex/lock'

Cache

Settings for caching of computation or task results, see the API for more details.

type

Type of cache, e.g. ‘memory_infinite’, ‘memory’, ‘disk’, ‘redis’, or a multilevel cache, e.g. ‘memory,disk’

Environmental variable: VAEX_CACHE

Python settings vaex.settings.cache.type

disk_size_limit

Maximum size for cache on disk, e.g. 10GB, 500MB

Environmental variable: VAEX_CACHE_DISK_SIZE_LIMIT

Example use:

$ VAEX_CACHE_DISK_SIZE_LIMIT=10GB python myscript.py

Python settings vaex.settings.cache.disk_size_limit

Example use: vaex.settings.cache.disk_size_limit = '10GB'

memory_size_limit

Maximum size for cache in memory, e.g. 1GB, 500MB

Environmental variable: VAEX_CACHE_MEMORY_SIZE_LIMIT

Example use:

$ VAEX_CACHE_MEMORY_SIZE_LIMIT=1GB python myscript.py

Python settings vaex.settings.cache.memory_size_limit

Example use: vaex.settings.cache.memory_size_limit = '1GB'

path

Storage location for cache results. Defaults to ${VAEX_HOME}/cache

Environmental variable: VAEX_CACHE_PATH

Example use:

$ VAEX_CACHE_PATH=/home/docs/.vaex/cache python myscript.py

Python settings vaex.settings.cache.path

Example use: vaex.settings.cache.path = '/home/docs/.vaex/cache'

Chunk

Configure how a dataset is broken down in smaller chunks. The executor dynamically adjusts the chunk size based on size_min and size_max and the number of threads when size is not set.

size

When set, fixes the chunk size, i.e. it is not dynamically adjusted between size_min and size_max

Environmental variable: VAEX_CHUNK_SIZE

Python settings vaex.settings.main.chunk.size

size_min

Minimum chunk size

Environmental variable: VAEX_CHUNK_SIZE_MIN

Example use:

$ VAEX_CHUNK_SIZE_MIN=1024 python myscript.py

Python settings vaex.settings.main.chunk.size_min

Example use: vaex.settings.main.chunk.size_min = 1024

size_max

Maximum chunk size

Environmental variable: VAEX_CHUNK_SIZE_MAX

Example use:

$ VAEX_CHUNK_SIZE_MAX=1048576 python myscript.py

Python settings vaex.settings.main.chunk.size_max

Example use: vaex.settings.main.chunk.size_max = 1048576
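
How size, size_min and size_max interact can be shown with a small sketch. The heuristic below (roughly one chunk per thread, clamped to the allowed range) is an assumption chosen for illustration and is not Vaex's actual algorithm; pick_chunk_size is a hypothetical name, and the default bounds are simply the values used in the examples above.

```python
import math

def pick_chunk_size(row_count, thread_count, size=None,
                    size_min=1024, size_max=1048576):
    """Hypothetical heuristic, NOT the executor's real logic."""
    if size is not None:
        return size  # a fixed size disables dynamic adjustment
    # Aim for roughly one chunk per thread, clamped to [size_min, size_max].
    per_thread = math.ceil(row_count / thread_count)
    return max(size_min, min(size_max, per_thread))

print(pick_chunk_size(1_000, 4))            # small dataset: clamped up to size_min
print(pick_chunk_size(100_000, 4))          # falls inside the allowed range
print(pick_chunk_size(10_000_000, 4))       # large dataset: clamped down to size_max
print(pick_chunk_size(100_000, 4, size=5000))
```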

Data

Data configuration

path

Storage location for data files, like vaex.example(). Defaults to ${VAEX_HOME}/data/

Environmental variable: VAEX_DATA_PATH

Example use:

$ VAEX_DATA_PATH=/home/docs/.vaex/data python myscript.py

Python settings vaex.settings.data.path

Example use: vaex.settings.data.path = '/home/docs/.vaex/data'

Display

How a dataframe displays

max_columns

How many columns to display when printing out a dataframe

Environmental variable: VAEX_DISPLAY_MAX_COLUMNS

Example use:

$ VAEX_DISPLAY_MAX_COLUMNS=200 python myscript.py

Python settings vaex.settings.display.max_columns

Example use: vaex.settings.display.max_columns = 200

max_rows

How many rows to print out before showing the first and last rows

Environmental variable: VAEX_DISPLAY_MAX_ROWS

Example use:

$ VAEX_DISPLAY_MAX_ROWS=10 python myscript.py

Python settings vaex.settings.display.max_rows

Example use: vaex.settings.display.max_rows = 10

FileSystem

Filesystem configuration

path

Storage location for caching files from remote file systems. Defaults to ${VAEX_HOME}/file-cache/

Environmental variable: VAEX_FS_PATH

Example use:

$ VAEX_FS_PATH=/home/docs/.vaex/file-cache python myscript.py

Python settings vaex.settings.fs.path

Example use: vaex.settings.fs.path = '/home/docs/.vaex/file-cache'

MemoryTracker

Memory tracking/protection when using vaex in a service

type

Which memory tracker to use when executing tasks

Environmental variable: VAEX_MEMORY_TRACKER

Example use:

$ VAEX_MEMORY_TRACKER=default python myscript.py

Python settings vaex.settings.main.memory_tracker.type

Example use: vaex.settings.main.memory_tracker.type = 'default'

max

Maximum amount of memory the executor can use (only used for type=’limit’)

Environmental variable: VAEX_MEMORY_TRACKER_MAX

Python settings vaex.settings.main.memory_tracker.max

TaskTracker

Task tracking/protection when using vaex in a service

type

Comma separated string of trackers to run while executing tasks

Environmental variable: VAEX_TASK_TRACKER

Example use:

$ VAEX_TASK_TRACKER= python myscript.py

Python settings vaex.settings.main.task_tracker.type

Logging

Configure logging for Vaex. By default Vaex sets up logging, which is useful when running a script. When Vaex is used in applications or services that already configure logging, set the environmental variable VAEX_LOGGING_SETUP to false.

See the API docs for more details.

Note that setting vaex.settings.main.logging.info etc. at runtime has no direct effect, since logging is already configured. When needed, call vaex.logging.reset() and vaex.logging.setup() to reconfigure logging.

setup

Setup logging for Vaex at import time.

Environmental variable: VAEX_LOGGING_SETUP

Example use:

$ VAEX_LOGGING_SETUP=True python myscript.py

Python settings vaex.settings.main.logging.setup

Example use: vaex.settings.main.logging.setup = True

rich

Use rich logger (colored fancy output).

Environmental variable: VAEX_LOGGING_RICH

Example use:

$ VAEX_LOGGING_RICH=True python myscript.py

Python settings vaex.settings.main.logging.rich

Example use: vaex.settings.main.logging.rich = True

debug

Comma separated list of loggers to set to the debug level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_DEBUG

Example use:

$ VAEX_LOGGING_DEBUG= python myscript.py

Python settings vaex.settings.main.logging.debug

info

Comma separated list of loggers to set to the info level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_INFO

Example use:

$ VAEX_LOGGING_INFO= python myscript.py

Python settings vaex.settings.main.logging.info

warning

Comma separated list of loggers to set to the warning level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_WARNING

Example use:

$ VAEX_LOGGING_WARNING=vaex python myscript.py

Python settings vaex.settings.main.logging.warning

Example use: vaex.settings.main.logging.warning = 'vaex'

error

Comma separated list of loggers to set to the error level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)

Environmental variable: VAEX_LOGGING_ERROR

Example use:

$ VAEX_LOGGING_ERROR= python myscript.py

Python settings vaex.settings.main.logging.error
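
The four level settings share one value format: a comma separated list of logger names, or ‘1’ for the root ‘vaex’ logger. The sketch below shows how such values could be interpreted using only the standard logging module. It does not reproduce what vaex.logging.setup() does internally; the helper names and the order in which levels are applied are assumptions.

```python
import logging

def loggers_from_setting(value, root="vaex"):
    """Hypothetical parser for the comma separated logger settings."""
    if not value:
        return []                # empty setting: nothing to configure
    if value == "1":
        return [root]            # '1' selects the root 'vaex' logger
    return [name.strip() for name in value.split(",") if name.strip()]

def apply_levels(debug="", info="", warning="", error=""):
    # Assumption: levels are applied from least to most verbose, so the most
    # verbose level wins when a logger is listed under several settings.
    ordered = (
        (error, logging.ERROR),
        (warning, logging.WARNING),
        (info, logging.INFO),
        (debug, logging.DEBUG),
    )
    for value, level in ordered:
        for name in loggers_from_setting(value):
            logging.getLogger(name).setLevel(level)

apply_levels(warning="1", debug="vaex.settings,vaex.cache")
print(logging.getLevelName(logging.getLogger("vaex").level))
print(logging.getLevelName(logging.getLogger("vaex.cache").level))
```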

Progress

Progress bar configuration

type

Default progressbar to show: ‘simple’, ‘rich’ or ‘widget’

Environmental variable: VAEX_PROGRESS_TYPE

Example use:

$ VAEX_PROGRESS_TYPE=simple python myscript.py

Python settings vaex.settings.main.progress.type

Example use: vaex.settings.main.progress.type = 'simple'

force

Force showing a progress bar of this type, even when no progress bar was requested from user code

Environmental variable: VAEX_PROGRESS

Python settings vaex.settings.main.progress.force

Settings

Configuration options for the FastAPI server

add_example

Add example dataset

Environmental variable: VAEX_SERVER_ADD_EXAMPLE

Example use:

$ VAEX_SERVER_ADD_EXAMPLE=True python myscript.py

Python settings vaex.settings.server.add_example

Example use: vaex.settings.server.add_example = True

graphql

Add graphql endpoint

Environmental variable: VAEX_SERVER_GRAPHQL

Example use:

$ VAEX_SERVER_GRAPHQL=False python myscript.py

Python settings vaex.settings.server.graphql

Example use: vaex.settings.server.graphql = False

files

Mapping of name to path

Environmental variable: VAEX_SERVER_FILES

Python settings vaex.settings.server.files