Configuration#
All settings in Vaex can be configured in a uniform way, based on Pydantic. From a Python runtime, settings can be configured via the vaex.settings module.
import vaex
vaex.settings.main.thread_count = 10
vaex.settings.display.max_columns = 50
Via environmental variables:
$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 python myservice.py
Otherwise, values are obtained from a .env file in the current working directory, loaded using dotenv.
VAEX_NUM_THREADS=22
VAEX_CHUNK_SIZE_MIN=2048
Lastly, a global yaml file from $VAEX_PATH_HOME/.vaex/main.yaml is loaded (with the lowest priority).
thread_count: 33
display:
  max_columns: 44
  max_rows: 20
If we now run vaex settings yaml, we see the effective settings as yaml output:
$ VAEX_NUM_THREADS=10 VAEX_DISPLAY_MAX_COLUMNS=50 vaex settings yaml
...
chunk:
  size: null
  size_min: 2048
  size_max: 1048576
display:
  max_columns: 50
  max_rows: 20
thread_count: 10
...
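The precedence shown above (environment variables first, then the .env file, then the global yaml file) can be illustrated with a small standalone sketch. It does not use Vaex's own loading code; the flattened key names and the `ChainMap` layering are assumptions chosen only to mirror the example values on this page.

```python
from collections import ChainMap

# Hypothetical flattened view of the three sources used in the examples above.
env_vars = {"thread_count": 10, "display.max_columns": 50}
dotenv_file = {"thread_count": 22, "chunk.size_min": 2048}
yaml_file = {"thread_count": 33, "display.max_columns": 44, "display.max_rows": 20}

# ChainMap returns the value from the first mapping that contains the key,
# so earlier mappings take priority over later ones.
effective = ChainMap(env_vars, dotenv_file, yaml_file)

for key in sorted(effective):
    print(f"{key}: {effective[key]}")
```

The printed values match the effective settings above: thread_count comes from the environment, chunk.size_min from the .env file, and display.max_rows from the yaml file.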
Developers#
When updating vaex/settings.py, run vaex settings watch to regenerate the documentation below automatically whenever the file is saved.
Schema#
A JSON schema can be generated using:
$ vaex settings schema > vaex-settings.schema.json
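Once generated, the schema file can be inspected with the standard library. The sketch below assumes the file follows the usual JSON Schema layout with a top-level `properties` object; `list_settings` is a hypothetical helper, not part of Vaex.

```python
import json
from pathlib import Path


def list_settings(schema: dict) -> list:
    """Return the sorted top-level property names of a JSON schema."""
    return sorted(schema.get("properties", {}))


# File name taken from the command above; it is only read if it exists.
schema_path = Path("vaex-settings.schema.json")
if schema_path.exists():
    print(list_settings(json.loads(schema_path.read_text())))
```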
Settings#
General settings for vaex
aliases#
Aliases to be used for vaex.open
Environmental variable: VAEX_ALIASES
Python settings vaex.settings.main.aliases
async#
How to run async code in the local executor
Environmental variable: VAEX_ASYNC
Example use:
$ VAEX_ASYNC=nest python myscript.py
Python settings vaex.settings.main.async_
Example use: vaex.settings.main.async_ = 'nest'
home#
Home directory for vaex, which defaults to $HOME/.vaex. If neither $VAEX_HOME nor $HOME is defined, the current working directory is used. (Note that this setting cannot be configured from the vaex home directory itself.)
Environmental variable: VAEX_HOME
Example use:
$ VAEX_HOME=/home/docs/.vaex python myscript.py
Python settings vaex.settings.main.home
Example use: vaex.settings.main.home = '/home/docs/.vaex'
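The fallback order for the home directory can be sketched as follows. `resolve_home` is a hypothetical helper, not Vaex's implementation; in particular, returning the current working directory unchanged in the last case is an assumption, since the text does not say whether a `.vaex` suffix is appended.

```python
import os
from typing import Mapping, Optional


def resolve_home(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper mirroring the fallback order described above."""
    env = os.environ if env is None else env
    if "VAEX_HOME" in env:
        return env["VAEX_HOME"]
    if "HOME" in env:
        return os.path.join(env["HOME"], ".vaex")
    # Neither variable is defined: fall back to the current working directory
    # (assumption: used as-is, without a '.vaex' suffix).
    return os.getcwd()


print(resolve_home({"VAEX_HOME": "/home/docs/.vaex"}))
print(resolve_home({"HOME": "/home/docs"}))
```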
mmap#
Turning this off is experimental; when set to False, Vaex avoids using memory mapping
Environmental variable: VAEX_MMAP
Example use:
$ VAEX_MMAP=True python myscript.py
Python settings vaex.settings.main.mmap
Example use: vaex.settings.main.mmap = True
process_count#
Number of processes to use for multiprocessing (e.g. apply), defaults to the thread_count setting
Environmental variable: VAEX_PROCESS_COUNT
Example use:
$ VAEX_PROCESS_COUNT=2 python myscript.py
Python settings vaex.settings.main.process_count
Example use: vaex.settings.main.process_count = 2
thread_count#
Number of threads to use for computations, defaults to multiprocessing.cpu_count()
Environmental variable: VAEX_NUM_THREADS
Example use:
$ VAEX_NUM_THREADS=2 python myscript.py
Python settings vaex.settings.main.thread_count
Example use: vaex.settings.main.thread_count = 2
thread_count_io#
Number of threads to use for IO, defaults to thread_count + 1
Environmental variable: VAEX_NUM_THREADS_IO
Example use:
$ VAEX_NUM_THREADS_IO=2 python myscript.py
Python settings vaex.settings.main.thread_count_io
Example use: vaex.settings.main.thread_count_io = 2
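How the defaults of process_count, thread_count and thread_count_io relate to each other can be sketched with a small standalone function. `resolve_counts` is a hypothetical helper that reads from a plain mapping instead of Vaex's settings machinery, and it reads the IO default as thread_count + 1, which is an assumption.

```python
import multiprocessing
from typing import Mapping


def resolve_counts(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: apply the documented defaults to three settings."""
    thread_count = int(env.get("VAEX_NUM_THREADS", multiprocessing.cpu_count()))
    # Assumption: the IO default is thread_count + 1.
    thread_count_io = int(env.get("VAEX_NUM_THREADS_IO", thread_count + 1))
    # process_count falls back to the thread_count setting.
    process_count = int(env.get("VAEX_PROCESS_COUNT", thread_count))
    return {
        "thread_count": thread_count,
        "thread_count_io": thread_count_io,
        "process_count": process_count,
    }


print(resolve_counts({"VAEX_NUM_THREADS": "2"}))
# {'thread_count': 2, 'thread_count_io': 3, 'process_count': 2}
```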
path_lock#
Directory to store lock files for vaex, which defaults to ${VAEX_HOME}/lock/. Due to possible race conditions, lock files cannot be removed while processes using Vaex are running (on Unix systems).
Environmental variable: VAEX_LOCK
Example use:
$ VAEX_LOCK=/home/docs/.vaex/lock python myscript.py
Python settings vaex.settings.main.path_lock
Example use: vaex.settings.main.path_lock = '/home/docs/.vaex/lock'
Cache#
Settings for caching of computation or task results, see the API for more details.
type#
Type of cache, e.g. ‘memory_infinite’, ‘memory’, ‘disk’, ‘redis’, or a multilevel cache, e.g. ‘memory,disk’
Environmental variable: VAEX_CACHE
Python settings vaex.settings.cache.type
disk_size_limit#
Maximum size for cache on disk, e.g. 10GB, 500MB
Environmental variable: VAEX_CACHE_DISK_SIZE_LIMIT
Example use:
$ VAEX_CACHE_DISK_SIZE_LIMIT=10GB python myscript.py
Python settings vaex.settings.cache.disk_size_limit
Example use: vaex.settings.cache.disk_size_limit = '10GB'
memory_size_limit#
Maximum size for cache in memory, e.g. 1GB, 500MB
Environmental variable: VAEX_CACHE_MEMORY_SIZE_LIMIT
Example use:
$ VAEX_CACHE_MEMORY_SIZE_LIMIT=1GB python myscript.py
Python settings vaex.settings.cache.memory_size_limit
Example use: vaex.settings.cache.memory_size_limit = '1GB'
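Size limits are given as strings such as '10GB' or '500MB'. To make the format concrete, the sketch below converts such a string to a byte count. `parse_size` is a hypothetical helper, and the decimal multipliers (1GB = 10**9 bytes) are an assumption; Vaex may interpret the units differently.

```python
import re

# Assumption: decimal (SI) multipliers; Vaex may interpret units differently.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Hypothetical helper: convert a string such as '10GB' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(parse_size("10GB"))   # 10000000000
print(parse_size("500MB"))  # 500000000
```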
path#
Storage location for cache results. Defaults to ${VAEX_HOME}/cache
Environmental variable: VAEX_CACHE_PATH
Example use:
$ VAEX_CACHE_PATH=/home/docs/.vaex/cache python myscript.py
Python settings vaex.settings.cache.path
Example use: vaex.settings.cache.path = '/home/docs/.vaex/cache'
Chunk#
Configure how a dataset is broken down into smaller chunks. The executor dynamically adjusts the chunk size based on size_min and size_max and the number of threads when size is not set.
size#
When set, fixes the chunk size, i.e. it is not dynamically adjusted between size_min and size_max
Environmental variable: VAEX_CHUNK_SIZE
Python settings vaex.settings.main.chunk.size
size_min#
Minimum chunk size
Environmental variable: VAEX_CHUNK_SIZE_MIN
Example use:
$ VAEX_CHUNK_SIZE_MIN=1024 python myscript.py
Python settings vaex.settings.main.chunk.size_min
Example use: vaex.settings.main.chunk.size_min = 1024
size_max#
Maximum chunk size
Environmental variable: VAEX_CHUNK_SIZE_MAX
Example use:
$ VAEX_CHUNK_SIZE_MAX=1048576 python myscript.py
Python settings vaex.settings.main.chunk.size_max
Example use: vaex.settings.main.chunk.size_max = 1048576
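The interaction between size, size_min and size_max can be illustrated with a simple clamping rule. This is a hypothetical heuristic, not the executor's actual logic: it assumes the executor aims for roughly one chunk per thread, and it uses the example values above as defaults.

```python
import math
from typing import Optional


def pick_chunk_size(
    row_count: int,
    thread_count: int,
    size: Optional[int] = None,
    size_min: int = 1024,
    size_max: int = 1048576,
) -> int:
    """Hypothetical heuristic, not Vaex's executor logic."""
    if size is not None:
        return size  # a fixed size disables the dynamic adjustment
    # Aim for roughly one chunk per thread, then clamp to [size_min, size_max].
    per_thread = math.ceil(row_count / thread_count)
    return max(size_min, min(size_max, per_thread))


print(pick_chunk_size(10_000_000, 4))       # 1048576 (clamped to size_max)
print(pick_chunk_size(2_000, 4))            # 1024 (clamped to size_min)
print(pick_chunk_size(2_000, 4, size=4096)) # 4096 (fixed)
```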
Data#
Data configuration
path#
Storage location for data files, like vaex.example(). Defaults to ${VAEX_HOME}/data/
Environmental variable: VAEX_DATA_PATH
Example use:
$ VAEX_DATA_PATH=/home/docs/.vaex/data python myscript.py
Python settings vaex.settings.data.path
Example use: vaex.settings.data.path = '/home/docs/.vaex/data'
Display#
How a dataframe displays
max_columns#
How many columns to display when printing out a dataframe
Environmental variable: VAEX_DISPLAY_MAX_COLUMNS
Example use:
$ VAEX_DISPLAY_MAX_COLUMNS=200 python myscript.py
Python settings vaex.settings.display.max_columns
Example use: vaex.settings.display.max_columns = 200
max_rows#
How many rows to print out before showing the first and last rows
Environmental variable: VAEX_DISPLAY_MAX_ROWS
Example use:
$ VAEX_DISPLAY_MAX_ROWS=10 python myscript.py
Python settings vaex.settings.display.max_rows
Example use: vaex.settings.display.max_rows = 10
FileSystem#
Filesystem configuration
path#
Storage location for caching files from remote file systems. Defaults to ${VAEX_HOME}/file-cache/
Environmental variable: VAEX_FS_PATH
Example use:
$ VAEX_FS_PATH=/home/docs/.vaex/file-cache python myscript.py
Python settings vaex.settings.fs.path
Example use: vaex.settings.fs.path = '/home/docs/.vaex/file-cache'
MemoryTracker#
Memory tracking/protection when using vaex in a service
type#
Which memory tracker to use when executing tasks
Environmental variable: VAEX_MEMORY_TRACKER
Example use:
$ VAEX_MEMORY_TRACKER=default python myscript.py
Python settings vaex.settings.main.memory_tracker.type
Example use: vaex.settings.main.memory_tracker.type = 'default'
max#
How much memory the executor can use maximally (only used for type=’limit’)
Environmental variable: VAEX_MEMORY_TRACKER_MAX
Python settings vaex.settings.main.memory_tracker.max
TaskTracker#
Task tracking/protection when using vaex in a service
type#
Comma-separated string of trackers to run while executing tasks
Environmental variable: VAEX_TASK_TRACKER
Example use:
$ VAEX_TASK_TRACKER= python myscript.py
Python settings vaex.settings.main.task_tracker.type
Logging#
Configure logging for Vaex. By default Vaex sets up logging, which is useful when running a script. When Vaex is used in applications or services that already configure logging, set the environment variable VAEX_LOGGING_SETUP to false.
See the API docs for more details.
Note that setting vaex.settings.main.logging.info etc. at runtime has no direct effect, since logging is already configured. When needed, call vaex.logging.reset() and vaex.logging.setup() to reconfigure logging.
setup#
Setup logging for Vaex at import time.
Environmental variable: VAEX_LOGGING_SETUP
Example use:
$ VAEX_LOGGING_SETUP=True python myscript.py
Python settings vaex.settings.main.logging.setup
Example use: vaex.settings.main.logging.setup = True
rich#
Use rich logger (colored fancy output).
Environmental variable: VAEX_LOGGING_RICH
Example use:
$ VAEX_LOGGING_RICH=True python myscript.py
Python settings vaex.settings.main.logging.rich
Example use: vaex.settings.main.logging.rich = True
debug#
Comma-separated list of loggers to set to the debug level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)
Environmental variable: VAEX_LOGGING_DEBUG
Example use:
$ VAEX_LOGGING_DEBUG= python myscript.py
Python settings vaex.settings.main.logging.debug
info#
Comma-separated list of loggers to set to the info level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)
Environmental variable: VAEX_LOGGING_INFO
Example use:
$ VAEX_LOGGING_INFO= python myscript.py
Python settings vaex.settings.main.logging.info
warning#
Comma-separated list of loggers to set to the warning level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)
Environmental variable: VAEX_LOGGING_WARNING
Example use:
$ VAEX_LOGGING_WARNING=vaex python myscript.py
Python settings vaex.settings.main.logging.warning
Example use: vaex.settings.main.logging.warning = 'vaex'
error#
Comma-separated list of loggers to set to the error level (e.g. ‘vaex.settings,vaex.cache’), or a ‘1’ to set the root logger (‘vaex’)
Environmental variable: VAEX_LOGGING_ERROR
Example use:
$ VAEX_LOGGING_ERROR= python myscript.py
Python settings vaex.settings.main.logging.error
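The value format shared by the debug, info, warning and error settings (a comma-separated list of logger names, or '1' for the 'vaex' logger) can be mimicked with the standard logging module. `apply_level` is a hypothetical helper that only illustrates how such a value maps onto loggers; it is not how Vaex configures logging internally.

```python
import logging


def apply_level(value: str, level: int) -> list:
    """Hypothetical helper: set `level` on the loggers named in `value`."""
    if value.strip() == "1":
        # '1' selects the root Vaex logger.
        names = ["vaex"]
    else:
        # Otherwise split on commas and ignore empty entries.
        names = [name.strip() for name in value.split(",") if name.strip()]
    for name in names:
        logging.getLogger(name).setLevel(level)
    return names


print(apply_level("vaex.settings,vaex.cache", logging.DEBUG))
print(apply_level("1", logging.WARNING))
```

An empty value, as in the example commands above, selects no loggers in this sketch.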
Progress#
Progress bar configuration
type#
Default progressbar to show: ‘simple’, ‘rich’ or ‘widget’
Environmental variable: VAEX_PROGRESS_TYPE
Example use:
$ VAEX_PROGRESS_TYPE=simple python myscript.py
Python settings vaex.settings.main.progress.type
Example use: vaex.settings.main.progress.type = 'simple'
force#
Force showing a progress bar of this type, even when no progress bar was requested from user code
Environmental variable: VAEX_PROGRESS
Python settings vaex.settings.main.progress.force
Settings#
Configuration options for the FastAPI server
add_example#
Add example dataset
Environmental variable: VAEX_SERVER_ADD_EXAMPLE
Example use:
$ VAEX_SERVER_ADD_EXAMPLE=True python myscript.py
Python settings vaex.settings.server.add_example
Example use: vaex.settings.server.add_example = True
graphql#
Add graphql endpoint
Environmental variable: VAEX_SERVER_GRAPHQL
Example use:
$ VAEX_SERVER_GRAPHQL=False python myscript.py
Python settings vaex.settings.server.graphql
Example use: vaex.settings.server.graphql = False
files#
Mapping of name to path
Environmental variable: VAEX_SERVER_FILES
Python settings vaex.settings.server.files