Configuring Marqo
Marqo is configured through environment variables passed to the Marqo container when it is run.
Configuring the Marqo running mode
Configuration Name | Default | Description |
---|---|---|
MARQO_MODE | COMBINED | Specifies whether the Marqo container runs in API, INFERENCE, or COMBINED mode. |
MARQO_REMOTE_INFERENCE_URL | null | URL for the remote inference container. Used only when MARQO_MODE is set to API. |
MARQO_API_WORKERS | 1 | Number of Uvicorn workers. Used only when MARQO_MODE is set to API. |
The MARQO_MODE environment variable determines how Marqo operates. It can be set to one of the following values:
- COMBINED (default): Runs both the API and inference components in a single container.
- API: Runs Marqo in API-only mode. A separate inference container must be running.
- INFERENCE: Runs Marqo in inference-only mode. A separate API container must be running.
For details on running the API and inference components in separate containers, see Running Marqo API and Inference as Separate Containers. In that setup, some configuration options apply only to the API container and others only to the inference container; the section headings below indicate which modes each setting applies to.
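The mode rules above can be checked before a container is started. The following is a sketch of a hypothetical pre-flight helper, check_marqo_mode (not part of Marqo). It treats an unset MARQO_MODE as COMBINED, accepts only the three documented values as written (uppercase), and, as an assumption, treats a missing MARQO_REMOTE_INFERENCE_URL as an error in API mode, since API mode relies on a separate inference container.

```shell
# Hypothetical pre-flight check (not part of Marqo): validate the running-mode
# variables before passing them to `docker run`.
check_marqo_mode() {
  local mode="${MARQO_MODE:-COMBINED}"   # COMBINED is the documented default
  case "$mode" in
    COMBINED|INFERENCE)
      ;;
    API)
      # Assumption: API mode is unusable without a remote inference URL.
      if [ -z "${MARQO_REMOTE_INFERENCE_URL:-}" ]; then
        echo "MARQO_MODE=API requires MARQO_REMOTE_INFERENCE_URL" >&2
        return 1
      fi
      ;;
    *)
      echo "MARQO_MODE must be API, INFERENCE or COMBINED (got '$mode')" >&2
      return 1
      ;;
  esac
  echo "$mode"
}
```

For example, MARQO_MODE=API MARQO_REMOTE_INFERENCE_URL="http://<inference-host>:<port>" check_marqo_mode prints API, while MARQO_MODE=API on its own fails with a message on stderr.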
Configuring log level and format (all modes)
Configuration Name | Default | Description |
---|---|---|
MARQO_LOG_LEVEL | info | Log level: one of error, warning, info, debug. |
MARQO_LOG_FORMAT | plain | Log format: either plain or json. |
MARQO_LOG_LEVEL adjusts the verbosity of Marqo's logs.
- A higher level (e.g., error) reduces log volume.
- A lower level (e.g., debug) produces more detailed output.
- The default is info, which is recommended for production. Enabling debug may degrade performance.
- Uvicorn's access logs always remain at info level, regardless of the MARQO_LOG_LEVEL setting.
MARQO_LOG_FORMAT determines the output format.
- Use plain for human-readable logs or json for structured logging.
Example
docker run --name marqo -p 8882:8882 \
-e MARQO_LOG_LEVEL='warning' -e MARQO_LOG_FORMAT='json' \
marqoai/marqo:latest
Configuring usage limits (API or COMBINED)
Limits can be set to protect the resources of the machine Marqo is running on.
Configuration name | Default | Description |
---|---|---|
MARQO_MAX_DOC_BYTES | 100000 | Maximum size (in bytes) of a document allowed to be indexed. |
MARQO_MAX_RETRIEVABLE_DOCS | 10000 | Maximum number of documents that can be returned in a single request. Cannot be set higher than 10000. |
MARQO_MAX_CUDA_MODEL_MEMORY | 4 | Maximum CUDA memory (GB) used by models in Marqo. With multiple GPUs, this is the maximum memory per GPU. |
MARQO_MAX_CPU_MODEL_MEMORY | 4 | Maximum RAM (GB) used by models in Marqo. |
MARQO_MAX_VECTORISE_BATCH_SIZE | 16 | Maximum batch size to process in parallel (for example, when adding documents). |
MARQO_MAX_SEARCH_VIDEO_AUDIO_FILE_SIZE | 387973120 | Maximum size (in bytes) of a video or audio file to be searched in a single request. |
MARQO_MAX_ADD_DOCS_VIDEO_AUDIO_FILE_SIZE | 387973120 | Maximum size (in bytes) of a video or audio file to be added to an index. |
MARQO_MAX_DOCUMENTS_BATCH_SIZE | 128 | Maximum number of documents that can be added or updated in a single request. |
MARQO_MAX_DELETE_DOCS_COUNT | 10000 | Maximum number of documents that can be deleted in a single request. |
VESPA_POOL_SIZE | 10 | Size of the connection pool for Vespa operations, including search. Set this to a value at least as large as MARQO_MAX_CONCURRENT_SEARCH. |
VESPA_FEED_POOL_SIZE | 10 | Maximum Vespa feed concurrency per indexing batch. |
VESPA_GET_POOL_SIZE | 10 | Maximum Vespa get concurrency per request when retrieving documents by ID. |
VESPA_DELETE_POOL_SIZE | 10 | Maximum Vespa delete concurrency per request. |
VESPA_PARTIAL_UPDATE_POOL_SIZE | 10 | Maximum Vespa partial update concurrency per request. |
VESPA_SEARCH_TIMEOUT_MS | 1000 | Time (in milliseconds) before a search request to Vespa times out. |
Example
docker run --name marqo -p 8882:8882 \
-e "MARQO_MAX_DOC_BYTES=200000" \
-e "MARQO_MAX_RETRIEVABLE_DOCS=600" \
-e "MARQO_MAX_CUDA_MODEL_MEMORY=5" \
-e "VESPA_SEARCH_TIMEOUT_MS=2000" marqoai/marqo:latest
In the above example, a Marqo container is run with the following limits:
- Maximum size of an indexed document: 200 KB (200000 bytes)
- Maximum number of documents returned in a single request: 600
- Maximum CUDA memory usage for models in Marqo: 5 GB
- Vespa search timeout: 2 seconds
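Two pieces of arithmetic help when choosing limits: the byte-valued video/audio defaults (387973120) are exactly 370 MiB, and VESPA_POOL_SIZE should be at least MARQO_MAX_CONCURRENT_SEARCH (default 8, see the throttling section). The sketch below uses two hypothetical helpers (mib_to_bytes and check_limits, not part of Marqo) to do that conversion and to check the documented constraints, falling back to the documented defaults for unset variables.

```shell
# Hypothetical helper (not part of Marqo): convert mebibytes to bytes for the
# *_VIDEO_AUDIO_FILE_SIZE limits, which are specified in bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical pre-flight check (not part of Marqo) of documented constraints.
check_limits() {
  local retrievable="${MARQO_MAX_RETRIEVABLE_DOCS:-10000}"
  local pool="${VESPA_POOL_SIZE:-10}"
  local search="${MARQO_MAX_CONCURRENT_SEARCH:-8}"
  if [ "$retrievable" -gt 10000 ]; then
    echo "MARQO_MAX_RETRIEVABLE_DOCS cannot exceed 10000" >&2
    return 1
  fi
  # The Vespa connection pool should be at least as large as search concurrency.
  if [ "$pool" -lt "$search" ]; then
    echo "VESPA_POOL_SIZE ($pool) < MARQO_MAX_CONCURRENT_SEARCH ($search)" >&2
    return 1
  fi
  echo ok
}

# The default video/audio limit is exactly 370 MiB.
mib_to_bytes 370   # prints 387973120
```

With the defaults, check_limits prints ok; raising MARQO_MAX_CONCURRENT_SEARCH to 12 without also raising VESPA_POOL_SIZE makes it fail.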
Configure backend communication (API or COMBINED)
This section describes the environment variables that configure Marqo's communication with its Vespa backend. Set them when Marqo runs in a container and needs to communicate with Vespa running in a separate container or on a different host machine.
Note: Regularly upgrade Vespa when hosting it yourself. New releases of Marqo leverage features and bug fixes introduced in the latest versions of Vespa. If you are running Marqo 2.13.0, please upgrade Vespa to version 8.396.18 or later. This helps prevent potential issues, such as long response times when adding documents to an unstructured index or unexpected behavior during Marqo upgrades. For more details or if you encounter any issues, please refer to the Troubleshooting Guide.
Configuration name | Default | Description |
---|---|---|
VESPA_CONFIG_URL | "http://localhost:19071" | URL for Vespa configuration. |
VESPA_QUERY_URL | "http://localhost:8080" | URL for querying the Vespa instance. |
VESPA_DOCUMENT_URL | "http://localhost:8080" | URL for document operations in the Vespa instance. |
VESPA_CONTENT_CLUSTER_NAME | "content_default" | Name of the Vespa content cluster. |
ZOOKEEPER_HOSTS | null | Hosts for the Zookeeper server. Do not include an "http" or "https" scheme in the string. If not set, Marqo skips connecting to the Zookeeper server. |
Example: Running Marqo on a standalone Vespa container
In this example, we start a Vespa container, initialise it with an application package, and run a Marqo container that uses it.
Step 1: Initialize Vespa Container Environment
Start a Vespa container from the vespaengine/vespa:8 image. Expose the ports for the config server (19071), the container server (8080), and Zookeeper (2181).
docker run --detach --name vespa -p 8080:8080 -p 19071:19071 -p 2181:2181 vespaengine/vespa:8
Step 2: Deploy an Application Package to Configure Vespa
Clone the Marqo repository and deploy an application package for local runs. This setup ensures that the vector store is configured correctly.
git clone https://github.com/marqo-ai/marqo.git
cd marqo/scripts/vespa_local
zip -r - * | curl --header "Content-Type:application/zip" --data-binary @- http://localhost:19071/application/v2/tenant/default/prepareandactivate
You can verify that the vector store has been set up correctly by visiting http://localhost:8080 in your browser.
The vector store can take a few minutes to start responding after the initial configuration.
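Instead of refreshing the browser, you can poll until the vector store responds. The following is a sketch of a hypothetical wait_for_url helper (not part of Marqo or Vespa). It assumes curl is installed, treats any response with a status below 400 as ready, and gives up after a fixed number of attempts.

```shell
# Hypothetical helper (not part of Marqo or Vespa): poll a URL until it returns
# a non-error HTTP response, giving up after a fixed number of attempts.
wait_for_url() {
  local url="$1" attempts="${2:-60}" delay="${3:-5}" i=1
  while :; do
    # -f: treat HTTP errors (>= 400) as failures, -s: silent, -o: discard body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up waiting for $url after $attempts attempts" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Wait up to about 5 minutes (60 attempts, 5 seconds apart):
# wait_for_url http://localhost:8080 60 5 && echo "Vespa is responding"
```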
Step 3: Launch Marqo with Vespa Configuration
With your external vector store ready, you can now run Marqo configured to use it:
docker run --name marqo -p 8882:8882 --add-host host.docker.internal:host-gateway \
-e VESPA_CONFIG_URL="http://host.docker.internal:19071" \
-e VESPA_DOCUMENT_URL="http://host.docker.internal:8080" \
-e VESPA_QUERY_URL="http://host.docker.internal:8080" \
-e ZOOKEEPER_HOSTS="host.docker.internal:2181" \
marqoai/marqo:latest
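The three Vespa URLs and ZOOKEEPER_HOSTS in Step 3 all point at the same host. The following is a sketch of a hypothetical helper, write_vespa_env (not part of Marqo), that writes them to a file for Docker's --env-file option. It assumes the port layout used in this example (19071 config server, 8080 query and document API, 2181 Zookeeper) and strips any scheme from the host, since ZOOKEEPER_HOSTS must not contain one.

```shell
# Hypothetical helper (not part of Marqo): write the backend settings from
# Step 3 to an env file, assuming the port layout used in this example.
write_vespa_env() {
  local host="$1" out="$2"
  host="${host#*://}"   # drop any scheme; ZOOKEEPER_HOSTS must not include one
  {
    echo "VESPA_CONFIG_URL=http://${host}:19071"
    echo "VESPA_QUERY_URL=http://${host}:8080"
    echo "VESPA_DOCUMENT_URL=http://${host}:8080"
    echo "ZOOKEEPER_HOSTS=${host}:2181"
  } > "$out"
}

write_vespa_env host.docker.internal marqo-vespa.env

# Equivalent to Step 3, reading the variables from the file:
# docker run --name marqo -p 8882:8882 --add-host host.docker.internal:host-gateway \
#   --env-file marqo-vespa.env marqoai/marqo:latest
```

Values are written without quotes because Docker reads env-file lines literally, so quotes would become part of the value.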
Enhancing Your Vespa Setup with Kubernetes
For a more robust and scalable setup, follow the instructions in the marqo-on-kubernetes GitHub repository. It offers detailed steps for setting up a Vespa cluster using Kubernetes across various cloud providers.
Configuring media download threads (API or COMBINED)
Marqo provides environment variables and parameters to control the number of threads used for downloading media during processing.
Configuration name | Default | Description |
---|---|---|
MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 5 | Maximum number of threads used to download media in parallel. |
MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 20 | (Deprecated) Maximum number of threads used to download images. |
Thread Count Determination
Marqo determines the number of threads for media downloads in the following order of priority:
- If media_download_thread_count is set in the add_documents parameters and differs from its default, this value is used.
- Otherwise, if the MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST environment variable is explicitly set and differs from its default, this value is used.
- Otherwise, if the model type is languagebind, the thread count is set to 5.
- Otherwise, if image_download_thread_count is explicitly set in the add_documents parameters and differs from its default, this value is used.
- Otherwise, if the MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST environment variable is explicitly set and differs from its default, this value is used.
- If none of the above conditions are met, the default value of MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST (20) is used.
- MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST: the preferred setting for controlling media download threads. It applies to all types of media, including images, videos, and audio files.
- MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST: deprecated and will be removed in a future version. It is kept for backward compatibility but only affects image downloads.
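To make the priority order concrete, here is a hypothetical re-implementation in shell; resolve_media_thread_count is not Marqo's code. As a simplifying assumption, an empty request-parameter argument stands for "not set, or equal to its default", because the defaults of the add_documents parameters are not listed here. The environment-variable defaults (5 and 20) come from the table above, and any model type other than languagebind is treated the same.

```shell
# Hypothetical re-implementation of the documented priority order (not Marqo's
# code). Arguments: media_download_thread_count, image_download_thread_count,
# model type. An empty argument means "not set, or set to its default".
resolve_media_thread_count() {
  local media_param="$1" image_param="$2" model_type="$3"
  local media_env="${MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST:-5}"
  local image_env="${MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST:-20}"

  if [ -n "$media_param" ]; then
    echo "$media_param"        # 1. media_download_thread_count parameter
  elif [ "$media_env" != 5 ]; then
    echo "$media_env"          # 2. media env var, if not the default
  elif [ "$model_type" = languagebind ]; then
    echo 5                     # 3. languagebind models
  elif [ -n "$image_param" ]; then
    echo "$image_param"        # 4. deprecated image_download_thread_count
  elif [ "$image_env" != 20 ]; then
    echo "$image_env"          # 5. deprecated image env var, if not the default
  else
    echo 20                    # 6. default of the image env var
  fi
}
```

One consequence of this order: with a languagebind model, image_download_thread_count and MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST never take effect, because the languagebind rule is checked first.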
Configuring throttling (API or COMBINED)
Configuration name | Default | Description |
---|---|---|
MARQO_ENABLE_THROTTLING | "TRUE" | Enables throttling if "TRUE". Must be a string: either "TRUE" or "FALSE". |
MARQO_MAX_CONCURRENT_INDEX | 8 | Maximum allowed concurrent indexing threads. |
MARQO_MAX_CONCURRENT_SEARCH | 8 | Maximum allowed concurrent search threads. |
MARQO_MAX_CONCURRENT_PARTIAL_UPDATE | 100 | Maximum allowed concurrent partial update threads. |
These environment variables set Marqo's allowed concurrency for indexing, search, and partial updates. Once a limit is reached, Marqo returns HTTP 429 (Too Many Requests) on subsequent requests. Set these limits according to the resources available on the machine Marqo runs on.
Example
docker run --name marqo -p 8882:8882 \
-e MARQO_ENABLE_THROTTLING='TRUE' \
-e MARQO_MAX_CONCURRENT_SEARCH='10' \
marqoai/marqo:latest
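Because Marqo answers with 429 once a concurrency limit is reached, clients should be prepared to retry. The following is a minimal client-side sketch; retry_on_429 is a hypothetical helper, not part of Marqo. It reruns a command that prints an HTTP status code, doubling the wait between attempts. The commented search_status function, the index name my-index, and the request body are illustrative assumptions.

```shell
# Hypothetical client-side helper (not part of Marqo): run a command that prints
# an HTTP status code, retrying with exponential backoff while it prints 429.
# Arguments: max attempts, initial delay in seconds, then the command to run.
retry_on_429() {
  local max_attempts="$1" delay="$2" attempt=1 status
  shift 2
  while :; do
    status="$("$@")"
    if [ "$status" != 429 ]; then
      echo "$status"
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "$status"           # still throttled after the last attempt
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))       # back off: 1 s, 2 s, 4 s, ...
    attempt=$((attempt + 1))
  done
}

# Example (assumes Marqo on localhost:8882 and an existing index "my-index"):
# search_status() {
#   curl -s -o /dev/null -w '%{http_code}' -X POST \
#     -H 'Content-Type: application/json' -d '{"q": "hello"}' \
#     http://localhost:8882/indexes/my-index/search
# }
# retry_on_429 5 1 search_status
```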
Advanced configuration (API or COMBINED)
These are additional advanced configurations that can be set to customize Marqo's behavior. Most users will not need to change these values.
Configuration name | Default | Description |
---|---|---|
MARQO_DEFAULT_EF_SEARCH | 2000 | Default HNSW efSearch value. |
MARQO_MAX_SEARCHABLE_TENSOR_ATTRIBUTES | null | Maximum number of tensor fields that can be searched in a single tensor search query. By default, there is no limit. |
MARQO_MAX_SEARCH_LIMIT | 1000 | Maximum allowed limit for search requests. Can be set up to 1000000. |
MARQO_MAX_SEARCH_OFFSET | 10000 | Maximum allowed offset for search requests. Can be set up to 1000000. |
MARQO_MAX_TENSOR_FIELD_COUNT_UNSTRUCTURED | 100 | Maximum number of tensor fields that can be added to an unstructured index created with Marqo 2.13.0+. |
MARQO_MAX_LEXICAL_FIELD_COUNT_UNSTRUCTURED | 100 | Maximum number of lexical fields that can be added to an unstructured index created with Marqo 2.13.0+. |
MARQO_MAX_STRING_ARRAY_FIELD_COUNT_UNSTRUCTURED | 100 | Maximum number of string array fields that can be added to an unstructured index created with Marqo 2.16.0+. |
MARQO_THREAD_EXPIRY_TIME | 1800 | When throttling is enabled, the time (in seconds) after which a request thread's slot is automatically freed. |
MARQO_ROOT_PATH | null | Disk path where Marqo stores runtime artifacts such as downloaded models. |
ZOOKEEPER_CONNECTION_TIMEOUT | null | Connection timeout when connecting to Zookeeper. |
VESPA_DISK_USAGE_LIMIT | 0.75 | Disk usage limit for the embedded Vespa (range [0, 1]). Does not apply to an external Vespa and should only be used in a development environment. Increasing the Vespa disk usage limit can lead to permanent data loss. |
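The documented caps can be checked before deployment. The following is a sketch of a hypothetical pre-flight helper, check_advanced (not part of Marqo). It enforces the upper bound of 1000000 for MARQO_MAX_SEARCH_LIMIT and MARQO_MAX_SEARCH_OFFSET and the [0, 1] range of VESPA_DISK_USAGE_LIMIT, using awk for the comparison because the disk limit is a fraction. Keep in mind that VESPA_DISK_USAGE_LIMIT only affects the embedded Vespa.

```shell
# Hypothetical pre-flight check (not part of Marqo) for the documented caps on
# a few advanced settings. Unset variables fall back to the documented defaults.
check_advanced() {
  local limit="${MARQO_MAX_SEARCH_LIMIT:-1000}"
  local offset="${MARQO_MAX_SEARCH_OFFSET:-10000}"
  local disk="${VESPA_DISK_USAGE_LIMIT:-0.75}"
  if [ "$limit" -gt 1000000 ] || [ "$offset" -gt 1000000 ]; then
    echo "search limit and offset can be set up to 1000000" >&2
    return 1
  fi
  # awk compares the fractional value; exit status 0 means it is within [0, 1].
  if ! awk -v d="$disk" 'BEGIN { x = d + 0; exit !(x >= 0 && x <= 1) }'; then
    echo "VESPA_DISK_USAGE_LIMIT must be in [0, 1]" >&2
    return 1
  fi
  echo ok
}
```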
Marqo API inference cache configuration (API mode only)
Configuration name | Default | Description |
---|---|---|
MARQO_API_INFERENCE_CACHE_SIZE | 0 (disabled) | Maximum number of query-embedding pairs to store. Set to a positive integer to enable caching. |
MARQO_API_INFERENCE_CACHE_TYPE | "LRU" (least recently used) | Eviction policy. Supported types are "LRU" (least recently used) and "LFU" (least frequently used). |
These variables control Marqo's inference cache in API mode. Caching is disabled by default. Each API worker has an independent cache, so keep the size modest to limit memory usage. Note that caching does not apply to the add_documents endpoint.
Example
docker run --name marqo -p 8882:8882 \
-e "MARQO_API_INFERENCE_CACHE_SIZE=5000" \
-e "MARQO_API_INFERENCE_CACHE_TYPE=LFU" \
-e "MARQO_MODE=API" \
-e "MARQO_REMOTE_INFERENCE_URL=http://<inference-host>:<port>" \
marqoai/marqo:latest
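Because each API worker keeps its own cache, the number of cached entries across the container is MARQO_API_WORKERS × MARQO_API_INFERENCE_CACHE_SIZE. The sketch below uses hypothetical helpers (not part of Marqo) for that arithmetic and for a rough lower bound on vector memory. The memory estimate assumes float32 embeddings (4 bytes per dimension) and ignores the cached query strings and cache bookkeeping; it is an illustration, not a measurement of Marqo's actual usage.

```shell
# Hypothetical sizing helpers (not part of Marqo).
# Total entries across workers: $1 = MARQO_API_WORKERS, $2 = cache size.
api_cache_total_entries() {
  echo $(( $1 * $2 ))
}

# Rough lower bound on vector memory in bytes, ASSUMING float32 embeddings
# (4 bytes per dimension): $1 = total entries, $2 = embedding dimensions.
cache_vector_bytes() {
  echo $(( $1 * $2 * 4 ))
}

# 4 workers with the 5000-entry cache from the example above:
api_cache_total_entries 4 5000        # prints 20000
# With 512-dimensional embeddings, about 39 MiB of vectors:
cache_vector_bytes 20000 512          # prints 40960000
```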
Marqo inference cache configuration (INFERENCE or COMBINED)
Configuration name | Default | Description |
---|---|---|
MARQO_INFERENCE_CACHE_SIZE | 0 (disabled) | Maximum number of query-embedding pairs to store. Set to a positive integer to enable caching. |
MARQO_INFERENCE_CACHE_TYPE | "LRU" (least recently used) | Eviction policy: "LRU" (least recently used) or "LFU" (least frequently used). |
This cache reduces inference latency by storing query embeddings, so repeated queries can skip model inference. Only one cache exists in INFERENCE mode, so it can be larger than the API-side cache, which is duplicated in every API worker. Caching does not apply to the add_documents endpoint.
Example
docker run --name marqo -p 8882:8882 \
-e "MARQO_INFERENCE_CACHE_SIZE=20000" \
-e "MARQO_INFERENCE_CACHE_TYPE=LFU" \
-e "MARQO_MODE=INFERENCE" \
marqoai/marqo:latest
Configuring preloaded models (INFERENCE or COMBINED)
- Variable: MARQO_MODELS_TO_PRELOAD
- Default value: '[]'
- Expected value: A JSON-encoded array of strings or objects.
This is a list of models to load and pre-warm when Marqo starts. Preloading avoids a model-loading delay on the first search and indexing requests.
Models in string form must be names of models within the model registry. You can find these models here
Models in object form must have model and modelProperties keys.
Model Object Example (OpenCLIP model)
'{
"model": "my-open-clip-1",
"modelProperties": {
"name": "ViT-B-32-quickgelu",
"dimensions": 512,
"url": "https://github.com/mlfoundations/open_clip/releases/download/v0.2-weights/vit_b_32-quickgelu-laion400m_avg-8a00ab3c.pt",
"type": "open_clip"
}
}'
Marqo Run Example (containing both string and object)
export MY_MODEL_LIST='[
"sentence-transformers/stsb-xlm-r-multilingual",
"hf/e5-base-v2",
{
"model": "generic-clip-test-model-2",
"modelProperties": {
"name": "ViT-B/32",
"dimensions": 512,
"type": "clip",
"url": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
}
}
]'
docker run --name marqo -p 8882:8882 \
-e MARQO_MODELS_TO_PRELOAD="$MY_MODEL_LIST" \
marqoai/marqo:latest
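A malformed MARQO_MODELS_TO_PRELOAD value is easy to produce when quoting JSON in a shell. The following is a sketch of a hypothetical check, check_model_list (not part of Marqo), that applies the rules above before the container starts: the value must be a JSON array whose entries are strings or objects with model and modelProperties keys. It assumes python3 is available on the host and does not verify that string entries exist in the model registry.

```shell
# Hypothetical pre-flight check (not part of Marqo) for a model list destined
# for MARQO_MODELS_TO_PRELOAD. Assumes python3 is installed on the host.
check_model_list() {
  printf '%s' "$1" | python3 -c '
import json
import sys

models = json.load(sys.stdin)  # invalid JSON raises and exits non-zero
if not isinstance(models, list):
    sys.exit("expected a JSON array of models")
for entry in models:
    if isinstance(entry, str):
        continue
    if not isinstance(entry, dict) or not all(k in entry for k in ("model", "modelProperties")):
        sys.exit("entries must be strings or objects with model and modelProperties keys: %r" % (entry,))
print("%d model(s) OK" % len(models))
'
}
```

For the list above, check_model_list "$MY_MODEL_LIST" prints 3 model(s) OK.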
Marqo Video GPU Acceleration Configuration (INFERENCE or COMBINED)
Configuration Name | Default | Description |
---|---|---|
MARQO_ENABLE_VIDEO_GPU_ACCELERATION | None | Controls whether GPU acceleration is enabled for video decoding. Accepted values are TRUE or FALSE. |
The environment variable MARQO_ENABLE_VIDEO_GPU_ACCELERATION determines whether Marqo uses GPU acceleration for video decoding.
- Default behavior: If this variable is not set, Marqo decides automatically based on whether a GPU is available on the host machine.
- Set to TRUE: Forces Marqo to use GPU acceleration for video decoding. An error is raised if GPU acceleration is not available.
- Set to FALSE: Disables GPU acceleration for video decoding, so CPU-based decoding is used.
Note: In addition to a compatible GPU, the NVIDIA drivers on the host must be version 550.54.14 or newer for GPU acceleration to function properly.
Example Usage
To enable GPU acceleration for video decoding, run the following Docker command:
docker run --name marqo --gpus all -p 8882:8882 \
-e "MARQO_ENABLE_VIDEO_GPU_ACCELERATION=TRUE" \
marqoai/marqo:latest
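The driver requirement can be checked on the host before setting MARQO_ENABLE_VIDEO_GPU_ACCELERATION=TRUE. The following sketch uses a hypothetical version_ge helper (not part of Marqo) that compares dotted version strings with sort -V (available in GNU coreutils). The commented lines read the installed driver version through nvidia-smi's driver_version query.

```shell
# Hypothetical helper (not part of Marqo): succeed if version $1 >= version $2.
# sort -V orders version strings; if $2 sorts first (or they are equal), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On a GPU host with nvidia-smi installed:
# driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
# if version_ge "$driver" 550.54.14; then
#   echo "driver $driver is new enough for video GPU acceleration"
# else
#   echo "driver $driver is older than 550.54.14"
# fi
```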
Third party environment variables (INFERENCE or COMBINED)
The following environment variables are managed by dependencies of Marqo rather than Marqo itself. They are intended for advanced users and should be configured with caution. These variables may be modified or deprecated in future Marqo versions.
Configuration Name | Default Value | Description |
---|---|---|
HF_HUB_ENABLE_HF_TRANSFER | null | Set to 1 to enable faster downloads from Hugging Face on high-bandwidth networks. See the documentation for details. |
HF_HUB_OFFLINE | null | Set to 1 to skip HTTP requests when loading a Hugging Face model. This is useful for running Marqo in offline mode. Refer to the documentation for more details. |