Running Marqo API and Inference as Separate Containers
Starting from Marqo v2.17, you can deploy the Marqo API and Marqo Inference components in separate Docker containers. This separation enables independent resource allocation and improved performance under high-throughput workloads: you can horizontally scale the API layer with multiple workers while keeping a centralized inference service.
Running Marqo in Inference Mode
To start a Marqo container in Inference Mode, set the following environment variable:

```shell
export MARQO_MODE=INFERENCE
```
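For a quick start without Compose, the inference container can be launched directly with `docker run`. This is a sketch: the image tag, the port (8881, matching the Compose example later on this page), and the GPU flag are assumptions you should adapt to your setup:

```shell
# Sketch: run the inference component on its own.
# Port 8881 and the image tag follow the Compose example below; adjust as needed.
docker run -d --name marqo-inference \
  --gpus all \
  -e MARQO_MODE=INFERENCE \
  -p 8881:8881 \
  marqoai/marqo:2.17.1-cloud
```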
Running Marqo in API Mode
To start a Marqo container in API Mode, set the following environment variables:

```shell
export MARQO_MODE=API
export MARQO_REMOTE_INFERENCE_URL=http://<inference-host>:<port>
```
`MARQO_REMOTE_INFERENCE_URL` should point to the Marqo Inference container's hostname and port (e.g., `http://host.docker.internal:8881`). Ensure the API container can reach the inference container via Docker networking or host networking mode.
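As a counterpart to the inference container, the API container can also be started with `docker run`. This sketch assumes the inference container is reachable on the Docker host at port 8881 and that the API serves on 8882 (Marqo's default API port); adjust both to your environment:

```shell
# Sketch: run the API component, pointing it at an inference container
# reachable via the Docker host gateway (assumed inference port: 8881).
docker run -d --name marqo-api \
  --add-host host.docker.internal:host-gateway \
  -e MARQO_MODE=API \
  -e MARQO_REMOTE_INFERENCE_URL=http://host.docker.internal:8881 \
  -p 8882:8882 \
  marqoai/marqo:2.17.1-cloud
```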
Examples
Configuring via Docker Compose
Here's an example Docker Compose configuration to run Marqo API and Inference in separate containers with CUDA support:
```yaml
services:
  # CUDA profile services
  marqo-api-cuda:
    image: ${MARQO_DOCKER_IMAGE}
    container_name: marqo
    network_mode: "host"
    privileged: true
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always
    depends_on:
      - marqo-inference-cuda
    environment:
      - MARQO_MODE=API
      - MARQO_API_WORKERS=4
      - MARQO_ENABLE_THROTTLING=FALSE
      - MARQO_REMOTE_INFERENCE_URL=http://host.docker.internal:8881
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - MARQO_ENABLE_BATCH_APIS=true
      - MARQO_INDEX_DEPLOYMENT_LOCK_TIMEOUT=0
      - MARQO_MODELS_TO_PRELOAD=[]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  marqo-inference-cuda:
    image: ${MARQO_DOCKER_IMAGE}
    container_name: inference
    privileged: true
    network_mode: "host"
    environment:
      - MARQO_MODE=INFERENCE
      - MARQO_MODELS_TO_PRELOAD=[]
      - MARQO_INFERENCE_WORKER_COUNT=1
      - MARQO_ENABLE_THROTTLING=FALSE
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - MARQO_MAX_CUDA_MODEL_MEMORY=15
      - MARQO_MAX_CPU_MODEL_MEMORY=15
      - HF_HUB_ENABLE_HF_TRANSFER=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
To start the containers:

- Save the file as `docker-compose.yml`;
- Export the Marqo image version (or tag);
- Run the Docker Compose command:

```shell
export MARQO_DOCKER_IMAGE=marqoai/marqo:2.17.1-cloud
docker compose up -d
```
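Once both containers are running, you can sanity-check the API layer. This is a sketch assuming the API serves on 8882 (Marqo's default API port; the Compose file above uses host networking, so no port mapping applies):

```shell
# List existing indexes through the API container to confirm it is
# serving requests and can reach the inference backend.
curl http://localhost:8882/indexes
```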