Running Marqo on AWS
Size your data
Marqo often requires additional storage because it enriches your data with embeddings. For text-based data, expect the stored size to be at least 10x the size of the raw data.
We recommend using the g4 instance family for Marqo. If you have less than 200GB of data, a g4dn.xlarge instance is recommended.
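The sizing rule and the instance recommendation can be combined into a quick back-of-the-envelope check. The sketch below is a hypothetical helper, not part of Marqo. It applies the 10x rule as a lower bound and assumes the 200GB threshold is compared against the enriched (post-embedding) estimate, which is the more conservative reading. The names EXPANSION_FACTOR, estimate_storage_gb and suggest_instance are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helper, not part of Marqo.
# Rule of thumb from this guide: text data grows by at least 10x once
# Marqo enriches it with embeddings, so treat results as a lower bound.
EXPANSION_FACTOR=10

# Print the minimum estimated storage in whole GB for a raw text size in GB.
estimate_storage_gb() {
  local raw_gb=$1
  echo $(( raw_gb * EXPANSION_FACTOR ))
}

# Print a suggested instance type for a raw text size in GB.
# Assumption: the 200GB threshold applies to the enriched estimate.
suggest_instance() {
  local estimated_gb
  estimated_gb=$(estimate_storage_gb "$1")
  if (( estimated_gb < 200 )); then
    echo "g4dn.xlarge"
  else
    # This guide only names an instance for the under-200GB case.
    echo "a larger g4 instance; size it separately"
  fi
}
```

For example, suggest_instance 15 estimates at least 150GB of stored data and prints g4dn.xlarge. Integer arithmetic is used for simplicity, so pass sizes in whole GB.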
Running Marqo on a CPU instance, such as one from the t3 family, is also acceptable for search, but add_documents calls will be significantly slower than on a GPU instance.
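If you go with a CPU instance, the docker run command used later in this guide is run without the --gpus all flag, since that flag asks Docker for NVIDIA GPUs that a CPU instance does not have. The sketch below builds the arguments for either mode. It is a hypothetical helper: it assumes that a usable NVIDIA GPU means nvidia-smi is on the PATH and runs successfully, and on GPU instances --gpus all also relies on the NVIDIA container runtime being set up on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose GPU or CPU flags for the Marqo docker run command.
# Assumption: a usable NVIDIA GPU means nvidia-smi is on PATH and succeeds.
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

# Print the docker run arguments, one per line; $1 is "gpu" or "cpu".
# The only difference between the two modes is the --gpus all flag.
marqo_run_args() {
  local mode=$1
  local -a args=(run --name marqo -it --privileged -p 8882:8882)
  if [[ $mode == gpu ]]; then
    args+=(--gpus all)
  fi
  args+=(--add-host host.docker.internal:host-gateway marqoai/marqo:latest)
  printf '%s\n' "${args[@]}"
}

# Example usage (not executed here):
#   mode=cpu; has_nvidia_gpu && mode=gpu
#   mapfile -t args < <(marqo_run_args "$mode")
#   docker "${args[@]}"
```

Printing one argument per line and reading them back with mapfile keeps each flag as a separate word, so nothing is split or glob-expanded by accident.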
AWS also recommends GPU instances for most deep learning tasks, since training new models is faster on a GPU than on a CPU instance. To learn more, see AWS recommended GPU instances and AWS EC2 On-Demand Pricing.
Configuring your AWS EC2 Instance
Create an EC2 instance with the following configuration:
[Figure: EC2 Instance Configuration]
Once your EC2 instance has been created, connect to it (e.g. using the Connect option in the AWS Console, or over SSH) and install Docker with the following command:
sudo amazon-linux-extras install docker
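The amazon-linux-extras tool ships with Amazon Linux 2 only, so the command above will not work on other AMIs such as Amazon Linux 2023 or Ubuntu. The sketch below guards the install step by reading /etc/os-release, where Amazon Linux 2 reports ID="amzn" and VERSION_ID="2". The function names is_amazon_linux_2 and install_docker are hypothetical, and the -y flag simply auto-confirms the install prompt.

```bash
#!/usr/bin/env bash
# Return 0 if the given os-release file describes Amazon Linux 2.
is_amazon_linux_2() {
  local os_release=${1:-/etc/os-release}
  [[ -r $os_release ]] || return 1
  local id version_id
  # Source the file in subshells so its variables do not leak into this shell.
  id=$(. "$os_release" && echo "$ID")
  version_id=$(. "$os_release" && echo "$VERSION_ID")
  [[ $id == amzn && $version_id == 2 ]]
}

# Hypothetical wrapper around the install command; call it explicitly.
install_docker() {
  if is_amazon_linux_2; then
    sudo amazon-linux-extras install -y docker
  else
    echo "Not Amazon Linux 2: install Docker with this AMI's package manager." >&2
    return 1
  fi
}
```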
Then start Docker and enable it so that it starts automatically whenever the instance is restarted:
sudo service docker start
sudo systemctl enable docker
Finally, run Marqo on the instance with:
docker run --name marqo -it --privileged -p 8882:8882 --gpus all --add-host host.docker.internal:host-gateway marqoai/marqo:latest
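Marqo can take a while to start, particularly on first launch. Because the command above uses -it and keeps your terminal attached, a readiness check is easiest to run from a second session on the instance. The sketch below is a hypothetical helper that polls port 8882 with curl; it assumes a plain GET on Marqo's root URL succeeds once Marqo is ready, so if your version behaves differently, point it at an endpoint that does.

```bash
#!/usr/bin/env bash
# Hypothetical readiness check: poll Marqo's HTTP port until it answers.
# Arguments: URL (default localhost:8882), attempts (default 60), delay seconds (default 5).
wait_for_marqo() {
  local url=${1:-http://localhost:8882}
  local attempts=${2:-60}
  local delay=${3:-5}
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP error codes as failure; -s: no progress output.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "Marqo is up at $url"
      return 0
    fi
    # Skip the final sleep so a failed check returns promptly.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "Marqo did not respond at $url after $attempts attempts" >&2
  return 1
}
```

With the defaults, wait_for_marqo gives up after roughly five minutes.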
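Note that Marqo stores its data inside the container on the instance itself. If you remove the container (for example with docker rm marqo), the data will be lost. Stopping and starting the container, as shown below, keeps the data.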
To stop Marqo, run:
docker stop marqo
To start it again later, run:
docker start marqo
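The stop and start commands can be wrapped so they only act when the container is in the right state. This is a hypothetical sketch built on docker inspect, docker stop and docker start; the function names are illustrative. Note that docker start runs the container in the background, so use docker start -a marqo instead if you want to follow its output in the current terminal.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around docker stop/start for the container named marqo.
# Stopping keeps the container and its data; only removing it deletes data.

# Print the container state (e.g. running, exited) or "absent".
marqo_status() {
  local status
  if status=$(docker inspect -f '{{.State.Status}}' marqo 2>/dev/null); then
    echo "$status"
  else
    echo "absent"
  fi
}

# Stop Marqo only if it is currently running.
marqo_stop() {
  if [[ $(marqo_status) == running ]]; then
    docker stop marqo
  fi
}

# Start an existing but stopped Marqo container.
marqo_start() {
  case $(marqo_status) in
    running) echo "marqo is already running" ;;
    absent)  echo "no marqo container; create it with docker run first" >&2; return 1 ;;
    *)       docker start marqo ;;
  esac
}
```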