Indexes
List indexes
GET /indexes
Example
curl http://localhost:8882/indexes
mq.get_indexes()
Response: 200 OK
{
"results": [
{
"index_name": "Book Collection"
},
{
"index_name": "Animal facts"
}
]
}
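To work with the response, you usually only need the list of names. The sketch below parses the JSON body shown above with the standard library. The helper name index_names is hypothetical. If you use the Python client, the same list comprehension should work on the return value of mq.get_indexes(), assuming it mirrors this JSON shape.

```python
import json
from typing import List


def index_names(response_body: str) -> List[str]:
    """Return the index names from a GET /indexes response body (hypothetical helper)."""
    payload = json.loads(response_body)
    # Each entry in "results" is an object with an "index_name" key.
    return [entry["index_name"] for entry in payload.get("results", [])]


body = '{"results": [{"index_name": "Book Collection"}, {"index_name": "Animal facts"}]}'
print(index_names(body))  # ['Book Collection', 'Animal facts']
```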
Create index
Settings are applied when the index is created. Any setting you omit takes its default value; the sample settings object further down this section contains the defaults.
POST /indexes/{index_name}
Create an index with (optional) settings.
This endpoint accepts the application/json content type.
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
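Because index_name is part of the URL path, a name containing spaces, such as "Book Collection" in the list example, has to be percent-encoded when you build the request URL yourself. A minimal sketch with the standard library follows; index_url is a hypothetical helper, and whether the server accepts every such name is an assumption to verify against your Marqo version.

```python
from urllib.parse import quote


def index_url(base_url: str, index_name: str) -> str:
    """Build the /indexes/{index_name} URL (hypothetical helper)."""
    # safe="" makes quote() encode "/" as well, so the name stays one path segment;
    # spaces become %20, while "-", "_", "." and "~" are left as-is.
    return f"{base_url.rstrip('/')}/indexes/{quote(index_name, safe='')}"


print(index_url("http://localhost:8882", "my-first-index"))
print(index_url("http://localhost:8882/", "Book Collection"))
```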
Body parameters
The settings for the index. The settings are represented as a nested JSON object.
Name | Type | Default value | Description |
---|---|---|---|
index_defaults | Dictionary | "" | The index defaults object |
number_of_shards | Integer | 5 | The number of shards for the index |
number_of_replicas | Integer | 1 | The number of replicas for the index |
Index Defaults Object
The index_defaults object contains the default settings for the index. The parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
treat_urls_and_pointers_as_images | Boolean | false | Whether to treat URLs and pointers in documents as images and fetch the images they point to |
model | String | hf/all_datasets_v4_MiniLM-L6 | The model to use for the index |
normalize_embeddings | Boolean | true | Normalize the embeddings to have unit length |
text_preprocessing | Dictionary | "" | The text preprocessing object |
image_preprocessing | Dictionary | "" | The image preprocessing object |
ann_parameters | Dictionary | "" | The ANN algorithm parameter object |
model_properties | Dictionary | "" | The model properties object |
Text Preprocessing Object
The text_preprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
split_length | Integer | 2 | The length of the chunks after splitting by split_method |
split_overlap | Integer | 0 | The length of overlap between adjacent chunks |
split_method | String | sentence | The method by which text is chunked (character, word, sentence, or passage) |
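To make split_length and split_overlap concrete, here is a sketch of the sentence method: each chunk holds split_length sentences and starts split_length - split_overlap sentences after the previous one. The regex sentence splitter and the name chunk_by_sentence are assumptions for illustration, not Marqo's implementation, which may split text differently.

```python
import re
from typing import List


def chunk_by_sentence(text: str, split_length: int = 2, split_overlap: int = 0) -> List[str]:
    """Group sentences into overlapping chunks (illustrative sketch only)."""
    if split_length < 1 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length >= 1 and 0 <= split_overlap < split_length")
    # Naive sentence splitter: break after ".", "!" or "?" followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = split_length - split_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + split_length]))
        if start + split_length >= len(sentences):
            break  # this chunk already reaches the last sentence
    return chunks


text = "A one. B two. C three. D four. E five."
print(chunk_by_sentence(text))        # defaults: split_length=2, split_overlap=0
print(chunk_by_sentence(text, 2, 1))  # each chunk repeats the previous chunk's last sentence
```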
Image Preprocessing Object
The image_preprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
patch_method | String | null | The method by which images are chunked (simple or frcnn) |
ANN Algorithm Parameter Object
The ann_parameters object contains hyperparameters for the approximate nearest neighbour (ANN) algorithm used for tensor storage within Marqo. The parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
space_type | String | cosinesimil | The function used to measure the distance between two points in ANN (l1, l2, linf, or cosinesimil) |
parameters | Dictionary | "" | The hyperparameters for the ANN method (which is always hnsw for Marqo) |
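The four space_type values correspond to standard distance functions. The sketch below uses their textbook definitions to show what each one measures; the function name distance is hypothetical, and the search engine may use a squared or rescaled variant internally when it turns distances into scores.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float], space_type: str = "cosinesimil") -> float:
    """Textbook distance for each space_type value (illustrative sketch)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    diffs = [x - y for x, y in zip(a, b)]
    if space_type == "l1":
        return sum(abs(d) for d in diffs)  # Manhattan distance
    if space_type == "l2":
        return math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
    if space_type == "linf":
        return max(abs(d) for d in diffs)  # Chebyshev (largest coordinate) distance
    if space_type == "cosinesimil":
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms  # 1 - cosine similarity
    raise ValueError(f"unknown space_type: {space_type!r}")


for space in ("l1", "l2", "linf", "cosinesimil"):
    print(space, distance([1.0, 0.0], [0.0, 1.0], space))
```

With normalize_embeddings enabled, vectors have unit length, so the squared l2 distance equals twice the cosine distance (||a - b||^2 = 2 - 2 cos θ) and l2 and cosinesimil rank neighbours in the same order.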
HNSW Method Parameters Object
The parameters object accepts the following fields:
Name | Type | Default value | Description |
---|---|---|---|
ef_construction | Integer | 128 | The size of the dynamic list used during k-NN graph creation. Higher values produce a more accurate graph but slow down indexing. Keep this between 2 and 800 (the maximum is 4096). |
m | Integer | 16 | The number of bidirectional links created for each new element in the graph. Higher values increase memory consumption significantly; keep this value between 2 and 100. |
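To gauge how m affects memory, the OpenSearch k-NN documentation gives a rule of thumb for HNSW graphs of roughly 1.1 * (4 * dimension + 8 * m) bytes per vector. The sketch below applies it; it assumes Marqo's vector storage behaves like an OpenSearch k-NN index, so treat the result as a rough estimate. The 384-dimension figure for hf/all_datasets_v4_MiniLM-L6 is also an assumption, and hnsw_memory_bytes is a hypothetical helper.

```python
def hnsw_memory_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Rough HNSW memory estimate (OpenSearch k-NN rule of thumb)."""
    # 4 * dimension: float32 vector values; 8 * m: graph links, growing with m;
    # the 1.1 factor adds about 10% overhead.
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


# Assumed: 384-dimensional embeddings, one million documents.
for m in (16, 64):
    print(f"m={m}: {hnsw_memory_bytes(1_000_000, 384, m) / 1e9:.2f} GB")
```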
Model Properties Object
model_properties is a flexible object that is used to set up models that aren't available in Marqo by default (models available by default are listed here).
The structure of model_properties can vary depending on the model. Common fields are listed below.
Name | Type | Default value | Description |
---|---|---|---|
model_location | Dictionary | "" | The location of the model if it is not easily reachable by URL (for example, a model hosted in a private Hugging Face or AWS S3 repository). See here for examples. |
Below is a sample index settings JSON object containing the default values. When using the Python client, pass this dictionary as the settings_dict parameter of the create_index method.
{
"index_defaults": {
"treat_urls_and_pointers_as_images": false,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": true,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": null
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
}
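In practice you often change only a few fields. The sketch below previews the complete settings object that results from overriding some fields while keeping the rest at the values in the sample above. It assumes those sample values are the server defaults; merge_settings and DEFAULT_SETTINGS are hypothetical names, and the server, not this helper, is what actually applies defaults.

```python
import copy

# Copied from the sample settings object above (assumed to be the server defaults).
DEFAULT_SETTINGS = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
        "text_preprocessing": {"split_length": 2, "split_overlap": 0, "split_method": "sentence"},
        "image_preprocessing": {"patch_method": None},
        "ann_parameters": {
            "space_type": "cosinesimil",
            "parameters": {"ef_construction": 128, "m": 16},
        },
    },
    "number_of_shards": 5,
    "number_of_replicas": 1,
}


def merge_settings(overrides: dict, defaults: dict = DEFAULT_SETTINGS) -> dict:
    """Return defaults with overrides applied recursively; neither input is mutated."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge field by field instead of replacing.
            merged[key] = merge_settings(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


settings = merge_settings({"index_defaults": {"text_preprocessing": {"split_length": 3}}})
print(settings["index_defaults"]["text_preprocessing"])
# {'split_length': 3, 'split_overlap': 0, 'split_method': 'sentence'}
```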
Example
curl -XPOST 'http://localhost:8882/indexes/my-first-index' -H 'Content-type:application/json' -d '
{
"index_defaults": {
"treat_urls_and_pointers_as_images": false,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": true,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": null
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
}'
import marqo

mq = marqo.Client(url="http://localhost:8882")

index_settings = {
"index_defaults": {
"treat_urls_and_pointers_as_images": False,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": True,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": None
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
}
mq.create_index("my-first-index", settings_dict=index_settings)
Delete index
Delete an index.
Note: This operation cannot be undone; a deleted index can't be recovered.
DELETE /indexes/{index_name}
Example
curl -XDELETE http://localhost:8882/indexes/my-first-index
results = mq.index("my-first-index").delete()
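Because deletion is irreversible, you may want to check that the index exists before deleting it. The sketch below is a hypothetical guard: it assumes client is a marqo.Client whose get_indexes() returns the {"results": [{"index_name": ...}]} shape shown under List indexes. The check and the delete are separate requests, so another process could still change the index in between.

```python
from typing import Any


def delete_index_if_exists(client: Any, index_name: str) -> bool:
    """Delete index_name only if GET /indexes lists it; return True if it was deleted."""
    # Assumes get_indexes() returns {"results": [{"index_name": ...}, ...]}.
    names = {entry["index_name"] for entry in client.get_indexes()["results"]}
    if index_name not in names:
        return False
    client.index(index_name).delete()  # irreversible
    return True
```

For example, delete_index_if_exists(mq, "my-first-index") deletes the index created earlier and returns False on a second call.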