Update Existing Documents
Update an array of documents in a given index.
Each document must contain an _id field to identify the document to update.
Include only the fields and values you want to update in the request.
This endpoint works only for documents that already exist in the index.
Use this endpoint to update documents in a structured index, or in an unstructured index created after Marqo 2.16,
by modifying existing fields or adding new fields to the document.
You can only modify or add fields that are not tensor fields or dependent fields of a
multimodal combination field. If the document does not exist, use the add_documents endpoint instead.
If you need to update a tensor field, a dependent field of a multimodal combination field, or a
document in an unstructured index created before Marqo 2.16, see the useExistingTensors feature.
Unstructured indexes created before Marqo 2.16 do not support this endpoint.
This endpoint accepts the application/json content type.
PATCH /indexes/{index_name}/documents
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index (a structured index, or an unstructured index created after Marqo 2.16) |
Body
In the REST API and for cURL users, these parameters are in lowerCamelCase, as presented in the following table.
The Python client uses the pythonic snake_case equivalents.
Update documents parameters | Value Type | Default Value | Description |
---|---|---|---|
documents | Array of objects | n/a | An array of documents. Each document is represented as a JSON object. Each document must contain a valid _id field to specify the target document. You only need to include the fields you want to update in the JSON object. You cannot update a tensor field in a structured index. |
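As an illustration of the request shape described above, here is a minimal sketch that builds (but does not send) the PATCH request using only the Python standard library. The helper name build_update_request is hypothetical, and the base URL and index name are taken from the examples on this page; the client-side _id check is an added convenience, not something Marqo requires you to do before calling the endpoint.

```python
import json
import urllib.request


def build_update_request(base_url, index_name, documents):
    """Build (but do not send) a PATCH request for the update endpoint.

    Hypothetical helper for illustration; not part of the Marqo client.
    """
    # Every document needs an _id so Marqo knows which document to update.
    missing = [doc for doc in documents if "_id" not in doc]
    if missing:
        raise ValueError(f"{len(missing)} document(s) have no _id field")

    # The body is a JSON object with a single "documents" array.
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/indexes/{index_name}/documents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_update_request(
    "http://localhost:8882",
    "my-first-structured-index",
    [{"_id": "1", "label": "person"}],
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Only the fields being changed (here, label) appear next to _id; all other fields of the stored document are left out of the request.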
Response
The response of the update_documents endpoint in Marqo operates on two levels.
First, a status code of 200 in the overall response indicates that Marqo has successfully received and processed the batch request.
The response has the following fields:
Field Name | Type | Description |
---|---|---|
errors | Boolean | Indicates whether any errors occurred during the processing of the batch request. |
items | Array | An array of objects, each representing the processing status of an individual document in the batch. |
processingTimeMs | Number | The time taken to process the batch request, in milliseconds. |
index_name | String | The name of the index in which the documents were updated. |
However, a 200 status does not necessarily imply that each individual document within the batch was processed without issues.
For each document in the batch, there is an associated response code that specifies the status of that particular document's processing.
These individual response codes provide granular feedback, allowing users to discern which documents were successfully processed, which encountered errors, and the nature of any issues encountered.
Each item in the items array has the following fields:
Field Name | Type | Description |
---|---|---|
_id | String | The ID of the document that was processed. |
status | Integer | The status code of the document processing. |
message | String | A message that provides additional information about the processing status of the document. This field only exists when the status is not 200. |
The individual document responses use the following HTTP status codes (non-exhaustive list):
Status Code | Description |
---|---|
200 | The document was updated successfully. |
400 | Bad request. Returned for invalid input (e.g., invalid field types). Inspect message for details. |
404 | The target document is not in the index. |
429 | The Marqo vector store received too many requests. Please try again later. |
500 | Internal error. |
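Because a 200 on the batch does not guarantee that every document was updated, it can help to inspect the items array explicitly. The following is a minimal sketch, assuming the response has already been parsed into a Python dictionary with the fields described above. The function name split_update_results and the sample response (including its message text) are made up for illustration.

```python
def split_update_results(response):
    """Separate successfully updated document IDs from failed items."""
    updated, failed = [], []
    for item in response.get("items", []):
        if item.get("status") == 200:
            updated.append(item["_id"])
        else:
            # Failed items carry a "message" field with more detail.
            failed.append(item)
    return updated, failed


# Hypothetical response used only to exercise the helper.
sample_response = {
    "errors": True,
    "index_name": "my-first-structured-index",
    "items": [
        {"_id": "1", "status": 200},
        {"_id": "3", "status": 404, "message": "Document not found"},
    ],
    "processingTimeMs": 20.17,
}

updated, failed = split_update_results(sample_response)
for item in failed:
    print(f"{item['_id']}: {item['status']} {item.get('message', '')}")
```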
Update behavior for unstructured indexes
Note
Unstructured indexes created after Marqo 2.16 support the update_documents endpoint. However, to optimize performance, this endpoint returns a 400 Bad Request status code for the individual document if there is an issue updating the target document, without detailed diagnostics. You may receive a 400 status code for any of the following reasons:
- The document ID specified in the request does not exist in the index (verify and correct the _id field).
- The request attempts to change the data type of an existing field (ensure consistent field types across updates).
- The request tries to update a tensor field, a multimodal combination field, or a dependent field of a multimodal combination field (these fields cannot be updated via this endpoint as they contain tensors).
In one specific retriable case, a 400 status code may be returned even if the request is valid.
This can happen when another add_documents operation, or an update_documents operation involving map-type fields, is concurrently modifying the same document ID.
The error is transient, and retrying the request after a short delay will typically resolve the issue.
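The retry advice above could be sketched as follows. This is an illustrative helper, not part of the Marqo client: the name update_with_retry, the choice to retry items with status 400 or 429, the attempt limit, and the linear backoff are all assumptions. Since a 400 on an unstructured index can also indicate a permanent problem (such as a missing document), the number of attempts is bounded and any items that still fail are returned to the caller.

```python
import time

RETRYABLE_STATUSES = {400, 429}  # assumption: treat these as possibly transient


def update_with_retry(update_fn, documents, max_attempts=3, delay_seconds=0.5):
    """Call update_fn(documents) and resubmit only the documents that failed
    with a retryable item status. Returns the items that still failed."""
    permanent_failures = []
    retry_items = []
    pending = list(documents)
    for attempt in range(1, max_attempts + 1):
        response = update_fn(pending)
        failed = [i for i in response.get("items", []) if i.get("status") != 200]
        retry_items = [i for i in failed if i.get("status") in RETRYABLE_STATUSES]
        permanent_failures.extend(
            i for i in failed if i.get("status") not in RETRYABLE_STATUSES
        )
        # Keep only the documents whose item status was retryable.
        retry_ids = {i["_id"] for i in retry_items}
        pending = [d for d in pending if d["_id"] in retry_ids]
        if not pending:
            return permanent_failures
        if attempt < max_attempts:
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    return permanent_failures + retry_items


# Possible usage with the Python client shown on this page:
# still_failed = update_with_retry(
#     mq.index("my-first-structured-index").update_documents,
#     [{"_id": "1", "label": "person"}],
# )
```

Passing the update function in as an argument keeps the helper independent of any particular client, so it can be exercised with a stub before being pointed at a real index.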
Example
# Let's create a structured index and add documents to it
curl -X POST 'http://localhost:8882/indexes/my-first-structured-index' \
-H "Content-Type: application/json" \
-d '{
"type": "structured",
"allFields": [
{"name": "img", "type": "image_pointer"},
{"name": "title", "type": "text"},
{"name": "label", "type": "text", "features": ["filter"]}
],
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"tensorFields": ["img", "title"]
}'
curl -X POST 'http://localhost:8882/indexes/my-first-structured-index/documents' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image0.jpg?raw=true",
"title": "A lady taking a photo",
"label": "lady",
"_id": "1"
},
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image1.jpg?raw=true",
"title": "A plane flying in the sky",
"label": "airplane",
"_id": "2"
}
]
}'
# Now let's update the document by changing the label
curl -X PATCH 'http://localhost:8882/indexes/my-first-structured-index/documents' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"label": "person",
"_id": "1"
},
{
"_id": "2",
"label": "plane"
}
]
}'
import marqo

# Connect to a local Marqo instance
mq = marqo.Client(url="http://localhost:8882")

# Let's create a structured index and add documents to it
mq.create_index(
"my-first-structured-index",
type="structured",
all_fields=[
{"name": "img", "type": "image_pointer"},
{"name": "title", "type": "text"},
{"name": "label", "type": "text", "features": ["filter"]},
],
model="open_clip/ViT-B-32/laion2b_s34b_b79k",
tensor_fields=["img", "title"],
)
mq.index("my-first-structured-index").add_documents(
[
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image0.jpg?raw=true",
"title": "A lady taking a photo",
"label": "lady",
"_id": "1",
},
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image1.jpg?raw=true",
"title": "A plane flying in the sky",
"label": "airplane",
"_id": "2",
},
]
)
# Now let's update the document by changing the label
mq.index("my-first-structured-index").update_documents(
[{"_id": "1", "label": "person"}, {"_id": "2", "label": "plane"}]
)
For Marqo Cloud, you will need the endpoint of your index; replace your_endpoint in the commands below with it. To find it, visit Find Your Endpoint.
You will also need your API key. To obtain this key, visit Find Your API Key.
# Let's create a structured index and add documents to it
curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-first-structured-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"type": "structured",
"allFields": [
{"name": "img", "type": "image_pointer"},
{"name": "title", "type": "text"},
{"name": "label", "type": "text", "features": ["filter"]}
],
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"tensorFields": ["img", "title"]
}'
curl -X POST 'your_endpoint/indexes/my-first-structured-index/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image0.jpg?raw=true",
"title": "A lady taking a photo",
"label": "lady",
"_id": "1"
},
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image1.jpg?raw=true",
"title": "A plane flying in the sky",
"label": "airplane",
"_id": "2"
}
]
}'
# Now let's update the document by changing the label
curl -X PATCH 'your_endpoint/indexes/my-first-structured-index/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"label": "person",
"_id": "1"
},
{
"_id": "2",
"label": "plane"
}
]
}'
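import marqo

# Connect to Marqo Cloud with your API key
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")

# Let's create a structured index and add documents to it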
mq.create_index(
"my-first-structured-index",
type="structured",
all_fields=[
{"name": "img", "type": "image_pointer"},
{"name": "title", "type": "text"},
{"name": "label", "type": "text", "features": ["filter"]},
],
model="open_clip/ViT-B-32/laion2b_s34b_b79k",
tensor_fields=["img", "title"],
)
mq.index("my-first-structured-index").add_documents(
[
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image0.jpg?raw=true",
"title": "A lady taking a photo",
"label": "lady",
"_id": "1",
},
{
"img": "https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchGuide/data/image1.jpg?raw=true",
"title": "A plane flying in the sky",
"label": "airplane",
"_id": "2",
},
]
)
# Now let's update the document by changing the label
mq.index("my-first-structured-index").update_documents(
[{"_id": "1", "label": "person"}, {"_id": "2", "label": "plane"}]
)
Response 200 OK
{
"errors": false,
"index_name": "my-first-structured-index",
"items": [
{
"_id": "1",
"status": 200
},
{
"_id": "2",
"status": 200
}
],
"processingTimeMs": 20.17
}
The update documents endpoint updates the fields of existing documents in structured indexes and in unstructured indexes created after Marqo 2.16.
In the example, we updated the label of the documents with _id values "1" and "2".
The response shows that the update was successful. These changes are reflected in the index and can be used for search and
filtering. Note that you can only update fields that are not tensor fields.
Documents
Parameter: documents
Expected value: Array of documents (default maximum length: 128). Each document is a JSON object that must
contain a valid _id field to specify the target document. You only need to include the fields you want to update in
the JSON object.
[
{
"Title": "Your updated title 1",
"Description": "Your updated description 1",
"_id": "your-target-doc-id-1"
},
{
"Title": "Your updated title 2",
"Description": "Your updated description 2",
"_id": "your-target-doc-id-2"
}
]
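Given the default maximum of 128 documents per request, larger update sets need to be split across several calls. A minimal sketch of one way to do that is shown here; the helper name chunk_documents and the sample documents are illustrative, and the batch size should match whatever limit your deployment is actually configured with.

```python
def chunk_documents(documents, batch_size=128):
    """Yield successive batches of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# 300 partial documents, each carrying only _id and the field to change.
docs = [{"_id": str(n), "label": "updated"} for n in range(300)]
batches = list(chunk_documents(docs))
print([len(batch) for batch in batches])  # → [128, 128, 44]
```

Each batch can then be passed to the update call in turn, with the per-item statuses of every response checked separately.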