curl --request POST \
--url https://api.trieve.ai/api/dataset/batch_create_datasets \
--header 'Authorization: <api-key>' \
--header 'Content-Type: application/json' \
--header 'TR-Organization: <tr-organization>' \
--data '{
"datasets": [
{
"dataset_name": "<string>",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "<string>"
}
],
"upsert": true
}'
[
{
"created_at": "2021-01-01 00:00:00.000",
"id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"name": "Trieve",
"organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "foobar-dataset",
"updated_at": "2021-01-01 00:00:00.000"
}
]
Datasets will be created in the org specified via the TR-Organization header. Auth’ed user must be an owner of the organization to create datasets. If a tracking_id is ignored due to it already existing on the org, the response will not contain a dataset with that tracking_id and it can be assumed that a dataset with the missing tracking_id already exists.
curl --request POST \
--url https://api.trieve.ai/api/dataset/batch_create_datasets \
--header 'Authorization: <api-key>' \
--header 'Content-Type: application/json' \
--header 'TR-Organization: <tr-organization>' \
--data '{
"datasets": [
{
"dataset_name": "<string>",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "<string>"
}
],
"upsert": true
}'
[
{
"created_at": "2021-01-01 00:00:00.000",
"id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"name": "Trieve",
"organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "foobar-dataset",
"updated_at": "2021-01-01 00:00:00.000"
}
]
The organization id to use for the request
JSON request payload to bulk create datasets
The body is of type object
.
Page of tags requested with all tags and the number of chunks in the dataset with that tag plus the total number of unique tags for the whole datset
Datasets
Was this page helpful?