Dataset
Batch Create Datasets
Chunk
- POSTCreate or Upsert Chunk or Chunks
- POSTSearch
- POSTAutocomplete
- POSTGet Recommended Chunks
- POSTScroll Chunks
- POSTCount chunks above threshold
- POSTGenerate suggested queries
- POSTRAG on Specified Chunks
- PUTUpdate Chunk
- PUTUpdate Chunk By Tracking Id
- GETGet Chunk By Id
- GETGet Chunk By Tracking Id
- POSTGet Chunks By Tracking Ids
- POSTGet Chunks By Ids
- DELDelete Chunk
- DELDelete Chunk By Tracking Id
- DELBulk Delete Chunks
- POSTSplit HTML Content into Chunks
Chunk Group
- POSTCreate or Upsert Group or Groups
- POSTSearch Over Groups
- POSTSearch Within Group
- POSTGet Recommended Groups
- POSTAdd Chunk to Group
- POSTAdd Chunk to Group by Tracking ID
- POSTGet Groups for Chunks
- GETGet Chunks in Group by Tracking ID
- GETGet Group by Tracking ID
- PUTUpdate Group
- DELRemove Chunk from Group
- DELDelete Group by Tracking ID
- DELDelete Group
- GETGet Group
- GETGet Chunks in Group
- GETGet Groups for Dataset
Message
Crawl
File
Analytics
Dataset
- POSTCreate Dataset
- POSTBatch Create Datasets
- POSTGet All Tags
- POSTGet events for the dataset
- PUTUpdate Dataset by ID or Tracking ID
- PUTClear Dataset
- GETGet Dataset By ID
- GETGet Dataset by Tracking ID
- GETGet Datasets from Organization
- POSTCreate ETL Job
- PUTCreate Pagefind Index for Dataset
- GETGet Pagefind Index Url for Dataset
- GETGet Usage By Dataset ID
- GETGet dataset crawl options
- GETGet apipublic page
- DELDelete Dataset
- DELDelete Dataset by Tracking ID
Organization
Health
Stripe
Metrics
Dataset
Batch Create Datasets
Datasets will be created in the org specified via the TR-Organization header. Auth’ed user must be an owner of the organization to create datasets. If a tracking_id is ignored due to it already existing on the org, the response will not contain a dataset with that tracking_id and it can be assumed that a dataset with the missing tracking_id already exists.
POST
/
api
/
dataset
/
batch_create_datasets
curl --request POST \
--url https://api.trieve.ai/api/dataset/batch_create_datasets \
--header 'Authorization: <api-key>' \
--header 'Content-Type: application/json' \
--header 'TR-Organization: <tr-organization>' \
--data '{
"datasets": [
{
"dataset_name": "<string>",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "<string>"
}
],
"upsert": true
}'
[
{
"created_at": "2021-01-01 00:00:00.000",
"id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"name": "Trieve",
"organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "foobar-dataset",
"updated_at": "2021-01-01 00:00:00.000"
}
]
Authorizations
Headers
The organization id to use for the request
Body
application/json
JSON request payload to bulk create datasets
The body is of type object
.
Response
200
application/json
Page of tags requested with all tags and the number of chunks in the dataset with that tag plus the total number of unique tags for the whole datset
Datasets
Was this page helpful?
curl --request POST \
--url https://api.trieve.ai/api/dataset/batch_create_datasets \
--header 'Authorization: <api-key>' \
--header 'Content-Type: application/json' \
--header 'TR-Organization: <tr-organization>' \
--data '{
"datasets": [
{
"dataset_name": "<string>",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "<string>"
}
],
"upsert": true
}'
[
{
"created_at": "2021-01-01 00:00:00.000",
"id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"name": "Trieve",
"organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
"server_configuration": {
"AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
"BM25_AVG_LEN": 256,
"BM25_B": 0.75,
"BM25_ENABLED": true,
"BM25_K": 0.75,
"DISTANCE_METRIC": "cosine",
"EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
"EMBEDDING_MODEL_NAME": "jina-base-en",
"EMBEDDING_QUERY_PREFIX": "",
"EMBEDDING_SIZE": 768,
"FREQUENCY_PENALTY": 0,
"FULLTEXT_ENABLED": true,
"INDEXED_ONLY": false,
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_DEFAULT_MODEL": "gpt-4o",
"LOCKED": false,
"MAX_LIMIT": 10000,
"MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
"N_RETRIEVALS_TO_INCLUDE": 8,
"PRESENCE_PENALTY": 0,
"QDRANT_ONLY": false,
"RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
"SEMANTIC_ENABLED": true,
"STOP_TOKENS": [
"\n\n",
"\n"
],
"SYSTEM_PROMPT": "You are a helpful assistant",
"TEMPERATURE": 0.5,
"USE_MESSAGE_TO_QUERY_PROMPT": false
},
"tracking_id": "foobar-dataset",
"updated_at": "2021-01-01 00:00:00.000"
}
]