Batch Create Datasets
Datasets are created in the organization specified by the TR-Organization header. The authenticated user must be an owner of that organization to create datasets. If a tracking_id is ignored because it already exists in the organization, the response will not contain a dataset with that tracking_id, so a tracking_id missing from the response means a dataset with that tracking_id already exists.
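Because skipped datasets are simply absent from the response, a client can work out which ones already existed by comparing the tracking_ids it sent with the ones it got back. The following is a minimal sketch of that comparison; the helper name `skipped_tracking_ids` and the sample data are hypothetical and not part of the Trieve API.

```python
from typing import Iterable, List, Mapping


def skipped_tracking_ids(
    requested: Iterable[Mapping], returned: Iterable[Mapping]
) -> List[str]:
    """Return tracking_ids that were sent but are absent from the response.

    `requested` is the "datasets" list from the request body and `returned`
    is the JSON array in the response. Datasets sent without a tracking_id
    cannot be matched this way and are left out.
    """
    returned_ids = {d.get("tracking_id") for d in returned}
    skipped = []
    for dataset in requested:
        tracking_id = dataset.get("tracking_id")
        if tracking_id is not None and tracking_id not in returned_ids:
            skipped.append(tracking_id)
    return skipped


# Hypothetical request and response data for illustration only.
requested = [
    {"dataset_name": "Docs", "tracking_id": "docs"},
    {"dataset_name": "Blog", "tracking_id": "blog"},
    {"dataset_name": "Scratch"},
]
response = [{"name": "Docs", "tracking_id": "docs"}]

print(skipped_tracking_ids(requested, response))  # ['blog']
```

This check is only meaningful when upsert is false, since that is the case in which existing tracking_ids are ignored.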
curl --request POST \
  --url https://api.trieve.ai/api/dataset/batch_create_datasets \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Organization: <tr-organization>' \
  --data '{
  "datasets": [
    {
      "dataset_name": "<string>",
      "server_configuration": {
        "BM25_AVG_LEN": 256,
        "BM25_B": 0.75,
        "BM25_ENABLED": true,
        "BM25_K": 0.75,
        "DISTANCE_METRIC": "cosine",
        "EMBEDDING_BASE_URL": "https://api.openai.com/v1",
        "EMBEDDING_MODEL_NAME": "text-embedding-3-small",
        "EMBEDDING_QUERY_PREFIX": "",
        "EMBEDDING_SIZE": 1536,
        "FREQUENCY_PENALTY": 0,
        "FULLTEXT_ENABLED": true,
        "INDEXED_ONLY": false,
        "LLM_BASE_URL": "https://api.openai.com/v1",
        "LLM_DEFAULT_MODEL": "gpt-4o",
        "LOCKED": false,
        "MAX_LIMIT": 10000,
        "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
        "N_RETRIEVALS_TO_INCLUDE": 8,
        "PRESENCE_PENALTY": 0,
        "QDRANT_ONLY": false,
        "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
        "SEMANTIC_ENABLED": true,
        "STOP_TOKENS": [
          "\n\n",
          "\n"
        ],
        "SYSTEM_PROMPT": "You are a helpful assistant",
        "TEMPERATURE": 0.5,
        "USE_MESSAGE_TO_QUERY_PROMPT": false
      },
      "tracking_id": "<string>"
    }
  ],
  "upsert": true
}'
[
  {
    "created_at": "2021-01-01 00:00:00.000",
    "id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
    "name": "Trieve",
    "organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
    "server_configuration": {
      "BM25_AVG_LEN": 256,
      "BM25_B": 0.75,
      "BM25_ENABLED": true,
      "BM25_K": 0.75,
      "DISTANCE_METRIC": "cosine",
      "EMBEDDING_BASE_URL": "https://api.openai.com/v1",
      "EMBEDDING_MODEL_NAME": "text-embedding-3-small",
      "EMBEDDING_QUERY_PREFIX": "",
      "EMBEDDING_SIZE": 1536,
      "FREQUENCY_PENALTY": 0,
      "FULLTEXT_ENABLED": true,
      "INDEXED_ONLY": false,
      "LLM_BASE_URL": "https://api.openai.com/v1",
      "LLM_DEFAULT_MODEL": "gpt-4o",
      "LOCKED": false,
      "MAX_LIMIT": 10000,
      "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
      "N_RETRIEVALS_TO_INCLUDE": 8,
      "PRESENCE_PENALTY": 0,
      "QDRANT_ONLY": false,
      "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
      "SEMANTIC_ENABLED": true,
      "STOP_TOKENS": [
        "\n\n",
        "\n"
      ],
      "SYSTEM_PROMPT": "You are a helpful assistant",
      "TEMPERATURE": 0.5,
      "USE_MESSAGE_TO_QUERY_PROMPT": false
    },
    "tracking_id": "foobar-dataset",
    "updated_at": "2021-01-01 00:00:00.000"
  }
]
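The curl request above can also be assembled from Python. The sketch below uses only the standard library and mirrors the URL, headers, and body shape of the curl example; the function name `build_batch_create_request` and the sample dataset are hypothetical. It builds the request object without sending it, so it can be inspected first.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.trieve.ai/api/dataset/batch_create_datasets"


def build_batch_create_request(api_key, organization_id, datasets, upsert=False):
    """Build (but do not send) a POST request for batch dataset creation."""
    body = json.dumps({"datasets": datasets, "upsert": upsert}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
            "TR-Organization": organization_id,
        },
    )


# Placeholder credentials and a hypothetical dataset entry.
req = build_batch_create_request(
    "<api-key>",
    "<tr-organization>",
    [{"dataset_name": "Docs", "tracking_id": "docs"}],
)
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and decode the JSON array from the response body. Omitting `server_configuration` from a dataset entry, as done here, is an assumption that the server falls back to defaults.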
Authorizations
Headers
The organization id to use for the request
Body
List of datasets to create
Name of the dataset.
Lets you specify the configuration for a dataset
The average length of the chunks in the index for BM25
The BM25 B parameter
Whether to use BM25
The BM25 K parameter
Whether to disable analytics
The distance metric to use. Available options: euclidean, cosine, manhattan, dot
The base URL for the embedding API
The name of the embedding model to use
The prefix to use for the embedding query
The size of the embeddings. Must be greater than 0.
The frequency penalty to use
Whether to use fulltext search
Whether to only use indexed chunks
The base URL for the LLM API
The default model to use for the LLM
Whether the dataset is locked to prevent changes or deletion
The maximum limit for the number of chunks for counting. Must be greater than 0.
The maximum number of tokens to use in the LLM response. Must be greater than 0.
The prompt to use for converting a message to a query
The number of retrievals to include with the RAG model. Must be greater than 0.
Whether to enable pagefind indexing
The presence penalty to use
x > 0
light, dark
Whether or not to insert chunks into Postgres
The prompt to use for the RAG model
The base URL for the reranker API
The model name for the Reranker API
Whether to use semantic search
The stop tokens to use
The system prompt to use for the LLM
The temperature to use
Whether to use the message to query prompt
Optional tracking ID for the dataset. Can be used to track the dataset in external systems. Must be unique within the organization. Using a valid UUID as the tracking ID is strongly discouraged because it will not work with the TR-Dataset header.
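A client can guard against UUID-shaped tracking IDs before sending the request. The sketch below is one way to do that with Python's standard `uuid` module; the helper name `looks_like_uuid` is hypothetical, and the server's own notion of a "valid uuid" may be stricter or looser than what `uuid.UUID` accepts.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if Python's uuid module can parse `value` as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# "foobar-dataset" is the tracking_id from the example response.
print(looks_like_uuid("foobar-dataset"))  # False
print(looks_like_uuid("e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3"))  # True
```

Note that `uuid.UUID` also accepts hyphenless 32-character hex strings and braced forms, so this check errs on the side of rejecting more candidates.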
Whether to upsert when a dataset with one of the specified tracking_ids already exists. Defaults to false, in which case specified datasets whose tracking_id already exists in the organization are ignored. If true, the existing dataset is updated with the new dataset's details.
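The effect of the upsert flag can be illustrated with a small local model. This is not the server's implementation; it is a sketch of the rule stated in the endpoint description (existing tracking_ids are skipped unless upsert is true), under the assumptions that every dataset carries a tracking_id and that updated datasets appear in the response. The function name `simulate_batch_create` is hypothetical.

```python
def simulate_batch_create(existing, incoming, upsert=False):
    """Model the documented upsert rule on plain dicts.

    `existing` maps tracking_id -> dataset dict for the organization.
    Returns (new_state, response), where `response` lists the datasets
    that were created or updated.
    """
    state = dict(existing)
    response = []
    for dataset in incoming:
        tracking_id = dataset["tracking_id"]
        if tracking_id in state:
            if not upsert:
                continue  # skipped: absent from the response
            state[tracking_id] = {**state[tracking_id], **dataset}
        else:
            state[tracking_id] = dict(dataset)
        response.append(state[tracking_id])
    return state, response


existing = {"docs": {"dataset_name": "Docs v1", "tracking_id": "docs"}}
incoming = [
    {"dataset_name": "Docs v2", "tracking_id": "docs"},
    {"dataset_name": "Blog", "tracking_id": "blog"},
]

_, created_only = simulate_batch_create(existing, incoming, upsert=False)
_, upserted = simulate_batch_create(existing, incoming, upsert=True)
print([d["tracking_id"] for d in created_only])  # ['blog']
print([d["dataset_name"] for d in upserted])  # ['Docs v2', 'Blog']
```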
Response
Timestamp of the creation of the dataset
Flag to indicate if the dataset has been deleted. Deletes are handled async after the flag is set so as to avoid expensive search index compaction.
Unique identifier of the dataset, auto-generated uuid created by Trieve
Name of the dataset
Unique identifier of the organization that owns the dataset
Configuration of the dataset for RAG, embeddings, BM25, etc.
Timestamp of the last update of the dataset
Tracking ID of the dataset. Can be any string and is determined by the user. Tracking IDs are unique identifiers for datasets within an organization and are designed to match the unique identifier of the dataset in the user's system.