Create Dataset

curl --request POST \
  --url https://api.trieve.ai/api/dataset \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Organization: <tr-organization>' \
  --data '{
  "dataset_name": "My Dataset",
  "organization_id": "00000000-0000-0000-0000-000000000000",
  "server_configuration": {
    "AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
    "BM25_AVG_LEN": 256,
    "BM25_B": 0.75,
    "BM25_ENABLED": true,
    "BM25_K": 0.75,
    "DISTANCE_METRIC": "cosine",
    "EMBEDDING_BASE_URL": "https://api.openai.com/v1",
    "EMBEDDING_MODEL_NAME": "text-embedding-3-small",
    "EMBEDDING_QUERY_PREFIX": "",
    "EMBEDDING_SIZE": 1536,
    "FREQUENCY_PENALTY": 0,
    "FULLTEXT_ENABLED": true,
    "INDEXED_ONLY": false,
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_DEFAULT_MODEL": "gpt-3.5-turbo-1106",
    "LOCKED": false,
    "MAX_LIMIT": 10000,
    "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
    "N_RETRIEVALS_TO_INCLUDE": 8,
    "PRESENCE_PENALTY": 0,
    "QDRANT_ONLY": false,
    "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
    "SEMANTIC_ENABLED": true,
    "STOP_TOKENS": [
      "\n\n",
      "\n"
    ],
    "SYSTEM_PROMPT": "You are a helpful assistant",
    "TEMPERATURE": 0.5,
    "USE_MESSAGE_TO_QUERY_PROMPT": false
  }
}'

{
  "created_at": "2021-01-01 00:00:00.000",
  "id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
  "name": "Trieve",
  "organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
  "server_configuration": {
    "AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
    "BM25_AVG_LEN": 256,
    "BM25_B": 0.75,
    "BM25_ENABLED": true,
    "BM25_K": 0.75,
    "DISTANCE_METRIC": "cosine",
    "EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
    "EMBEDDING_MODEL_NAME": "jina-base-en",
    "EMBEDDING_QUERY_PREFIX": "",
    "EMBEDDING_SIZE": 768,
    "FREQUENCY_PENALTY": 0,
    "FULLTEXT_ENABLED": true,
    "INDEXED_ONLY": false,
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_DEFAULT_MODEL": "gpt-4o",
    "LOCKED": false,
    "MAX_LIMIT": 10000,
    "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
    "N_RETRIEVALS_TO_INCLUDE": 8,
    "PRESENCE_PENALTY": 0,
    "QDRANT_ONLY": false,
    "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
    "SEMANTIC_ENABLED": true,
    "STOP_TOKENS": [
      "\n\n",
      "\n"
    ],
    "SYSTEM_PROMPT": "You are a helpful assistant",
    "TEMPERATURE": 0.5,
    "USE_MESSAGE_TO_QUERY_PROMPT": false
  },
  "tracking_id": "foobar-dataset",
  "updated_at": "2021-01-01 00:00:00.000"
}

POST

api

dataset

Create Dataset

curl --request POST \
  --url https://api.trieve.ai/api/dataset \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Organization: <tr-organization>' \
  --data '{
  "dataset_name": "My Dataset",
  "organization_id": "00000000-0000-0000-0000-000000000000",
  "server_configuration": {
    "AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
    "BM25_AVG_LEN": 256,
    "BM25_B": 0.75,
    "BM25_ENABLED": true,
    "BM25_K": 0.75,
    "DISTANCE_METRIC": "cosine",
    "EMBEDDING_BASE_URL": "https://api.openai.com/v1",
    "EMBEDDING_MODEL_NAME": "text-embedding-3-small",
    "EMBEDDING_QUERY_PREFIX": "",
    "EMBEDDING_SIZE": 1536,
    "FREQUENCY_PENALTY": 0,
    "FULLTEXT_ENABLED": true,
    "INDEXED_ONLY": false,
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_DEFAULT_MODEL": "gpt-3.5-turbo-1106",
    "LOCKED": false,
    "MAX_LIMIT": 10000,
    "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
    "N_RETRIEVALS_TO_INCLUDE": 8,
    "PRESENCE_PENALTY": 0,
    "QDRANT_ONLY": false,
    "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
    "SEMANTIC_ENABLED": true,
    "STOP_TOKENS": [
      "\n\n",
      "\n"
    ],
    "SYSTEM_PROMPT": "You are a helpful assistant",
    "TEMPERATURE": 0.5,
    "USE_MESSAGE_TO_QUERY_PROMPT": false
  }
}'

{
  "created_at": "2021-01-01 00:00:00.000",
  "id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
  "name": "Trieve",
  "organization_id": "e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3",
  "server_configuration": {
    "AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
    "BM25_AVG_LEN": 256,
    "BM25_B": 0.75,
    "BM25_ENABLED": true,
    "BM25_K": 0.75,
    "DISTANCE_METRIC": "cosine",
    "EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
    "EMBEDDING_MODEL_NAME": "jina-base-en",
    "EMBEDDING_QUERY_PREFIX": "",
    "EMBEDDING_SIZE": 768,
    "FREQUENCY_PENALTY": 0,
    "FULLTEXT_ENABLED": true,
    "INDEXED_ONLY": false,
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_DEFAULT_MODEL": "gpt-4o",
    "LOCKED": false,
    "MAX_LIMIT": 10000,
    "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
    "N_RETRIEVALS_TO_INCLUDE": 8,
    "PRESENCE_PENALTY": 0,
    "QDRANT_ONLY": false,
    "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
    "SEMANTIC_ENABLED": true,
    "STOP_TOKENS": [
      "\n\n",
      "\n"
    ],
    "SYSTEM_PROMPT": "You are a helpful assistant",
    "TEMPERATURE": 0.5,
    "USE_MESSAGE_TO_QUERY_PROMPT": false
  },
  "tracking_id": "foobar-dataset",
  "updated_at": "2021-01-01 00:00:00.000"
}

Authorizations

Authorization

string

header

required

Headers

TR-Organization

string<uuid>

required

The organization id to use for the request

Body

application/json

JSON request payload to create a new dataset

dataset_name

string

required

Name of the dataset.

server_configuration

object

Lets you specify the configuration for a dataset

Show child attributes

Example:

{
  "AIMON_RERANKER_TASK_DEFINITION": "Your task is to grade the relevance of context document(s) against the specified user query.",
  "BM25_AVG_LEN": 256,
  "BM25_B": 0.75,
  "BM25_ENABLED": true,
  "BM25_K": 0.75,
  "DISTANCE_METRIC": "cosine",
  "EMBEDDING_BASE_URL": "https://embedding.trieve.ai",
  "EMBEDDING_MODEL_NAME": "jina-base-en",
  "EMBEDDING_QUERY_PREFIX": "",
  "EMBEDDING_SIZE": 768,
  "FREQUENCY_PENALTY": 0,
  "FULLTEXT_ENABLED": true,
  "INDEXED_ONLY": false,
  "LLM_BASE_URL": "https://api.openai.com/v1",
  "LLM_DEFAULT_MODEL": "gpt-4o",
  "LOCKED": false,
  "MAX_LIMIT": 10000,
  "MESSAGE_TO_QUERY_PROMPT": "Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n",
  "N_RETRIEVALS_TO_INCLUDE": 8,
  "PRESENCE_PENALTY": 0,
  "QDRANT_ONLY": false,
  "RAG_PROMPT": "Use the following retrieved documents to respond briefly and accurately:",
  "SEMANTIC_ENABLED": true,
  "STOP_TOKENS": ["\n\n", "\n"],
  "SYSTEM_PROMPT": "You are a helpful assistant",
  "TEMPERATURE": 0.5,
  "USE_MESSAGE_TO_QUERY_PROMPT": false
}

tracking_id

string | null

Optional tracking ID for the dataset. Can be used to track the dataset in external systems. Must be unique within the organization. Strongly recommended to not use a valid uuid value as that will not work with the TR-Dataset header.

Response

Dataset created successfully

created_at

string<date-time>

required

Timestamp of the creation of the dataset

deleted

integer

required

Flag to indicate if the dataset has been deleted. Deletes are handled async after the flag is set so as to avoid expensive search index compaction.

string<uuid>

required

Unique identifier of the dataset, auto-generated uuid created by Trieve

name

string

required

Name of the dataset

organization_id

string<uuid>

required

Unique identifier of the organization that owns the dataset

server_configuration

any

required

Configuration of the dataset for RAG, embeddings, BM25, etc.

updated_at

string<date-time>

required

Timestamp of the last update of the dataset

tracking_id

string | null

Tracking ID of the dataset, can be any string, determined by the user. Tracking ID's are unique identifiers for datasets within an organization. They are designed to match the unique identifier of the dataset in the user's system.

Get experiment Batch Create Datasets

⌘I

Chunk

Chunk Group

Topic

Message

Crawl

File

Analytics

Experiments

Dataset

Organization

User

Auth

Health

Invitation

Stripe

Metrics

Public

Create Dataset

Authorizations

Headers

Body

Response