Update Dataset by ID or Tracking ID
One of id or tracking_id must be provided. The authenticated user must be an owner of the organization to update a dataset.
Authorizations
Headers
The organization id to use for the request
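As a rough illustration of how the authorization and organization headers might be attached, the sketch below builds (but does not send) a PUT request using only the Python standard library. The endpoint URL, the `Authorization` header format, and the `TR-Organization` header name are assumptions, since none of them is spelled out on this page, and `build_update_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: the endpoint URL is not given on this page.
API_URL = "https://api.trieve.ai/api/dataset"


def build_update_request(api_key: str, organization_id: str, body: dict) -> urllib.request.Request:
    """Build, without sending, a PUT request that updates a dataset."""
    headers = {
        "Authorization": api_key,            # assumed auth header format
        "TR-Organization": organization_id,  # assumed header name for the organization id
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the call; it is kept separate here so the request can be inspected first.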
Body
Options for setting up the crawl which will populate the dataset.
Option for allowing the crawl to follow links to external websites.
Text strings to remove from body when creating chunks for each page
Boost titles such that keyword matches in titles are prioritized in search results. Strongly recommended to leave this on. Defaults to true.
URL Patterns to exclude from the crawl
Specify the HTML tags, classes and ids to exclude from the response.
Text strings to remove from headings when creating chunks for each page
Ignore the website sitemap when crawling. Defaults to true.
URL Patterns to include in the crawl
Specify the HTML tags, classes and ids to include in the response.
Interval at which specified site should be re-scraped
daily, weekly, monthly
How many pages to crawl. Defaults to 1000.
Options for including an OpenAPI spec or Shopify settings
OpenAPI JSON schema to be processed alongside the site crawl
Tag to look for to determine if a page should create an openapi route chunk instead of chunks from heading-split of the HTML
openapi
The URL to crawl
Metadata to send back with the webhook call for each successful page scrape
Host to call back on the webhook for each successful page scrape
The id of the dataset you want to update.
The new name of the dataset. Must be unique within the organization. If not provided, the name will not be updated.
Optional new tracking ID for the dataset. Can be used to track the dataset in external systems. Must be unique within the organization. If not provided, the tracking ID will not be updated. It is strongly recommended not to use a valid UUID value, as that will not work with the TR-Dataset header.
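One way to follow the UUID recommendation above is to check a candidate tracking ID on the client before sending it. This is a minimal sketch using Python's standard `uuid` module; `looks_like_uuid` is a hypothetical helper, and the server's own notion of a valid UUID may be stricter or looser than Python's parser.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if Python's uuid module can parse value as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def check_tracking_id(tracking_id: str) -> str:
    """Reject tracking IDs that parse as UUIDs; return the ID unchanged otherwise."""
    if looks_like_uuid(tracking_id):
        raise ValueError("tracking ID should not be a valid UUID")
    return tracking_id
```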
Lets you specify the configuration for a dataset
The average length of the chunks in the index for BM25
The BM25 B parameter
Whether to use BM25
The BM25 K parameter
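To show how the average length, B, and K parameters interact, the sketch below computes the term-frequency part of the textbook Okapi BM25 formula. It assumes the standard formulation and omits the IDF factor; this page does not state which BM25 variant the service uses, so treat it as an illustration only.

```python
def bm25_tf_weight(tf: float, doc_len: float, avg_len: float, k: float, b: float) -> float:
    """Term-frequency component of textbook Okapi BM25 (IDF factor omitted).

    tf      -- occurrences of the term in the chunk
    doc_len -- length of the chunk
    avg_len -- average chunk length in the index
    k       -- saturation parameter (higher lets repeated terms count for more)
    b       -- length normalization strength, from 0 (none) to 1 (full)
    """
    length_norm = 1.0 - b + b * (doc_len / avg_len)
    return tf * (k + 1.0) / (tf + k * length_norm)
```

With B above 0, a chunk longer than the average length receives a lower weight for the same term count than a chunk of average length.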
Whether to disable analytics
euclidean, cosine, manhattan, dot
The base URL for the embedding API
The name of the embedding model to use
The prefix to use for the embedding query
The size of the embeddings
x > 0
The frequency penalty to use
Whether to use fulltext search
Whether to only use indexed chunks
The base URL for the LLM API
The default model to use for the LLM
Whether the dataset is locked to prevent changes or deletion
The maximum limit for the number of chunks for counting
x > 0
The maximum number of tokens to use in LLM Response
x > 0
The prompt to use for converting a message to a query
The number of retrievals to include with the RAG model
x > 0
Whether to enable pagefind indexing
The presence penalty to use
x > 0
Set content_only to true to return only the chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.
ChunkFilter is a JSON object which can be used to filter chunks. This is useful for when you want to filter chunks by arbitrary metadata. Unlike with tag filtering, there is a performance hit for filtering on metadata.
Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms).
Page of chunks to fetch. Page is 1-indexed.
x > 0
Page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time.
x > 0
If true, stop words (specified in server/src/stop-words.txt in the git repo) will be removed. Queries that are entirely stop words will be preserved.
Set score_threshold to a float to filter out chunks with a score below the threshold for cosine distance metric. For Manhattan Distance, Euclidean Distance, and Dot Product, it will filter out scores above the threshold distance. This threshold applies before weight and bias modifications. If not specified, this defaults to no threshold. A threshold of 0 will default to no threshold.
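The direction of the score_threshold comparison depends on the distance metric, which the following sketch makes explicit. `passes_threshold` is a hypothetical client-side helper that mirrors the description above; it is not the server's implementation, and it treats a score exactly equal to the threshold as kept.

```python
from typing import Optional


def passes_threshold(score: float, threshold: Optional[float], metric: str) -> bool:
    """Return True if a chunk with this score would be kept."""
    # No threshold, or a threshold of 0, means nothing is filtered out.
    if threshold is None or threshold == 0:
        return True
    if metric == "cosine":
        # Cosine: scores below the threshold are filtered out.
        return score >= threshold
    if metric in ("euclidean", "manhattan", "dot"):
        # Other metrics: scores above the threshold distance are filtered out.
        return score <= threshold
    raise ValueError(f"unknown distance metric: {metric}")
```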
Scoring options provides ways to modify the sparse or dense vector created for the query in order to change how potential matches are scored. If not specified, this defaults to no modifications.
fulltext, semantic, hybrid, bm25
Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.
Sort Options lets you specify different methods to rerank the chunks in the result set. If not specified, this defaults to the score of the chunks.
Typo Options lets you specify different methods to correct typos in the query. If not specified, typos will not be corrected.
Enables autocomplete on the search modal.
If true, quoted and - prefixed words will be parsed from the queries and used as required and negated words respectively. Default is false.
User ID is the id of the user who is making the request. This is used to track user interactions with the search results.
light, dark
Whether or not to insert chunks into Postgres
The prompt to use for the RAG model
The base URL for the reranker API
The model name for the Reranker API
Whether to use semantic search
The stop tokens to use
The system prompt to use for the LLM
The temperature to use
Whether to use the message to query prompt
The tracking ID of the dataset you want to update.
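The body rules above (at least one identifier is required, and omitted fields are left unchanged) can be sketched as a small payload builder. The JSON field names `dataset_id`, `dataset_name`, `new_tracking_id`, and `server_configuration` are assumptions, because this page gives only the descriptions of those fields; `build_update_body` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_update_body(
    dataset_id: Optional[str] = None,
    tracking_id: Optional[str] = None,
    dataset_name: Optional[str] = None,
    new_tracking_id: Optional[str] = None,
    server_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble an update body, leaving out every field that was not supplied."""
    if dataset_id is None and tracking_id is None:
        raise ValueError("one of dataset_id or tracking_id must be provided")
    candidates = {
        "dataset_id": dataset_id,                      # assumed field name
        "tracking_id": tracking_id,
        "dataset_name": dataset_name,                  # assumed field name
        "new_tracking_id": new_tracking_id,            # assumed field name
        "server_configuration": server_configuration,  # assumed field name
    }
    # Fields left as None are dropped so the server keeps their current values.
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_update_body(tracking_id="docs", dataset_name="Docs v2")` yields a body that renames the dataset and touches nothing else.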
Response
Timestamp of the creation of the dataset
Flag to indicate if the dataset has been deleted. Deletes are handled async after the flag is set so as to avoid expensive search index compaction.
Unique identifier of the dataset, auto-generated uuid created by Trieve
Name of the dataset
Unique identifier of the organization that owns the dataset
Configuration of the dataset for RAG, embeddings, BM25, etc.
Timestamp of the last update of the dataset
Tracking ID of the dataset. It can be any string and is determined by the user. Tracking IDs are unique identifiers for datasets within an organization. They are designed to match the unique identifier of the dataset in the user's system.
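The response fields listed above could be read into a typed object along these lines. The JSON key names (`created_at`, `deleted`, `id`, `name`, `organization_id`, `server_configuration`, `updated_at`, `tracking_id`) are assumptions inferred from the descriptions, and `DatasetResponse` and `parse_dataset` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DatasetResponse:
    created_at: str
    deleted: bool
    id: str
    name: str
    organization_id: str
    server_configuration: Dict[str, Any]
    updated_at: str
    tracking_id: Optional[str] = None


def parse_dataset(raw_json: str) -> DatasetResponse:
    """Parse a dataset response body; the key names are assumed, not confirmed."""
    raw = json.loads(raw_json)
    return DatasetResponse(
        created_at=raw["created_at"],
        # The flag may arrive as 0/1 or true/false; bool() handles both.
        deleted=bool(raw["deleted"]),
        id=raw["id"],
        name=raw["name"],
        organization_id=raw["organization_id"],
        server_configuration=raw["server_configuration"],
        updated_at=raw["updated_at"],
        tracking_id=raw.get("tracking_id"),
    )
```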