Search
This route provides the primary search functionality for the API. It can be used to search for chunks by semantic similarity, full-text similarity, or a combination of both. Results’ chunk_html
values will be modified with <mark><b>
or custom specified tags for sub-sentence highlighting.
Authorizations
Headers
The dataset id or tracking_id to use for the request. We assume you intend to use an id if the value is a valid uuid.
The API version to use for this request. Defaults to V2 for orgs created after July 12, 2024 and V1 otherwise.
V1
, V2
Body
Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. You can either provide one query, or multiple with weights. Multi-query only works with Semantic Search and is not compatible with cross encoder re-ranking or highlights.
fulltext
, semantic
, hybrid
, bm25
Set content_only to true to only returning the chunk_html of the chunks. This is useful for when you want to reduce amount of data over the wire for latency improvement (typically 10-50ms). Default is false.
ChunkFilter is a JSON object which can be used to filter chunks. This is useful for when you want to filter chunks by arbitrary metadata. Unlike with tag filtering, there is a performance hit for filtering on metadata.
Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms).
Highlight Options lets you specify different methods to highlight the chunks in the result set. If not specified, this defaults to the score of the chunks.
Page of chunks to fetch. Page is 1-indexed.
x > 0
Page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time.
x > 0
If true, stop words (specified in server/src/stop-words.txt in the git repo) will be removed. Queries that are entirely stop words will be preserved.
Set score_threshold to a float to filter out chunks with a score below the threshold for cosine distance metric. For Manhattan Distance, Euclidean Distance, and Dot Product, it will filter out scores above the threshold distance. This threshold applies before weight and bias modifications. If not specified, this defaults to no threshold. A threshold of 0 will default to no threshold.
Scoring options provides ways to modify the sparse or dense vector created for the query in order to change how potential matches are scored. If not specified, this defaults to no modifications.
Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful for when you want to reduce amount of data over the wire for latency improvement (typically 10-50ms). Default is false.
Sort Options lets you specify different methods to rerank the chunks in the result set. If not specified, this defaults to the score of the chunks.
Typo Options lets you specify different methods to correct typos in the query. If not specified, typos will not be corrected.
If true, quoted and - prefixed words will be parsed from the queries and used as required and negated words respectively. Default is false.
User ID is the id of the user who is making the request. This is used to track user interactions with the search results.
Was this page helpful?