POST /api/chunk_group/search

Authorizations

Authorization
string
header
required

Headers

TR-Dataset
string
required

The dataset id or tracking_id to use for the request. We assume you intend to use an id if the value is a valid uuid.

X-API-Version
enum<string>

The API version to use for this request. Defaults to V2 for orgs created after July 12, 2024 and V1 otherwise.

Available options:
V1,
V2
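
As an illustration of how the headers above fit together, here is a minimal sketch that builds (but does not send) the request using only the Python standard library. The base URL, the function name `build_search_request`, and passing the API key verbatim as the Authorization value are assumptions for this example; check your deployment for the exact host and key format.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, dataset, body, api_version=None):
    """Build a POST request for /api/chunk_group/search without sending it.

    base_url and the raw-key Authorization value are assumptions of this sketch.
    """
    headers = {
        "Authorization": api_key,       # required
        "TR-Dataset": dataset,          # required: dataset id or tracking_id
        "Content-Type": "application/json",
    }
    if api_version is not None:
        # X-API-Version is optional; only V1 and V2 are listed as valid.
        if api_version not in ("V1", "V2"):
            raise ValueError("api_version must be 'V1' or 'V2'")
        headers["X-API-Version"] = api_version
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chunk_group/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the host and credentials are real.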

Body

application/json
content_only
boolean | null

Set content_only to true to return only the chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.

filters
object

Filters is a JSON object which can be used to filter chunks. This is useful for when you want to filter chunks by arbitrary metadata. Unlike with tag filtering, there is a performance hit for filtering on metadata.

get_total_pages
boolean | null

Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms).

group_id
string | null

group_id specifies the group to search within. Results will only consist of chunks which are bookmarks within the specified group.

group_tracking_id
string | null

Group_tracking_id specifies the group to search within by tracking id. Results will only consist of chunks which are bookmarks within the specified group. If both group_id and group_tracking_id are provided, group_id will be used.

highlight_options
object

Highlight Options lets you specify different methods to highlight the chunks in the result set. If not specified, this defaults to the score of the chunks.

page
integer | null

The page of chunks to fetch. Page is 1-indexed.

page_size
integer | null

The page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time.
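
The page, page_size, and get_total_pages fields can be combined to walk through every page of results. The sketch below is one possible approach, not a prescribed client: `iter_all_chunks` and the injected `search` callable (anything that takes a request body dict and returns the decoded response dict) are hypothetical names, and it assumes total_pages is meaningful when get_total_pages is true.

```python
def iter_all_chunks(search, base_body, page_size=10):
    """Yield chunks from every page, starting at page 1 (pages are 1-indexed).

    `search` is a caller-supplied function: body dict -> response dict.
    """
    page = 1
    while True:
        # Copy the base body and set the paging fields for this request.
        body = dict(base_body, page=page, page_size=page_size, get_total_pages=True)
        result = search(body)
        yield from result["chunks"]
        # Stop on the last page, or early if a page comes back empty.
        if page >= result["total_pages"] or not result["chunks"]:
            break
        page += 1
```

Because get_total_pages adds latency to each call, a real client might request it only on the first page and reuse the value.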

query
required

Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. You can either provide one query, or multiple with weights. Multi-query only works with Semantic Search and is not compatible with cross encoder re-ranking or highlights.

remove_stop_words
boolean | null

If true, stop words (specified in server/src/stop-words.txt in the git repo) will be removed. Queries that are entirely stop words will be preserved.

score_threshold
number | null

Set score_threshold to a float to filter out chunks with a score below the threshold. This threshold applies before weight and bias modifications. If not specified, this defaults to 0.0.

search_type
enum<string>
required

Available options:
fulltext,
semantic,
hybrid,
bm25

slim_chunks
boolean | null

Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.

sort_options
object

Sort Options lets you specify different methods to rerank the chunks in the result set. If not specified, this defaults to the score of the chunks.

typo_options
object

Typo Options lets you specify different methods to correct typos in the query. If not specified, typos will not be corrected.

use_quote_negated_terms
boolean | null

If true, quoted words and words prefixed with - will be parsed from the queries and used as required and negated words, respectively. Default is false.
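
To make the quoted and negated syntax concrete, the following sketch shows one way such terms could be pulled out of a query string on the client side. It is only an illustration of the idea and is not the server's parser; `split_quote_negated` is a hypothetical helper, and the real implementation may treat edge cases differently.

```python
import re


def split_quote_negated(query):
    """Return (required, negated) terms found in a query string.

    Illustrative only: quoted spans become required terms, and
    tokens starting with '-' outside quotes become negated terms.
    """
    required = re.findall(r'"([^"]+)"', query)
    # Drop the quoted spans so a '-' inside quotes is not read as negation.
    remainder = re.sub(r'"[^"]*"', " ", query)
    negated = [
        token[1:]
        for token in remainder.split()
        if token.startswith("-") and len(token) > 1
    ]
    return required, negated
```

For example, `"vector search" tutorial -python` would yield one required phrase (vector search) and one negated word (python).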

user_id
string | null

The user_id is the id of the user who is making the request. This is used to track user interactions with the search results.
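
Putting the body fields together: only query and search_type are marked required, and the nullable fields can simply be left out. The sketch below assembles a body under those rules. The helper name `build_search_body` is hypothetical, and it handles only the single-string form of query, since the multi-query format is not described here.

```python
# The four values listed under search_type above.
SEARCH_TYPES = {"fulltext", "semantic", "hybrid", "bm25"}


def build_search_body(query, search_type, **optional):
    """Assemble a JSON-serializable request body.

    Optional fields (group_id, page, page_size, score_threshold, ...) are
    passed as keyword arguments; any set to None are omitted.
    """
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    body = {"query": query, "search_type": search_type}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


example_body = build_search_body(
    "how do I rotate an API key",
    "hybrid",
    group_tracking_id="docs-group",  # made-up value for illustration
    page=1,
    page_size=20,
    score_threshold=0.3,
)
```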

Response

200 - application/json

chunks
object[]
required

corrected_query
string | null

id
string
required

total_pages
integer
required
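
A client might decode the 200 response along the following lines. This is a sketch under assumptions: `GroupSearchResponse` and `parse_response` are hypothetical names, the shape of each chunk object is not specified here so chunks are kept as plain dicts, and the sample payload in the usage note is invented for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GroupSearchResponse:
    id: str
    chunks: List[Dict[str, Any]]
    total_pages: int
    corrected_query: Optional[str] = None  # nullable in the response


def parse_response(raw):
    """Decode a JSON response body and check the required fields exist."""
    data = json.loads(raw)
    missing = [key for key in ("chunks", "id", "total_pages") if key not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return GroupSearchResponse(
        id=data["id"],
        chunks=data["chunks"],
        total_pages=data["total_pages"],
        corrected_query=data.get("corrected_query"),
    )
```

Calling `parse_response('{"id": "abc", "chunks": [], "total_pages": 0}')` on that invented payload returns an object whose corrected_query is None.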