POST /api/chunk_group/group_oriented_search

Authorizations

Authorization
string
header
required

Headers

TR-Dataset
string
required

The dataset id or tracking_id to use for the request. We assume you intend to use an id if the value is a valid uuid.

X-API-Version
enum<string>

The API version to use for this request. Defaults to V2 for orgs created after July 12, 2024 and V1 otherwise.

Available options:
V1,
V2
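As a rough illustration of the headers above, the sketch below assembles them into a dictionary and approximates the "valid uuid" check that decides whether TR-Dataset is read as an id or a tracking_id. The helper names are hypothetical, and Python's uuid parser may be more lenient than the server's check.

```python
import uuid
from typing import Dict, Optional


def looks_like_dataset_id(value: str) -> bool:
    """Approximate the 'valid uuid' check (assumption: the server may be stricter)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def build_headers(api_key: str, dataset: str,
                  api_version: Optional[str] = None) -> Dict[str, str]:
    """Collect the documented headers; X-API-Version is only sent when given."""
    if api_version is not None and api_version not in ("V1", "V2"):
        raise ValueError("X-API-Version must be 'V1' or 'V2'")
    headers = {
        "Authorization": api_key,
        "TR-Dataset": dataset,
        "Content-Type": "application/json",
    }
    if api_version is not None:
        headers["X-API-Version"] = api_version
    return headers
```

Leaving api_version unset lets the server apply its own default for the organization.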

Body

application/json
filters
object

Filters is a JSON object used to filter chunks by arbitrary metadata. Unlike tag filtering, filtering on metadata incurs a performance hit.

get_total_pages
boolean | null

Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms).

group_size
integer | null

Group_size is the number of chunks to fetch for each group. The default is 3. If a group has fewer than group_size chunks, all of its chunks are returned. If group_size is set to a large number, we recommend setting slim_chunks to true so that the content and chunk_html of the chunks are not returned, which reduces the time spent on content download and serialization.
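One way to follow the slim_chunks recommendation is to switch it on automatically once group_size passes a cutoff. The sketch below does this; the cutoff value and the helper name are assumptions, since the source does not say what counts as a large number.

```python
from typing import Any, Dict

# Hypothetical cutoff for a "large" group_size; not a documented value.
LARGE_GROUP_SIZE = 20


def group_options(group_size: int = 3) -> Dict[str, Any]:
    """Return body fields for group_size, enabling slim_chunks for large groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return {
        "group_size": group_size,
        # Skip content and chunk_html when many chunks per group are requested.
        "slim_chunks": group_size >= LARGE_GROUP_SIZE,
    }
```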

highlight_options
object

Highlight Options lets you specify different methods to highlight the chunks in the result set. If not specified, this defaults to the score of the chunks.

page
integer | null

Page of group results to fetch. Page is 1-indexed.

page_size
integer | null

Page size is the number of group results to fetch. The default is 10.
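Because page is 1-indexed, the groups covered by a given page can be worked out with simple arithmetic. The sketch below shows the usual interpretation of 1-indexed paging; the function name is hypothetical and the server's exact behavior is not confirmed by the source.

```python
from typing import Tuple


def page_bounds(page: int, page_size: int = 10) -> Tuple[int, int]:
    """Return the 1-based ranks of the first and last group on a page."""
    if page < 1:
        raise ValueError("page is 1-indexed, so it must be at least 1")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    first = (page - 1) * page_size + 1
    last = page * page_size
    return first, last
```

With the default page_size of 10, page 3 would cover groups ranked 21 through 30.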

query
required

Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. You can either provide one query, or multiple with weights. Multi-query only works with Semantic Search and is not compatible with cross encoder re-ranking or highlights.
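The two query forms could be built as shown in the sketch below. The single-query form is a plain string. The multi-query form is assumed here to be a list of objects with "query" and "weight" keys; those key names and the helper name are assumptions, not confirmed by the text above. The check that multi-query requires semantic search follows the description.

```python
from typing import Any, Dict, List, Tuple, Union

SEARCH_TYPES = {"fulltext", "semantic", "hybrid", "bm25"}


def build_query_body(query: Union[str, List[Tuple[str, float]]],
                     search_type: str) -> Dict[str, Any]:
    """Build the required query and search_type fields of the request body."""
    if search_type not in SEARCH_TYPES:
        raise ValueError("unknown search_type: " + search_type)
    if isinstance(query, str):
        return {"query": query, "search_type": search_type}
    if search_type != "semantic":
        raise ValueError("multi-query only works with semantic search")
    # Assumed wire shape for weighted queries: [{"query": ..., "weight": ...}]
    weighted = [{"query": text, "weight": weight} for text, weight in query]
    return {"query": weighted, "search_type": search_type}
```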

remove_stop_words
boolean | null

If true, stop words (specified in server/src/stop-words.txt in the git repo) will be removed. Queries that are entirely stop words will be preserved.
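The behavior described above, including the rule that a query made up only of stop words is kept as-is, can be sketched locally. The stop-word set in the usage line is a made-up stand-in, not the contents of server/src/stop-words.txt, and whitespace tokenization is an assumption.

```python
from typing import Set


def remove_stop_words(query: str, stop_words: Set[str]) -> str:
    """Drop stop words, but keep the query unchanged if nothing would remain."""
    kept = [word for word in query.split() if word.lower() not in stop_words]
    if not kept:
        # The query consisted entirely of stop words, so it is preserved.
        return query
    return " ".join(kept)


# Illustrative stop-word set only.
print(remove_stop_words("the history of rome", {"the", "of"}))
```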

score_threshold
number | null

Set score_threshold to a float to filter out chunks with a score below the threshold. This threshold applies before weight and bias modifications. If not specified, this defaults to 0.0.
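To make the threshold semantics concrete, the sketch below applies the same rule to a list of (chunk id, raw score) pairs on the client side. It only illustrates the filter itself; the names are hypothetical and the scores are raw values, taken before any weight or bias modification.

```python
from typing import List, Tuple


def apply_score_threshold(scored_chunks: List[Tuple[str, float]],
                          score_threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Keep chunks whose raw score is not below the threshold."""
    return [(chunk_id, score) for chunk_id, score in scored_chunks
            if score >= score_threshold]
```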

search_type
enum<string>
required

Available options:
fulltext,
semantic,
hybrid,
bm25

slim_chunks
boolean | null

Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.

sort_options
object

Sort Options lets you specify different methods to rerank the chunks in the result set. If not specified, this defaults to the score of the chunks.

typo_options
object

Typo Options lets you specify different methods to correct typos in the query. If not specified, typos will not be corrected.

use_quote_negated_terms
boolean | null

If true, quoted words and words prefixed with - will be parsed from the queries and used as required and negated words respectively. Default is false.
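The sketch below illustrates which parts of a query would count as required and negated terms under this option. It is a local approximation with a hypothetical function name; the server's actual parsing rules (for example around nested quotes or hyphenated words) may differ.

```python
import re
from typing import List, Tuple


def parse_quote_negated(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into quoted (required) and '-' prefixed (negated) terms."""
    required = re.findall(r'"([^"]+)"', query)
    # Remove the quoted spans so their words are not scanned for a leading '-'.
    without_quotes = re.sub(r'"[^"]*"', " ", query)
    negated = [token[1:] for token in without_quotes.split()
               if token.startswith("-") and len(token) > 1]
    return required, negated
```

For the query `"rust lang" tutorial -python`, this returns `rust lang` as a required term and `python` as a negated one.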

user_id
string | null

The user_id is the id of the user who is making the request. This is used to track user interactions with the search results.
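Putting the headers and body together, a request could be assembled as in the sketch below, using only the Python standard library. The host in BASE_URL is a placeholder and must be replaced with the real API host; the function name is hypothetical. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.example.com"  # placeholder host, not the real endpoint
PATH = "/api/chunk_group/group_oriented_search"


def build_request(api_key: str, dataset: str,
                  body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request for group_oriented_search."""
    for field in ("query", "search_type"):
        if field not in body:
            raise ValueError("missing required body field: " + field)
    return urllib.request.Request(
        url=BASE_URL + PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "TR-Dataset": dataset,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; optional fields such as page, page_size, or user_id go in the same body dictionary.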

Response

200 - application/json

corrected_query
string | null

id
string
required

results
object[]
required

total_pages
integer
required
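A 200 response could be read into a small typed structure, as in the sketch below. The class and function names are hypothetical, and the shape of each entry in results is left as a generic dictionary because the source only lists it as object[].

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GroupSearchResponse:
    id: str
    results: List[Dict[str, Any]]
    total_pages: int
    corrected_query: Optional[str] = None


def parse_response(raw: str) -> GroupSearchResponse:
    """Parse a response body, checking that the required fields are present."""
    data = json.loads(raw)
    missing = [key for key in ("id", "results", "total_pages") if key not in data]
    if missing:
        raise KeyError("missing required fields: " + ", ".join(missing))
    return GroupSearchResponse(
        id=data["id"],
        results=data["results"],
        total_pages=data["total_pages"],
        corrected_query=data.get("corrected_query"),
    )
```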