POST /api/chunk_group/search
curl --request POST \
  --url https://api.trieve.ai/api/chunk_group/search \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Dataset: <tr-dataset>' \
  --data '{
  "content_only": true,
  "filters": {
    "must": [
      {
        "field": "tag_set",
        "match_all": [
          "A",
          "B"
        ]
      },
      {
        "field": "num_value",
        "range": {
          "gte": 10,
          "lte": 25
        }
      }
    ]
  },
  "get_total_pages": true,
  "group_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "group_tracking_id": "<string>",
  "highlight_options": {
    "highlight_delimiters": [
      "<string>"
    ],
    "highlight_max_length": 1,
    "highlight_max_num": 1,
    "highlight_results": true,
    "highlight_strategy": "exactmatch",
    "highlight_threshold": 123,
    "highlight_window": 1,
    "post_tag": "<string>",
    "pre_tag": "<string>"
  },
  "page": 1,
  "page_size": 1,
  "query": {
    "image_url": "<string>",
    "llm_prompt": "<string>"
  },
  "remove_stop_words": true,
  "score_threshold": 123,
  "search_type": "fulltext",
  "slim_chunks": true,
  "sort_options": {
    "location_bias": {
      "bias": 123,
      "location": {
        "lat": 123,
        "lon": 123
      }
    },
    "mmr": {
      "mmr_lambda": 123,
      "use_mmr": true
    },
    "recency_bias": 123,
    "sort_by": {
      "direction": "desc",
      "field": "<string>",
      "prefetch_amount": 1
    },
    "tag_weights": {},
    "use_weights": true
  },
  "typo_options": {
    "correct_typos": true,
    "disable_on_word": [
      "<string>"
    ],
    "one_typo_word_range": {
      "max": 1,
      "min": 1
    },
    "prioritize_domain_specifc_words": true,
    "two_typo_word_range": {
      "max": 1,
      "min": 1
    }
  },
  "use_quote_negated_terms": true,
  "user_id": "<string>"
}'
{
  "chunks": [
    {
      "chunk": {
        "chunk_html": "<p>Some HTML content</p>",
        "content": "Some content",
        "id": "d290f1ee-6c54-4b01-90e6-d701748f0851",
        "link": "https://example.com",
        "metadata": {
          "key1": "value1",
          "key2": "value2"
        },
        "time_stamp": "2021-01-01 00:00:00.000",
        "weight": 0.5
      },
      "highlights": [
        "highlight is two tokens: high, light",
        "whereas hello is only one token: hello"
      ],
      "score": 0.5
    }
  ],
  "corrected_query": "<string>",
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "total_pages": 123
}
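
The curl call above can be reproduced with Python's standard library. This is a minimal sketch, not an official client: `make_search_request` is a hypothetical helper name, and the API key and dataset values are placeholders. The URL and header names are taken from the curl sample.

```python
import json
import urllib.request

# Endpoint taken from the curl sample above.
SEARCH_URL = "https://api.trieve.ai/api/chunk_group/search"


def make_search_request(api_key: str, dataset: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the group search endpoint."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
            "TR-Dataset": dataset,
        },
        method="POST",
    )


request = make_search_request(
    "<api-key>",
    "<tr-dataset>",
    {
        "query": "some search text",
        "search_type": "fulltext",
        "group_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    },
)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.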

Authorizations

Authorization
string
header
required

Headers

TR-Dataset
string
required

The dataset id or tracking_id to use for the request. We assume you intend to use an id if the value is a valid uuid.
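
To anticipate how a TR-Dataset value is likely to be interpreted, a client can check whether it parses as a UUID. This is a rough sketch: `looks_like_uuid` is a hypothetical helper, and the server's exact validation rules are not described here, so Python's `uuid.UUID` parser is only an approximation.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if the value parses as a UUID (approximation of the server check)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


# A UUID-shaped value would be treated as a dataset id,
# anything else as a tracking_id (per the description above).
kind = "id" if looks_like_uuid("3c90c3cc-0d44-4b50-8888-8dd25736052a") else "tracking_id"
```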

X-API-Version
enum<string>

The API version to use for this request. Defaults to V2 for orgs created after July 12, 2024 and V1 otherwise.

Available options:
V1,
V2
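
The headers described above can be assembled in one place. A minimal sketch, assuming a hypothetical helper named `build_headers`; only the header names and the V1/V2 options come from this page.

```python
from typing import Dict, Optional

ALLOWED_API_VERSIONS = ("V1", "V2")


def build_headers(api_key: str, dataset: str, api_version: Optional[str] = None) -> Dict[str, str]:
    """Return request headers; X-API-Version is only sent when given explicitly."""
    headers = {
        "Authorization": api_key,
        "Content-Type": "application/json",
        "TR-Dataset": dataset,  # dataset id or tracking_id
    }
    if api_version is not None:
        if api_version not in ALLOWED_API_VERSIONS:
            raise ValueError(f"X-API-Version must be one of {ALLOWED_API_VERSIONS}")
        headers["X-API-Version"] = api_version
    return headers
```

Omitting `api_version` leaves the choice to the server-side default described above.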

Body

application/json
JSON request payload to semantically search a group
query
required

Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. You can either provide one query, or multiple with weights. Multi-query only works with Semantic Search and is not compatible with cross encoder re-ranking or highlights.

search_type
enum<string>
required
Available options:
fulltext,
semantic,
hybrid,
bm25
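
Since query and search_type are the only required body fields, the smallest valid body can be sketched as below. `build_search_body` is a hypothetical helper; it uses a plain string query and does not attempt the weighted multi-query form, whose shape is not shown on this page.

```python
import json

SEARCH_TYPES = ("fulltext", "semantic", "hybrid", "bm25")


def build_search_body(query: str, search_type: str, **optional) -> str:
    """Serialize the required fields plus any optional fields that are not None."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {SEARCH_TYPES}")
    body = {"query": query, "search_type": search_type}
    # Optional fields are nullable, so None values are simply left out.
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)
```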
content_only
boolean | null

Set content_only to true to return only the chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.

filters
object

ChunkFilter is a JSON object which can be used to filter chunks. This is useful for when you want to filter chunks by arbitrary metadata. Unlike with tag filtering, there is a performance hit for filtering on metadata.
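
The filter object in the sample request can be built programmatically. This sketch only covers the two condition shapes that appear in that sample (`match_all` and `range` under `must`); the helper names `match_all` and `range_between` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def match_all(field: str, values: List[Any]) -> Dict[str, Any]:
    """Condition in the same shape as the tag_set entry in the sample request."""
    return {"field": field, "match_all": list(values)}


def range_between(field: str, gte: Optional[float] = None, lte: Optional[float] = None) -> Dict[str, Any]:
    """Condition in the same shape as the num_value entry in the sample request."""
    bounds = {key: value for key, value in (("gte", gte), ("lte", lte)) if value is not None}
    if not bounds:
        raise ValueError("at least one of gte or lte is required")
    return {"field": field, "range": bounds}


# Reproduces the "filters" object from the sample request.
filters = {
    "must": [
        match_all("tag_set", ["A", "B"]),
        range_between("num_value", gte=10, lte=25),
    ]
}
```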

get_total_pages
boolean | null

Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms).

group_id
string | null

group_id specifies the group to search within. Results will only consist of chunks which are bookmarks within the specified group.

group_tracking_id
string | null

Group_tracking_id specifies the group to search within by tracking id. Results will only consist of chunks which are bookmarks within the specified group. If both group_id and group_tracking_id are provided, group_id will be used.
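
Because group_id takes precedence when both identifiers are present, a client can avoid ambiguity by sending only one of them. A minimal sketch with a hypothetical helper `group_selector`; raising when neither is given is an assumption, since this page does not say what happens in that case.

```python
from typing import Dict, Optional


def group_selector(group_id: Optional[str] = None, group_tracking_id: Optional[str] = None) -> Dict[str, str]:
    """Return the single group field to send, mirroring the documented precedence."""
    if group_id is not None:
        return {"group_id": group_id}
    if group_tracking_id is not None:
        return {"group_tracking_id": group_tracking_id}
    # Assumption: a group search needs some way to identify the group.
    raise ValueError("provide group_id or group_tracking_id")
```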

highlight_options
object

Highlight Options lets you specify different methods to highlight the chunks in the result set. If not specified, this defaults to the score of the chunks.

page
integer | null

The page of chunks to fetch. Page is 1-indexed.

Required range: x > 0
page_size
integer | null

The page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time.

Required range: x > 0
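
The paging rules above (1-indexed pages, both values greater than zero) can be checked client-side before sending. A sketch with hypothetical helpers; the default page size of 10 is inferred from the description, not a documented default.

```python
from typing import Dict


def page_params(page: int = 1, page_size: int = 10) -> Dict[str, int]:
    """Validate paging values against the documented range (x > 0)."""
    if page <= 0 or page_size <= 0:
        raise ValueError("page and page_size must both be greater than 0")
    return {"page": page, "page_size": page_size}


def first_result_index(page: int, page_size: int) -> int:
    """Zero-based position of the first chunk on a 1-indexed page."""
    params = page_params(page, page_size)
    return (params["page"] - 1) * params["page_size"]
```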
remove_stop_words
boolean | null

If true, stop words (specified in server/src/stop-words.txt in the git repo) will be removed. Queries that are entirely stop words will be preserved.
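
The behaviour described above, including the special case for queries made up entirely of stop words, can be illustrated as follows. This is only a sketch of the stated rule: the whitespace tokenization and the tiny stop-word set are assumptions, and the server's actual list lives in the file named above.

```python
from typing import Set

# Illustrative stand-in for the server's stop-word list.
EXAMPLE_STOP_WORDS = {"the", "of", "a", "and"}


def remove_stop_words(query: str, stop_words: Set[str]) -> str:
    """Drop stop words, but keep the query unchanged if nothing would remain."""
    kept = [word for word in query.split() if word.lower() not in stop_words]
    return " ".join(kept) if kept else query


cleaned = remove_stop_words("the history of rome", EXAMPLE_STOP_WORDS)
preserved = remove_stop_words("the of", EXAMPLE_STOP_WORDS)
```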

score_threshold
number | null

Set score_threshold to a float to filter out chunks with a score below the threshold. This threshold applies before weight and bias modifications. If not specified, this defaults to 0.0.

slim_chunks
boolean | null

Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data sent over the wire to improve latency (typically by 10-50ms). Default is false.

sort_options
object

Sort Options lets you specify different methods to rerank the chunks in the result set. If not specified, this defaults to the score of the chunks.

typo_options
object

Typo Options lets you specify different methods to correct typos in the query. If not specified, typos will not be corrected.

use_quote_negated_terms
boolean | null

If true, quoted and - prefixed words will be parsed from the queries and used as required and negated words respectively. Default is false.
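
As an illustration of what "quoted" and "- prefixed" mean here, the sketch below pulls both kinds of terms out of a query string. It is a client-side approximation under assumed rules (double quotes, whitespace-separated tokens); `parse_quote_negated` is a hypothetical name and the server's real parser may differ.

```python
import re
from typing import List, Tuple


def parse_quote_negated(query: str) -> Tuple[List[str], List[str]]:
    """Return (required, negated) terms found in the query."""
    # Text inside double quotes is treated as required.
    required = re.findall(r'"([^"]+)"', query)
    # Remove the quoted parts, then look for tokens that start with "-".
    remainder = re.sub(r'"[^"]+"', " ", query)
    negated = [token[1:] for token in remainder.split() if token.startswith("-") and len(token) > 1]
    return required, negated


required, negated = parse_quote_negated('"solar panel" cost -rooftop')
```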

user_id
string | null

The user_id is the id of the user who is making the request. This is used to track user interactions with the search results.

Response

200
application/json
Group chunks which are similar to the embedding vector of the search query
chunks
object[]
required
id
string
required
total_pages
integer
required
corrected_query
string | null
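
The response fields listed above map naturally onto a small typed container. A minimal sketch, assuming the response shape shown in the sample at the top of this page; `GroupSearchResult` and `parse_search_response` are hypothetical names, and the sample JSON below is a trimmed copy of that example.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GroupSearchResult:
    id: str
    total_pages: int
    chunks: List[Dict[str, Any]]
    corrected_query: Optional[str] = None  # nullable per the field list above


def parse_search_response(raw: str) -> GroupSearchResult:
    """Parse a 200 response body; required fields raise KeyError if missing."""
    data = json.loads(raw)
    return GroupSearchResult(
        id=data["id"],
        total_pages=data["total_pages"],
        chunks=data["chunks"],
        corrected_query=data.get("corrected_query"),
    )


sample = """{
  "chunks": [
    {"chunk": {"content": "Some content", "weight": 0.5}, "highlights": [], "score": 0.5}
  ],
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "total_pages": 123
}"""

result = parse_search_response(sample)
best = max(result.chunks, key=lambda item: item["score"])
```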