Overview

We provide the ability for you to search your data in a fast and performant manner. We have multiple search paradigms, which are exposed through the search over chunks route, the search within groups route, and the search over groups route.

Different Search Paradigms

We offer three different search strategies for you to choose from:

  1. Search over chunks: This strategy allows you to search all of your chunks independently. This is useful when your chunks are independent and do not need to be grouped together.
  2. Search within groups: This strategy lets you constrain your results to within a selected group. This is useful for searching distinct groups within your dataset independently.
  3. Search over groups: This strategy allows you to search over the groups of chunks within your dataset. This returns the groups and the top chunks within each group that matched your query, providing better search quality for datasets with highly related chunks within groups.

You can use the search UI at search.trieve.ai to A/B test which search method works best for you.

Important Parameters

  • query: The user query that is embedded and searched against the dataset.
  • search_type: Can be semantic, fulltext, or hybrid.
    • Semantic: Uses cosine distance to determine the most relevant results.
    • Fulltext: Uses a SPLADE model to find the most relevant results.
    • Hybrid: Uses a reranker model that pulls one page of results from both fulltext and semantic searches to find the most relevant results.
  • page: The page of chunks to fetch. Pages are 1-indexed.
  • page_size: This lets you tune the number of results that are returned.
  • highlight_results: Enables subsentence highlighting of relevant portions of the text.
  • slim_chunks: Excludes chunk_html from the returned results to reduce network bandwidth. Useful for large chunks.
  • recency_bias: A value from 0-1 that tunes how much the recency of chunks (based on the timestamp field) affects the ranking.
  • filters: Apply filters to get exactly the results you want.

To optimize for the lowest latency, set highlight_results and get_total_pages to false and set slim_chunks to true. If you are willing to sacrifice some search quality for speed, use the fulltext search mode.

Filtering

We provide a system to allow users to filter the chunks that are returned.

The filters are structured around three clauses:

  • must: All filters within this clause must be matched to return the chunks.

    Get chunks with both “CO” and “321” in their tag_set:

    {
      "must": [
        {
          "field": "tag_set",
          "match": ["CO"]
        },
        {
          "field": "tag_set",
          "match": ["321"]
        }
      ]
    }
    

    Get chunks with either “CO” OR “321” in their tag_set:

    {
      "must": [
        {
          "field": "tag_set",
          "match": ["CO", "321"]
        }
      ]
    }
    
  • must_not: All filters in this clause must not be matched to return the chunks.

    Get chunks with neither “CO” nor “321” in their tag_set:

    {
      "must_not": [
        {
          "field": "tag_set",
          "match": ["CO", "321"]
        }
      ]
    }
    

    Get chunks that either don’t have “CO” in their tag_set or don’t have “321” in their tag_set:

    {
      "must_not": [
        {
          "field": "tag_set",
          "match": ["CO"]
        },
        {
          "field": "tag_set",
          "match": ["321"]
        }
      ]
    }
    
  • should: Any of these conditions can be matched to return a chunk.

    Get chunks that either have “CO” in their tag_set or “http://example.com” in their link:

    {
      "should": [
        {
          "field": "tag_set",
          "match": ["CO"]
        },
        {
          "field": "link",
          "match": ["http://example.com"]
        }
      ]
    }