• Chunk: A chunk is a piece of data that is uploaded to Trieve. It is the smallest unit of data that can be searched, recommended, or used in RAG. Chunks are typically created by splitting a larger piece of data into smaller pieces. For example, a document can be split into paragraphs, sentences, or even words, depending on the use case.

    • Tag Set: A tag set is a collection of tags that can be associated with a chunk. Tags can be used to categorize or filter chunks based on specific criteria. For example, a chunk representing a job posting could be tagged with the job title, location, company name, etc.

    • Metadata: Metadata is a JSON object that can be associated with a chunk. Metadata can be used to store additional information about a chunk, such as the source of the data, creation date, author, etc. Although filtering by metadata is supported, it is not recommended if latency is a concern.

  • Groups: Groups are a way to associate related chunks with one another. For example, chunks representing different sections of a document can be placed in the same group to indicate that they are part of the same document. You can perform queries within a group or across groups.

  • Dataset: A dataset is a collection of chunks. Datasets are created in the Trieve dashboard and are used to organize and manage chunks. Search, recommendations, and RAG all operate on the chunks in a dataset.

  • Search: Search is the process of finding relevant chunks in a dataset based on a query. Trieve provides a search API that allows you to search for chunks based on text similarity.

  • Recommendations: Recommendations are a list of chunks that are similar to a given chunk. Trieve provides a recommendation API that allows you to get recommendations for a chunk based on text similarity.

  • RAG (Retrieval Augmented Generation): RAG is a technique that combines search with text generation to answer user queries. Trieve provides a RAG API that allows you to generate responses to user queries based on the content of your dataset.
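
To make the relationships between these terms concrete, here is a minimal in-memory sketch in Python. It is not the Trieve API or SDK: the `Chunk` class, its field names, and the `search`, `recommend`, and `build_rag_prompt` helpers are all hypothetical, and a simple word-overlap (Jaccard) score stands in for whatever text-similarity method the real service uses. It only illustrates how chunks, tag sets, metadata, groups, and a dataset fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk; field names are illustrative only."""
    chunk_id: str
    text: str
    tag_set: List[str] = field(default_factory=list)      # tags for categorizing/filtering
    metadata: Dict[str, str] = field(default_factory=dict)  # arbitrary JSON-like extras
    group_ids: List[str] = field(default_factory=list)    # groups this chunk belongs to


def jaccard(a: str, b: str) -> float:
    """Toy text similarity: shared lowercase words divided by total distinct words."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def search(dataset: List[Chunk], query: str, tag: Optional[str] = None,
           group_id: Optional[str] = None, limit: int = 3) -> List[Chunk]:
    """Rank chunks by similarity to the query, optionally filtered by tag or group."""
    candidates = [
        c for c in dataset
        if (tag is None or tag in c.tag_set)
        and (group_id is None or group_id in c.group_ids)
    ]
    ranked = sorted(candidates, key=lambda c: jaccard(query, c.text), reverse=True)
    return ranked[:limit]


def recommend(dataset: List[Chunk], chunk: Chunk, limit: int = 3) -> List[Chunk]:
    """Return the chunks most similar to a given chunk, excluding the chunk itself."""
    others = [c for c in dataset if c.chunk_id != chunk.chunk_id]
    ranked = sorted(others, key=lambda c: jaccard(chunk.text, c.text), reverse=True)
    return ranked[:limit]


def build_rag_prompt(dataset: List[Chunk], question: str, limit: int = 2) -> str:
    """Retrieval step of RAG: gather context that a language model would then answer from."""
    context = "\n".join(f"- {c.text}" for c in search(dataset, question, limit=limit))
    return f"Context:\n{context}\n\nQuestion: {question}"


# A dataset is modeled here as a plain list of chunks (job postings, as in the tag set example).
dataset = [
    Chunk("c1", "Senior Rust engineer building search infrastructure",
          tag_set=["engineering", "remote"],
          metadata={"source": "careers-page"},
          group_ids=["acme-postings"]),
    Chunk("c2", "Rust engineer for backend services",
          tag_set=["engineering", "berlin"],
          group_ids=["acme-postings"]),
    Chunk("c3", "Account executive for enterprise sales",
          tag_set=["sales", "remote"],
          group_ids=["globex-postings"]),
]

top_hit = search(dataset, "rust search engineer")[0]
remote_ids = [c.chunk_id for c in search(dataset, "rust search engineer", tag="remote")]
acme_ids = [c.chunk_id for c in search(dataset, "rust search engineer", group_id="acme-postings")]
similar_ids = [c.chunk_id for c in recommend(dataset, dataset[0])]
prompt = build_rag_prompt(dataset, "rust search engineer")
```

In this sketch, a tag filter narrows the candidates before ranking, a group filter restricts a query to one group, and the RAG helper only assembles the retrieved context; the generation step itself is left out.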