# RAG on Specified Chunks

> This endpoint is an alternative to the topic+message resource pattern, in which Trieve handles chat memory. With this endpoint, the user is responsible for providing the context window and the prompt, and the conversation is ephemeral.
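
Because the conversation is ephemeral, the caller has to resend the full message history together with the chunk ids on every call. The following is a minimal sketch of building such a request with the Python standard library. The helper name `build_generate_request` is hypothetical, the header and field names are taken from the spec below, and passing the API key as the raw `Authorization` value is an assumption, since the spec does not state a prefix format. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Production server and path as listed in the OpenAPI spec.
API_URL = "https://api.trieve.ai/api/chunk/generate"


def build_generate_request(api_key, dataset_id, chunk_ids, prev_messages,
                           stream_response=False):
    """Build (but do not send) a POST request for /api/chunk/generate.

    Hypothetical helper: only the two required fields plus
    stream_response are included in the payload.
    """
    if not prev_messages:
        # The spec says there must be at least one previous message.
        raise ValueError("prev_messages must contain at least one message")
    payload = {
        "chunk_ids": list(chunk_ids),
        "prev_messages": list(prev_messages),
        "stream_response": stream_response,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the key is sent as-is in the Authorization header.
            "Authorization": api_key,
            "TR-Dataset": dataset_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. With `stream_response` set to false the body can be read in one call, whereas a streamed response would need to be read incrementally.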


## OpenAPI

````yaml post /api/chunk/generate
openapi: 3.0.3
info:
  title: Trieve API
  description: >-
    Trieve OpenAPI Specification. This document describes all of the operations
    available through the Trieve API.
  contact:
    name: Trieve Team
    url: https://trieve.ai
    email: developers@trieve.ai
  license:
    name: BSL
    url: https://github.com/devflowinc/trieve/blob/main/LICENSE.txt
  version: 0.13.0
servers:
  - url: https://api.trieve.ai
    description: Production server
  - url: http://localhost:8090
    description: Local development server
security: []
tags:
  - name: Invitation
    description: Invitation endpoint. Exists to invite users to an organization.
  - name: Auth
    description: Authentication endpoint. Serves to register and authenticate users.
  - name: User
    description: User endpoint. Enables you to modify user roles and information.
  - name: Organization
    description: >-
      Organization endpoint. Enables you to modify organization roles and
      information.
  - name: Dataset
    description: >-
      Dataset endpoint. Datasets belong to organizations and hold configuration
      information for both client and server. Datasets contain chunks and chunk
      groups.
  - name: Chunk
    description: >-
      Chunk endpoint. Think of chunks as individual searchable units of
      information. The majority of your integration will likely be with the
      Chunk endpoint.
  - name: Chunk Group
    description: >-
      Chunk groups endpoint. Think of a chunk_group as a bookmark folder within
      the dataset.
  - name: Crawl
    description: Crawl endpoint. Used to create and manage crawls for datasets.
  - name: File
    description: >-
      File endpoint. When files are uploaded, they are stored in S3 and broken
      up into chunks with text extraction from Apache Tika. You can upload files
      of pretty much any type up to 1GB in size. See chunking algorithm details
      at `docs.trieve.ai` for more information on how chunking works. Improved
      default chunking is on our roadmap.
  - name: Events
    description: >-
      Notifications endpoint. Files are uploaded asynchronously and events are
      sent to the user when the upload is complete.
  - name: Topic
    description: >-
      Topic chat endpoint. Think of topics as the storage system for gen-ai chat
      memory. Gen AI messages belong to topics.
  - name: Message
    description: >-
      Message chat endpoint. Messages are units belonging to a topic in the
      context of a chat with an LLM. There are system, user, and assistant
      messages.
  - name: Stripe
    description: >-
      Stripe endpoint. Used for the managed SaaS version of this app. Eventually
      this will become a micro-service. Reach out to the team using contact info
      found at `docs.trieve.ai` for more information.
  - name: Health
    description: Health check endpoint. Used to check if the server is up and running.
  - name: Metrics
    description: Metrics endpoint. Used to get information for monitoring.
  - name: Analytics
    description: Analytics endpoint. Used to get information for search and RAG analytics.
  - name: Experiment
    description: Experiment endpoint. Used to create and manage experiments.
paths:
  /api/chunk/generate:
    post:
      tags:
        - Chunk
      summary: RAG on Specified Chunks
      description: >-
        This endpoint is an alternative to the topic+message resource pattern,
        in which Trieve handles chat memory. With this endpoint, the user is
        responsible for providing the context window and the prompt, and the
        conversation is ephemeral.
      operationId: generate_off_chunks
      parameters:
        - name: TR-Dataset
          in: header
          description: >-
            The dataset id or tracking_id to use for the request. We assume you
            intend to use an id if the value is a valid uuid.
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        description: JSON request payload to perform RAG on the specified chunks
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateOffChunksReqPayload'
        required: true
      responses:
        '200':
          description: >-
            This will be a JSON response of a string containing the LLM's
            generated inference. This is the response when not streaming.
          headers:
            TR-QueryID:
              schema:
                type: string
                format: uuid
              description: Query ID that is used for tracking analytics
          content:
            text/plain:
              schema:
                type: string
        '400':
          description: >-
            Service error relating to updating chunk, likely due to
            conflicting tracking_id
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponseBody'
      security:
        - ApiKey:
            - readonly
components:
  schemas:
    GenerateOffChunksReqPayload:
      type: object
      required:
        - prev_messages
        - chunk_ids
      properties:
        audio_input:
          type: string
          description: >-
            Audio input to be used in the chat. This will be used to generate
            the audio tokens for the model. The default is None.
          nullable: true
        chunk_ids:
          type: array
          items:
            type: string
            format: uuid
          description: >-
            The ids of the chunks to be retrieved and injected into the context
            window for RAG.
        context_options:
          allOf:
            - $ref: '#/components/schemas/ContextOptions'
          nullable: true
        frequency_penalty:
          type: number
          format: float
          description: >-
            Frequency penalty is a number between -2.0 and 2.0. Positive values
            penalize new tokens based on their existing frequency in the text so
            far, decreasing the model's likelihood to repeat the same line
            verbatim. Default is 0.7.
          nullable: true
        highlight_results:
          type: boolean
          description: >-
            Set highlight_results to false for a slight latency improvement
            (1-10ms). If not specified, this defaults to true. This will add
            `<mark><b>` tags to the chunk_html of the chunks to highlight
            matching splits.
          nullable: true
        image_config:
          allOf:
            - $ref: '#/components/schemas/ImageConfig'
          nullable: true
        image_urls:
          type: array
          items:
            type: string
          description: >-
            Image URLs to be used in the chat. These will be used to generate
            the image tokens for the model. The default is None.
          nullable: true
        max_tokens:
          type: integer
          format: int32
          description: >-
            The maximum number of tokens to generate in the chat completion.
            Default is None.
          nullable: true
          minimum: 0
        metadata:
          description: >-
            Any metadata you want to associate with the event that is created
            from this request.
          nullable: true
        model:
          type: string
          description: >-
            Model to use for the completion. If not specified, the default model
            configured for the dataset will be used.
          nullable: true
        presence_penalty:
          type: number
          format: float
          description: >-
            Presence penalty is a number between -2.0 and 2.0. Positive values
            penalize new tokens based on whether they appear in the text so far,
            increasing the model's likelihood to talk about new topics. Default
            is 0.7.
          nullable: true
        prev_messages:
          type: array
          items:
            $ref: '#/components/schemas/ChatMessageProxy'
          description: >-
            The previous messages to be placed into the chat history. There must
            be at least one previous message.
        prompt:
          type: string
          description: >-
            Prompt will be used to tell the model what to generate in the next
            message in the chat. The default is 'Respond to the previous
            instruction and include the doc numbers that you used in square
            brackets at the end of the sentences that you used the docs for:'.
            You can also specify an empty string to leave the final message
            alone such that your user's final message can be used as the prompt.
            See docs.trieve.ai or contact us for more information.
          nullable: true
        stop_tokens:
          type: array
          items:
            type: string
          description: >-
            Stop tokens are up to 4 sequences where the API will stop generating
            further tokens. Default is None.
          nullable: true
        stream_response:
          type: boolean
          description: >-
            Whether or not to stream the response. If this is set to true or not
            included, the response will be a stream. If this is set to false,
            the response will be a normal JSON response. Default is true.
          nullable: true
        temperature:
          type: number
          format: float
          description: >-
            What sampling temperature to use, between 0 and 2. Higher values
            like 0.8 will make the output more random, while lower values like
            0.2 will make it more focused and deterministic. Default is 0.5.
          nullable: true
        user_id:
          type: string
          description: >-
            User ID is the id of the user who is making the request. This is
            used to track user interactions with the RAG results.
          nullable: true
      example:
        chunk_ids:
          - d290f1ee-6c54-4b01-90e6-d701748f0851
        prev_messages:
          - content: How do I set up RAG with Trieve?
            role: user
        prompt: >-
          Respond to the instruction and include the doc numbers that you used
          in square brackets at the end of the sentences that you used the docs
          for:
        stream_response: true
    ErrorResponseBody:
      type: object
      required:
        - message
      properties:
        message:
          type: string
      example:
        message: Bad Request
    ContextOptions:
      type: object
      description: >-
        Context options to use for the completion. If not specified, all options
        will default to false.
      properties:
        include_links:
          type: boolean
          description: >-
            Include links in the context. If not specified, this defaults to
            false.
          nullable: true
    ImageConfig:
      type: object
      description: Configuration for sending images to the LLM
      properties:
        images_per_chunk:
          type: integer
          description: >-
            The number of images to send to the LLM per fetched chunk. More
            images may slow down LLM inference. Default: 5.
          nullable: true
          minimum: 0
        use_images:
          type: boolean
          description: >-
            Sends images to the LLM if chunk_metadata.image_urls has a value.
            The call will error if the model is not a vision LLM. Default:
            false.
          nullable: true
      example:
        images_per_chunk: 1
        use_images: true
    ChatMessageProxy:
      type: object
      required:
        - role
        - content
      properties:
        content:
          type: string
        role:
          $ref: '#/components/schemas/RoleProxy'
      example:
        content: Hello, world!
        role: user
    RoleProxy:
      type: string
      enum:
        - system
        - user
        - assistant
  securitySchemes:
    ApiKey:
      type: apiKey
      in: header
      name: Authorization

````
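
The `GenerateOffChunksReqPayload` schema above documents several constraints in prose: two required fields, the allowed roles, value ranges for the sampling parameters, and a limit on stop tokens. A client may want to check these before sending a request. The sketch below is one way to do that. The function name `validate_generate_payload` is hypothetical, and treating the documented ranges as inclusive is an assumption. The server remains the authority on what it accepts.

```python
# Roles listed in the RoleProxy enum.
ALLOWED_ROLES = {"system", "user", "assistant"}

# (field, low, high) ranges as described in the schema; inclusive bounds
# are an assumption, the spec only says "between".
NUMERIC_RANGES = (
    ("frequency_penalty", -2.0, 2.0),
    ("presence_penalty", -2.0, 2.0),
    ("temperature", 0.0, 2.0),
)


def validate_generate_payload(payload):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # chunk_ids and prev_messages are the two required fields.
    if not isinstance(payload.get("chunk_ids"), list):
        problems.append("chunk_ids is required and must be a list")

    prev_messages = payload.get("prev_messages")
    if not isinstance(prev_messages, list) or not prev_messages:
        problems.append("prev_messages needs at least one message")
    else:
        for i, msg in enumerate(prev_messages):
            if not isinstance(msg, dict) or "content" not in msg:
                problems.append(f"prev_messages[{i}] needs a content field")
            elif msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"prev_messages[{i}] has an unknown role")

    for name, low, high in NUMERIC_RANGES:
        value = payload.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    stop_tokens = payload.get("stop_tokens")
    if stop_tokens is not None and len(stop_tokens) > 4:
        problems.append("stop_tokens allows up to 4 sequences")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and max_tokens < 0:
        problems.append("max_tokens must be 0 or greater")

    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once, which tends to be more useful when payloads are assembled from user input.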