# Create Presigned CSV/JSONL S3 PUT URL

> This route is useful for uploading very large CSV or JSONL files. It returns a presigned S3 PUT URL to upload the file to. Once you have completed the upload, a chunk is automatically created for each line of the CSV or JSONL file. The chunks are indexed and searchable. The authenticated user must be an admin or owner of the dataset's organization to upload a file.
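
The two-step flow described above (request a presigned URL, then PUT the file to it) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not an official client. The helper names `build_presign_request` and `upload_via_presigned_url` are hypothetical, and passing the raw API key as the `Authorization` header value is an assumption based on the `ApiKey` security scheme in the spec.

```python
import json
import os
import urllib.request

API_BASE = "https://api.trieve.ai"  # production server listed in the spec


def build_presign_request(api_key, dataset_id, file_name, **optional):
    """Build (but do not send) the POST /api/file/csv_or_jsonl request.

    `optional` may carry other payload fields from the schema, such as
    `tag_set`, `link`, or `mappings`.
    """
    payload = {"file_name": file_name, **optional}
    return urllib.request.Request(
        f"{API_BASE}/api/file/csv_or_jsonl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumption: raw API key as the value
            "TR-Dataset": dataset_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload_via_presigned_url(request, local_path):
    """Send the request, then stream the local file to the presigned URL."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)

    # An explicit Content-Length lets urllib stream the file object
    # instead of falling back to chunked transfer encoding.
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as fh:
        put = urllib.request.Request(
            body["presigned_put_url"],
            data=fh,
            headers={"Content-Length": str(size)},
            method="PUT",
        )
        with urllib.request.urlopen(put) as resp:
            resp.read()

    return body["file_metadata"]
```

A call might look like `upload_via_presigned_url(build_presign_request(key, dataset_id, "rows.jsonl"), "rows.jsonl")`, where `key` and `dataset_id` are your own values. Building the request separately from sending it keeps the payload easy to inspect before any network traffic happens.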



## OpenAPI

````yaml post /api/file/csv_or_jsonl
openapi: 3.0.3
info:
  title: Trieve API
  description: >-
    Trieve OpenAPI Specification. This document describes all of the operations
    available through the Trieve API.
  contact:
    name: Trieve Team
    url: https://trieve.ai
    email: developers@trieve.ai
  license:
    name: BSL
    url: https://github.com/devflowinc/trieve/blob/main/LICENSE.txt
  version: 0.13.0
servers:
  - url: https://api.trieve.ai
    description: Production server
  - url: http://localhost:8090
    description: Local development server
security: []
tags:
  - name: Invitation
    description: Invitation endpoint. Exists to invite users to an organization.
  - name: Auth
    description: Authentication endpoint. Serves to register and authenticate users.
  - name: User
    description: User endpoint. Enables you to modify user roles and information.
  - name: Organization
    description: >-
      Organization endpoint. Enables you to modify organization roles and
      information.
  - name: Dataset
    description: >-
      Dataset endpoint. Datasets belong to organizations and hold configuration
      information for both client and server. Datasets contain chunks and chunk
      groups.
  - name: Chunk
    description: >-
      Chunk endpoint. Think of chunks as individual searchable units of
      information. The majority of your integration will likely be with the
      Chunk endpoint.
  - name: Chunk Group
    description: >-
      Chunk groups endpoint. Think of a chunk_group as a bookmark folder within
      the dataset.
  - name: Crawl
    description: Crawl endpoint. Used to create and manage crawls for datasets.
  - name: File
    description: >-
      File endpoint. When files are uploaded, they are stored in S3 and broken
      up into chunks with text extraction from Apache Tika. You can upload files
      of pretty much any type up to 1GB in size. See chunking algorithm details
      at `docs.trieve.ai` for more information on how chunking works. Improved
      default chunking is on our roadmap.
  - name: Events
    description: >-
      Notifications endpoint. Files are uploaded asynchronously and events are
      sent to the user when the upload is complete.
  - name: Topic
    description: >-
      Topic chat endpoint. Think of topics as the storage system for gen-ai chat
      memory. Gen AI messages belong to topics.
  - name: Message
    description: >-
      Message chat endpoint. Messages are units belonging to a topic in the
      context of a chat with an LLM. There are system, user, and assistant
      messages.
  - name: Stripe
    description: >-
      Stripe endpoint. Used for the managed SaaS version of this app. Eventually
      this will become a micro-service. Reach out to the team using contact info
      found at `docs.trieve.ai` for more information.
  - name: Health
    description: Health check endpoint. Used to check if the server is up and running.
  - name: Metrics
    description: Metrics endpoint. Used to get information for monitoring.
  - name: Analytics
    description: Analytics endpoint. Used to get information for search and RAG analytics.
  - name: Experiment
    description: Experiment endpoint. Used to create and manage experiments.
paths:
  /api/file/csv_or_jsonl:
    post:
      tags:
        - File
      summary: Create Presigned CSV/JSONL S3 PUT URL
      description: >-
        This route is useful for uploading very large CSV or JSONL files. Once
        you have completed the upload, a chunk is automatically created for each
        line of the CSV or JSONL file. The chunks are indexed and searchable.
        The authenticated user must be an admin or owner of the dataset's
        organization to upload a file.
      operationId: create_presigned_url_for_csv_jsonl
      parameters:
        - name: TR-Dataset
          in: header
          description: >-
            The dataset id or tracking_id to use for the request. We assume you
            intend to use an id if the value is a valid uuid.
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        description: JSON request payload to upload a CSV or JSONL file
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreatePresignedUrlForCsvJsonlReqPayload'
        required: true
      responses:
        '200':
          description: File object information and presigned PUT URL
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CreatePresignedUrlForCsvJsonResponseBody'
        '400':
          description: Service error relating to uploading the file
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponseBody'
      security:
        - ApiKey:
            - admin
components:
  schemas:
    CreatePresignedUrlForCsvJsonlReqPayload:
      type: object
      required:
        - file_name
      properties:
        description:
          type: string
          description: >-
            Description is an optional convenience field so you do not have to
            remember what the file contains or is about. It will be included on
            the group resulting from the file, which will hold its chunks.
          nullable: true
        file_name:
          type: string
          description: >-
            Name of the file being uploaded, including the extension. Will be
            used to determine CSV or JSONL for processing.
        fulltext_boost_factor:
          type: number
          format: double
          description: >-
            Factor by which to multiplicatively increase the frequency of the
            tokens in the boost phrase for each row's chunk. Applies to fulltext
            (SPLADE) and keyword (BM25) search.
          nullable: true
        group_tracking_id:
          type: string
          description: >-
            Group tracking id is an optional field which allows you to specify
            the tracking id of the group that is created from the file. Chunks
            will be created with the tracking id
            `group_tracking_id|<index of chunk>`.
          nullable: true
        link:
          type: string
          description: >-
            Link to the file. This can also be any string. This can be used to
            filter when searching for the file's resulting chunks. The link
            value will not affect embedding creation.
          nullable: true
        mappings:
          allOf:
            - $ref: '#/components/schemas/ChunkReqPayloadMappings'
          nullable: true
        metadata:
          description: >-
            Metadata is a JSON object which can be used to filter chunks. This
            is useful for when you want to filter chunks by arbitrary metadata.
            Unlike with tag filtering, there is a performance hit for filtering
            on metadata. Will be passed down to the file's chunks.
          nullable: true
        semantic_boost_factor:
          type: number
          format: double
          description: >-
            Arbitrary float (positive or negative) specifying the multiplicative
            factor to apply before summing the phrase vector with the chunk_html
            embedding vector. Applies to semantic (embedding model) search.
          nullable: true
        tag_set:
          type: array
          items:
            type: string
          description: >-
            Tag set is a list of tags which will be passed down to the chunks
            made from the file. Each tag will be joined with what's created per
            row of the CSV or JSONL file.
          nullable: true
        time_stamp:
          type: string
          description: >-
            Time stamp should be an ISO 8601 combined date and time without
            timezone. Time_stamp is used for time window filtering and
            recency-biasing search results. Will be passed down to the file's
            chunks.
          nullable: true
        upsert_by_tracking_id:
          type: boolean
          description: >-
            Upsert by tracking_id. If true, chunks will be upserted by
            tracking_id. If false, chunks with the same tracking_id as another
            already existing chunk will be ignored. Defaults to true.
          nullable: true
      example:
        description: This is an example file
        file_name: example.csv
        link: https://example.com
        metadata:
          key1: value1
          key2: value2
        tag_set:
          - tag1
          - tag2
        time_stamp: '2021-01-01 00:00:00.000Z'
    CreatePresignedUrlForCsvJsonResponseBody:
      type: object
      required:
        - file_metadata
        - presigned_put_url
      properties:
        file_metadata:
          $ref: '#/components/schemas/File'
        presigned_put_url:
          type: string
          description: Signed URL to upload the file to.
    ErrorResponseBody:
      type: object
      required:
        - message
      properties:
        message:
          type: string
      example:
        message: Bad Request
    ChunkReqPayloadMappings:
      type: array
      items:
        $ref: '#/components/schemas/ChunkReqPayloadMapping'
      description: >-
        Specify all of the mappings between columns or fields in a CSV or JSONL
        file and keys in the ChunkReqPayload. Array fields like tag_set,
        image_urls, and group_tracking_ids can have multiple mappings. Boost
        phrase can also have multiple mappings which get concatenated. Other
        fields can only have one mapping and only the last mapping will be used.
    File:
      type: object
      required:
        - id
        - file_name
        - created_at
        - updated_at
        - dataset_id
        - size
      properties:
        created_at:
          type: string
          format: date-time
        dataset_id:
          type: string
          format: uuid
        file_name:
          type: string
        id:
          type: string
          format: uuid
        link:
          type: string
          nullable: true
        metadata:
          nullable: true
        size:
          type: integer
          format: int64
        tag_set:
          type: array
          items:
            type: string
            nullable: true
          nullable: true
        time_stamp:
          type: string
          format: date-time
          nullable: true
        updated_at:
          type: string
          format: date-time
      example:
        created_at: '2021-01-01 00:00:00.000'
        dataset_id: e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3
        file_name: file.txt
        id: e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3
        link: https://trieve.ai
        metadata:
          key: value
        size: 1000
        tag_set:
          - tag1
          - tag2
        time_stamp: '2021-01-01 00:00:00.000'
        updated_at: '2021-01-01 00:00:00.000'
    ChunkReqPayloadMapping:
      type: object
      description: >-
        Express a mapping between a column or field in a CSV or JSONL file and
        a key in the ChunkReqPayload created for each row or object.
      required:
        - csv_jsonl_field
        - chunk_req_payload_field
      properties:
        chunk_req_payload_field:
          $ref: '#/components/schemas/ChunkReqPayloadFields'
        csv_jsonl_field:
          type: string
          description: >-
            The column or field in the CSV or JSONL file that you want to map to
            a key in the ChunkReqPayload
    ChunkReqPayloadFields:
      type: string
      description: >-
        The key in the ChunkReqPayload which you can map a column or field from
        the CSV or JSONL file to.
      enum:
        - link
        - tag_set
        - num_value
        - tracking_id
        - group_tracking_ids
        - time_stamp
        - lat
        - lon
        - image_urls
        - weight
        - boost_phrase
  securitySchemes:
    ApiKey:
      type: apiKey
      in: header
      name: Authorization

````
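
As an illustration of the `mappings` field, the sketch below builds a `ChunkReqPayloadMappings` array and rejects target fields that are not in the `ChunkReqPayloadFields` enum from the spec. The helper name `build_mappings` and the source column names in the usage note are hypothetical. The validation is a client-side convenience, not something the spec requires.

```python
# Values of the ChunkReqPayloadFields enum, copied from the spec.
CHUNK_REQ_PAYLOAD_FIELDS = {
    "link",
    "tag_set",
    "num_value",
    "tracking_id",
    "group_tracking_ids",
    "time_stamp",
    "lat",
    "lon",
    "image_urls",
    "weight",
    "boost_phrase",
}


def build_mappings(pairs):
    """Turn (csv_jsonl_field, chunk_req_payload_field) pairs into a mappings array."""
    mappings = []
    for source_field, target_field in pairs:
        if target_field not in CHUNK_REQ_PAYLOAD_FIELDS:
            raise ValueError(f"unsupported chunk_req_payload_field: {target_field!r}")
        mappings.append(
            {
                "csv_jsonl_field": source_field,
                "chunk_req_payload_field": target_field,
            }
        )
    return mappings
```

For example, `build_mappings([("product_url", "link"), ("category", "tag_set"), ("brand", "tag_set")])` maps two hypothetical columns onto `tag_set`, which the spec allows because array fields can have multiple mappings. The result can be passed as the `mappings` value of the request payload.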