# Clone Dataset

> Clones a dataset, creating a new dataset with the same configuration and chunks. The authenticated user must be an owner of the organization to clone a dataset.



## OpenAPI

````yaml post /api/dataset/clone
openapi: 3.0.3
info:
  title: Trieve API
  description: >-
    Trieve OpenAPI Specification. This document describes all of the operations
    available through the Trieve API.
  contact:
    name: Trieve Team
    url: https://trieve.ai
    email: developers@trieve.ai
  license:
    name: BSL
    url: https://github.com/devflowinc/trieve/blob/main/LICENSE.txt
  version: 0.13.0
servers:
  - url: https://api.trieve.ai
    description: Production server
  - url: http://localhost:8090
    description: Local development server
security: []
tags:
  - name: Invitation
    description: Invitation endpoint. Exists to invite users to an organization.
  - name: Auth
    description: Authentication endpoint. Serves to register and authenticate users.
  - name: User
    description: User endpoint. Enables you to modify user roles and information.
  - name: Organization
    description: >-
      Organization endpoint. Enables you to modify organization roles and
      information.
  - name: Dataset
    description: >-
      Dataset endpoint. Datasets belong to organizations and hold configuration
      information for both client and server. Datasets contain chunks and chunk
      groups.
  - name: Chunk
    description: >-
      Chunk endpoint. Think of chunks as individual searchable units of
      information. The majority of your integration will likely be with the
      Chunk endpoint.
  - name: Chunk Group
    description: >-
      Chunk groups endpoint. Think of a chunk_group as a bookmark folder within
      the dataset.
  - name: Crawl
    description: Crawl endpoint. Used to create and manage crawls for datasets.
  - name: File
    description: >-
      File endpoint. When files are uploaded, they are stored in S3 and broken
      up into chunks with text extraction from Apache Tika. You can upload files
      of almost any type, up to 1 GB in size. See the chunking algorithm details
      at `docs.trieve.ai` for more information on how chunking works. Improved
      default chunking is on our roadmap.
  - name: Events
    description: >-
      Notifications endpoint. Files are uploaded asynchronously and events are
      sent to the user when the upload is complete.
  - name: Topic
    description: >-
      Topic chat endpoint. Think of topics as the storage system for generative
      AI chat memory. Chat messages belong to topics.
  - name: Message
    description: >-
      Message chat endpoint. Messages are units belonging to a topic in the
      context of a chat with an LLM. There are system, user, and assistant
      messages.
  - name: Stripe
    description: >-
      Stripe endpoint. Used for the managed SaaS version of this app. Eventually
      this will become a micro-service. Reach out to the team using contact info
      found at `docs.trieve.ai` for more information.
  - name: Health
    description: Health check endpoint. Used to check if the server is up and running.
  - name: Metrics
    description: Metrics endpoint. Used to get information for monitoring.
  - name: Analytics
    description: Analytics endpoint. Used to get information for search and RAG analytics.
  - name: Experiment
    description: Experiment endpoint. Used to create and manage experiments.
paths:
  /api/dataset/clone:
    post:
      tags:
        - Dataset
      summary: Clone Dataset
      description: >-
        Clones a dataset, creating a new dataset with the same configuration
        and chunks. The authenticated user must be an owner of the organization
        to clone a dataset.
      operationId: clone_dataset
      parameters:
        - name: TR-Organization
          in: header
          description: The organization id to use for the request
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        description: JSON request payload to clone a dataset
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CloneDatasetRequest'
        required: true
      responses:
        '200':
          description: Dataset cloned successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Dataset'
        '400':
          description: Service error relating to cloning the dataset
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponseBody'
        '404':
          description: Dataset not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponseBody'
components:
  schemas:
    CloneDatasetRequest:
      type: object
      required:
        - dataset_to_clone
        - dataset_name
      properties:
        clone_chunks:
          type: boolean
          description: >-
            Whether to clone chunks from the original dataset into the new
            dataset. Defaults to true.
          nullable: true
        dataset_name:
          type: string
          description: Name of the new dataset.
        dataset_to_clone:
          type: string
          format: uuid
          description: The id of the dataset you want to clone.
        tracking_id:
          type: string
          description: >-
            Optional tracking ID for the dataset. Can be used to track the
            dataset in external systems. Must be unique within the organization.
            Avoid using a valid UUID as the value, because that will not work
            with the TR-Dataset header.
          nullable: true
      example:
        dataset_name: My Dataset
        dataset_to_clone: 00000000-0000-0000-0000-000000000000
    Dataset:
      type: object
      required:
        - id
        - name
        - created_at
        - updated_at
        - organization_id
        - server_configuration
        - deleted
      properties:
        created_at:
          type: string
          format: date-time
          description: Timestamp of the creation of the dataset
        deleted:
          type: integer
          format: int32
          description: >-
            Flag to indicate if the dataset has been deleted. Deletes are
            handled async after the flag is set so as to avoid expensive search
            index compaction.
        id:
          type: string
          format: uuid
          description: >-
            Unique identifier of the dataset. This is a UUID auto-generated by
            Trieve.
        name:
          type: string
          description: Name of the dataset
        organization_id:
          type: string
          format: uuid
          description: Unique identifier of the organization that owns the dataset
        server_configuration:
          description: Configuration of the dataset for RAG, embeddings, BM25, etc.
        tracking_id:
          type: string
          description: >-
            Tracking ID of the dataset. It can be any string and is chosen by
            the user. Tracking IDs are unique identifiers for datasets within an
            organization. They are designed to match the unique identifier of
            the dataset in the user's system.
          nullable: true
        updated_at:
          type: string
          format: date-time
          description: Timestamp of the last update of the dataset
      example:
        created_at: '2021-01-01 00:00:00.000'
        id: e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3
        name: Trieve
        organization_id: e3e3e3e3-e3e3-e3e3-e3e3-e3e3e3e3e3e3
        server_configuration:
          AIMON_RERANKER_TASK_DEFINITION: >-
            Your task is to grade the relevance of context document(s) against
            the specified user query.
          BM25_AVG_LEN: 256
          BM25_B: 0.75
          BM25_ENABLED: true
          BM25_K: 0.75
          DISTANCE_METRIC: cosine
          EMBEDDING_BASE_URL: https://embedding.trieve.ai
          EMBEDDING_MODEL_NAME: jina-base-en
          EMBEDDING_QUERY_PREFIX: ''
          EMBEDDING_SIZE: 768
          FREQUENCY_PENALTY: 0
          FULLTEXT_ENABLED: true
          INDEXED_ONLY: false
          LLM_BASE_URL: https://api.openai.com/v1
          LLM_DEFAULT_MODEL: gpt-4o
          LOCKED: false
          MAX_LIMIT: 10000
          MESSAGE_TO_QUERY_PROMPT: >+
            Write a 1-2 sentence semantic search query along the lines of a
            hypothetical response to: 

          N_RETRIEVALS_TO_INCLUDE: 8
          PRESENCE_PENALTY: 0
          QDRANT_ONLY: false
          RAG_PROMPT: >-
            Use the following retrieved documents to respond briefly and
            accurately:
          SEMANTIC_ENABLED: true
          STOP_TOKENS:
            - |+


            - |+

          SYSTEM_PROMPT: You are a helpful assistant
          TEMPERATURE: 0.5
          USE_MESSAGE_TO_QUERY_PROMPT: false
        tracking_id: foobar-dataset
        updated_at: '2021-01-01 00:00:00.000'
    ErrorResponseBody:
      type: object
      required:
        - message
      properties:
        message:
          type: string
      example:
        message: Bad Request

````
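
As a rough illustration of the operation above, the following Python sketch builds a `POST /api/dataset/clone` request from the `CloneDatasetRequest` fields and checks a `200` response against the required fields of the `Dataset` schema. It uses only the standard library. The function names (`build_clone_request`, `parse_dataset`, `clone_dataset`) are hypothetical, and the spec excerpt does not show how credentials are passed, so sending the API key in an `Authorization` header is an assumption.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.trieve.ai"  # production server listed in the spec

# Fields marked `required` on the Dataset schema.
REQUIRED_DATASET_FIELDS = (
    "id", "name", "created_at", "updated_at",
    "organization_id", "server_configuration", "deleted",
)


def build_clone_request(api_key, organization_id, dataset_to_clone,
                        dataset_name, tracking_id=None, clone_chunks=None,
                        base_url=BASE_URL):
    """Build (but do not send) a POST /api/dataset/clone request."""
    # Both ids are declared as `format: uuid`; fail early on malformed values.
    uuid.UUID(organization_id)
    uuid.UUID(dataset_to_clone)

    payload = {"dataset_to_clone": dataset_to_clone, "dataset_name": dataset_name}
    # Optional fields are left out when unset so the server applies its defaults.
    if tracking_id is not None:
        payload["tracking_id"] = tracking_id
    if clone_chunks is not None:
        payload["clone_chunks"] = clone_chunks

    return urllib.request.Request(
        url=f"{base_url}/api/dataset/clone",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "TR-Organization": organization_id,
            # Assumption: the API key is sent in an Authorization header.
            "Authorization": api_key,
        },
    )


def parse_dataset(body):
    """Decode a 200 response body and check the schema's required fields."""
    dataset = json.loads(body)
    missing = [name for name in REQUIRED_DATASET_FIELDS if name not in dataset]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return dataset


def clone_dataset(request):
    """Send the request; urllib raises HTTPError for 400 and 404 responses."""
    with urllib.request.urlopen(request) as response:
        return parse_dataset(response.read())
```

Building the request separately from sending it keeps the payload easy to inspect before any network call. On a `400` or `404`, `urllib.error.HTTPError` is raised, and per the spec its body should decode to the `ErrorResponseBody` shape with a `message` field.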