# Upload HTML Page

> Chunk HTML by headings and queue for indexing into the specified dataset.

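To make "chunk by headings" concrete, here is a minimal Python sketch of the general idea: start a new chunk whenever a heading tag (`h1` to `h6`) opens, and collect the text that follows it. This illustrates the technique only. It is not Trieve's chunking implementation, and the names `HeadingChunker` and `chunk_by_headings` are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChunker(HTMLParser):
    """Illustrative only: starts a new chunk at every heading tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # finished chunks: {"heading": str, "body": str}
        self._heading_parts = []  # text seen inside the current heading tag
        self._body_parts = []     # text seen since that heading closed
        self._in_heading = False

    def _flush(self):
        # Close the current chunk, skipping it if it holds no text at all.
        heading = " ".join(self._heading_parts).strip()
        body = " ".join(self._body_parts).strip()
        if heading or body:
            self.chunks.append({"heading": heading, "body": body})
        self._heading_parts, self._body_parts = [], []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        target = self._heading_parts if self._in_heading else self._body_parts
        target.append(text)

    def close(self):
        super().close()
        self._flush()  # emit the final chunk


def chunk_by_headings(html):
    """Return a list of {"heading", "body"} dicts, one per heading section."""
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Text that appears before the first heading ends up in a chunk with an empty `heading`. A production chunker would also need to handle nested markup, scripts, and very long sections, which this sketch ignores.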


## OpenAPI

````yaml post /api/file/html_page
openapi: 3.0.3
info:
  title: Trieve API
  description: >-
    Trieve OpenAPI Specification. This document describes all of the operations
    available through the Trieve API.
  contact:
    name: Trieve Team
    url: https://trieve.ai
    email: developers@trieve.ai
  license:
    name: BSL
    url: https://github.com/devflowinc/trieve/blob/main/LICENSE.txt
  version: 0.13.0
servers:
  - url: https://api.trieve.ai
    description: Production server
  - url: http://localhost:8090
    description: Local development server
security: []
tags:
  - name: Invitation
    description: Invitation endpoint. Exists to invite users to an organization.
  - name: Auth
    description: Authentication endpoint. Serves to register and authenticate users.
  - name: User
    description: User endpoint. Enables you to modify user roles and information.
  - name: Organization
    description: >-
      Organization endpoint. Enables you to modify organization roles and
      information.
  - name: Dataset
    description: >-
      Dataset endpoint. Datasets belong to organizations and hold configuration
      information for both client and server. Datasets contain chunks and chunk
      groups.
  - name: Chunk
    description: >-
      Chunk endpoint. Think of chunks as individual searchable units of
      information. The majority of your integration will likely be with the
      Chunk endpoint.
  - name: Chunk Group
    description: >-
      Chunk groups endpoint. Think of a chunk_group as a bookmark folder within
      the dataset.
  - name: Crawl
    description: Crawl endpoint. Used to create and manage crawls for datasets.
  - name: File
    description: >-
      File endpoint. When files are uploaded, they are stored in S3 and broken
      up into chunks with text extraction from Apache Tika. You can upload files
      of almost any type, up to 1 GB in size. See the chunking algorithm details
      at `docs.trieve.ai` for more information on how chunking works. Improved
      default chunking is on our roadmap.
  - name: Events
    description: >-
      Notifications endpoint. Files are uploaded asynchronously and events are
      sent to the user when the upload is complete.
  - name: Topic
    description: >-
      Topic chat endpoint. Think of topics as the storage system for generative
      AI chat memory. Generative AI messages belong to topics.
  - name: Message
    description: >-
      Message chat endpoint. Messages are units belonging to a topic in the
      context of a chat with an LLM. There are system, user, and assistant
      messages.
  - name: Stripe
    description: >-
      Stripe endpoint. Used for the managed SaaS version of this app. Eventually
      this will become a micro-service. Reach out to the team using contact info
      found at `docs.trieve.ai` for more information.
  - name: Health
    description: Health check endpoint. Used to check if the server is up and running.
  - name: Metrics
    description: Metrics endpoint. Used to get information for monitoring.
  - name: Analytics
    description: Analytics endpoint. Used to get information for search and RAG analytics.
  - name: Experiment
    description: Experiment endpoint. Used to create and manage experiments.
paths:
  /api/file/html_page:
    post:
      tags:
        - File
      summary: Upload HTML Page
      description: >-
        Chunk HTML by headings and queue for indexing into the specified
        dataset.
      operationId: upload_html_page
      requestBody:
        description: JSON request payload to upload an HTML page
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UploadHtmlPageReqPayload'
        required: true
      responses:
        '204':
          description: Confirmation that the HTML is being processed
        '400':
          description: Service error relating to processing the file
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponseBody'
components:
  schemas:
    UploadHtmlPageReqPayload:
      type: object
      required:
        - data
        - metadata
        - scrapeId
      properties:
        data:
          $ref: '#/components/schemas/Document'
        metadata: {}
        scrapeId:
          type: string
          format: uuid
    ErrorResponseBody:
      type: object
      required:
        - message
      properties:
        message:
          type: string
      example:
        message: Bad Request
    Document:
      type: object
      required:
        - metadata
      properties:
        extract:
          type: string
          nullable: true
        html:
          type: string
          nullable: true
        links:
          type: array
          items:
            type: string
          nullable: true
        markdown:
          type: string
          nullable: true
        metadata:
          $ref: '#/components/schemas/Metadata'
        rawHtml:
          type: string
          nullable: true
        screenshot:
          type: string
          nullable: true
    Metadata:
      type: object
      properties:
        articleSection:
          type: string
          nullable: true
        articleTag:
          type: string
          nullable: true
        dcDate:
          type: string
          nullable: true
        dcDateCreated:
          type: string
          nullable: true
        dcDescription:
          type: string
          nullable: true
        dcSubject:
          type: string
          nullable: true
        dcTermsAudience:
          type: string
          nullable: true
        dcTermsCreated:
          type: string
          nullable: true
        dcTermsKeywords:
          type: string
          nullable: true
        dcTermsSubject:
          type: string
          nullable: true
        dcTermsType:
          type: string
          nullable: true
        dcType:
          type: string
          nullable: true
        description:
          type: string
          nullable: true
        error:
          type: string
          nullable: true
        keywords:
          type: string
          nullable: true
        language:
          type: string
          nullable: true
        modifiedTime:
          type: string
          nullable: true
        ogAudio:
          type: string
          nullable: true
        ogDescription:
          type: string
          nullable: true
        ogDeterminer:
          type: string
          nullable: true
        ogImage:
          type: string
          nullable: true
        ogLocale:
          type: string
          nullable: true
        ogLocaleAlternate:
          type: array
          items:
            type: string
          nullable: true
        ogSiteName:
          type: string
          nullable: true
        ogTitle:
          type: string
          nullable: true
        ogUrl:
          type: string
          nullable: true
        ogVideo:
          type: string
          nullable: true
        publishedTime:
          type: string
          nullable: true
        robots:
          type: string
          nullable: true
        site_map:
          allOf:
            - $ref: '#/components/schemas/Sitemap'
          nullable: true
        sourceURL:
          type: string
          nullable: true
        statusCode:
          type: integer
          format: int32
          nullable: true
          minimum: 0
        title:
          type: string
          nullable: true
    Sitemap:
      type: object
      required:
        - changefreq
      properties:
        changefreq:
          type: string

````
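
As a rough illustration of the schema above, the following Python sketch builds a body with the three required fields of `UploadHtmlPageReqPayload` (`data`, `metadata`, `scrapeId`) and posts it to the production server. The helper names `build_payload` and `upload_html_page` are hypothetical. The spec lists no security scheme or header parameters for this operation, so any credentials or keys your deployment expects are an assumption left to the caller through the `headers` and `extra_metadata` arguments.

```python
import json
import urllib.error
import urllib.request
import uuid

# Production server URL and path taken from the spec above.
API_URL = "https://api.trieve.ai/api/file/html_page"


def build_payload(html, source_url, scrape_id=None, extra_metadata=None):
    """Build a body with the required fields of UploadHtmlPageReqPayload."""
    return {
        "data": {
            "html": html,
            # Document.metadata is required; every field inside it is optional.
            "metadata": {"sourceURL": source_url},
        },
        # The top-level metadata is required but untyped in the spec, so its
        # keys are up to the caller (assumption: an empty object is accepted).
        "metadata": extra_metadata or {},
        # scrapeId must be a UUID string; generate one if none is supplied.
        "scrapeId": str(scrape_id or uuid.uuid4()),
    }


def upload_html_page(payload, headers=None, url=API_URL):
    """POST the payload. Returns None on 204, raises RuntimeError otherwise."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            if response.status != 204:
                raise RuntimeError(f"unexpected status {response.status}")
    except urllib.error.HTTPError as err:
        # A 400 response carries an ErrorResponseBody: {"message": "..."}.
        body = err.read().decode("utf-8", errors="replace")
        message = body
        try:
            parsed = json.loads(body)
            if isinstance(parsed, dict) and "message" in parsed:
                message = parsed["message"]
        except json.JSONDecodeError:
            pass
        raise RuntimeError(f"upload failed ({err.code}): {message}") from err
```

A call would look like `upload_html_page(build_payload("<h1>Title</h1><p>Body</p>", "https://example.com/page"))`. Because a `204` only confirms that the HTML was queued, a successful return does not mean the chunks are already searchable.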