Get all crawl requests for a dataset

curl --request GET \
  --url https://api.trieve.ai/api/crawl \
  --header 'Authorization: <api-key>' \
  --header 'TR-Dataset: <tr-dataset>'

[
  {
    "attempt_number": 123,
    "crawl_options": {
      "crawl_options": {
        "allow_external_links": false,
        "boost_titles": true,
        "exclude_tags": [
          "#ad",
          "#footer",
          "header",
          "head",
          "navbar",
          "footer",
          "aside",
          "nav",
          "form"
        ],
        "heading_remove_strings": [
          "Advertisement",
          "Sponsored"
        ],
        "ignore_sitemap": true,
        "include_tags": [],
        "interval": "daily",
        "limit": 50,
        "site_url": "nedzo.ai"
      }
    },
    "crawl_type": "firecrawl",
    "created_at": "2023-11-07T05:31:56Z",
    "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "Pending",
    "url": "<string>",
    "interval": "<string>",
    "next_crawl_at": "2023-11-07T05:31:56Z"
  }
]

GET /api/crawl

Authorizations

Authorization

string

header

required

Headers

TR-Dataset

string<uuid>

required

The dataset id to use for the request
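As a rough illustration of how the two required headers fit together, the sketch below prepares the request with Python's standard library but does not send it. The function name `build_crawl_list_request` and the placeholder key are assumptions made for this example, not part of the Trieve API.

```python
import urllib.request

API_URL = "https://api.trieve.ai/api/crawl"


def build_crawl_list_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the crawl list.

    Hypothetical helper: it only attaches the two headers documented above.
    """
    return urllib.request.Request(
        API_URL,
        method="GET",
        headers={
            "Authorization": api_key,   # API key, passed as a header
            "TR-Dataset": dataset_id,   # dataset UUID the request applies to
        },
    )


req = build_crawl_list_request(
    "<api-key>", "3c90c3cc-0d44-4b50-8888-8dd25736052a"
)
# urllib normalizes header-name capitalization internally; HTTP header
# names are case-insensitive, so this does not change the request's meaning.
print(req.get_method(), req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.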

Query Parameters

page

integer<int64> | null

The page number to retrieve

limit

integer<int64> | null

The number of items to retrieve per page
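Both query parameters are nullable, so a client would typically omit whichever one it does not need. A minimal sketch of building the paginated URL follows; `crawl_list_url` is a hypothetical helper name, and the page and limit values are arbitrary examples.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://api.trieve.ai/api/crawl"


def crawl_list_url(page: Optional[int] = None, limit: Optional[int] = None) -> str:
    """Build the endpoint URL, leaving out any parameter that is None."""
    params = {}
    if page is not None:
        params["page"] = page
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    return f"{API_URL}?{query}" if query else API_URL


print(crawl_list_url())                  # no pagination parameters
print(crawl_list_url(page=2, limit=10))  # second page, ten items per page
```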

Response

Crawl requests retrieved successfully

attempt_number

integer<int32>

required

crawl_options

object

required

Options for setting up the crawl which will populate the dataset.

Example:

{
  "crawl_options": {
    "allow_external_links": false,
    "boost_titles": true,
    "exclude_tags": [
      "#ad",
      "#footer",
      "header",
      "head",
      "navbar",
      "footer",
      "aside",
      "nav",
      "form"
    ],
    "heading_remove_strings": ["Advertisement", "Sponsored"],
    "ignore_sitemap": true,
    "include_tags": [],
    "interval": "daily",
    "limit": 50,
    "site_url": "nedzo.ai"
  }
}

crawl_type

enum<string>

required

Available options: firecrawl, openapi, shopify, youtube

created_at

string<date-time>

required

dataset_id

string<uuid>

required

id

string<uuid>

required

scrape_id

string<uuid>

required

status

required

Available options: Pending

url

string

required

interval

string | null

next_crawl_at

string<date-time> | null
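One possible way to consume the response is to map each array element onto a typed record. The sketch below assumes the field names and types listed above. The `CrawlRequest` class and `parse_timestamp` helper are hypothetical names for this example, and the sample payload is abbreviated from the example response rather than taken from a live call.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a date-time string such as 2023-11-07T05:31:56Z."""
    # Older Python versions reject a trailing "Z", so map it to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class CrawlRequest:
    id: str
    dataset_id: str
    scrape_id: str
    crawl_type: str
    status: str
    url: str
    attempt_number: int
    created_at: datetime
    crawl_options: Dict[str, Any]
    interval: Optional[str] = None            # nullable in the schema
    next_crawl_at: Optional[datetime] = None  # nullable in the schema

    @classmethod
    def from_json(cls, item: Dict[str, Any]) -> "CrawlRequest":
        next_raw = item.get("next_crawl_at")
        return cls(
            id=item["id"],
            dataset_id=item["dataset_id"],
            scrape_id=item["scrape_id"],
            crawl_type=item["crawl_type"],
            status=item["status"],
            url=item["url"],
            attempt_number=item["attempt_number"],
            created_at=parse_timestamp(item["created_at"]),
            crawl_options=item["crawl_options"],
            interval=item.get("interval"),
            next_crawl_at=parse_timestamp(next_raw) if next_raw else None,
        )


# Abbreviated sample modeled on the example response (illustrative only).
SAMPLE = json.loads("""
[
  {
    "attempt_number": 123,
    "crawl_options": {"crawl_options": {"limit": 50, "site_url": "nedzo.ai"}},
    "crawl_type": "firecrawl",
    "created_at": "2023-11-07T05:31:56Z",
    "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "Pending",
    "url": "<string>",
    "interval": "daily",
    "next_crawl_at": "2023-11-07T05:31:56Z"
  }
]
""")

crawls = [CrawlRequest.from_json(item) for item in SAMPLE]
for crawl in crawls:
    print(crawl.id, crawl.crawl_type, crawl.status, crawl.next_crawl_at)
```

Keeping `crawl_options` as a plain dictionary avoids guessing at its child attributes, which this page does not enumerate in full.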
