PUT /api/crawl
curl --request PUT \
  --url https://api.trieve.ai/api/crawl \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Dataset: <tr-dataset>' \
  --data '{
  "crawl_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  }
}'
{
  "attempt_number": 123,
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  },
  "crawl_type": "firecrawl",
  "created_at": "2023-11-07T05:31:56Z",
  "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "interval": "<string>",
  "next_crawl_at": "2023-11-07T05:31:56Z",
  "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "Pending",
  "url": "<string>"
}

Authorizations

Authorization
string
header
required

Headers

TR-Dataset
string
required

The ID of the dataset to use for the request.

Body

application/json
JSON request payload to update a crawl
crawl_id
string
required

Crawl ID to update

crawl_options
object
required

Options for setting up the crawl which will populate the dataset.

Example:
{
  "crawl_options": {
    "allow_external_links": false,
    "boost_titles": true,
    "exclude_tags": [
      "#ad",
      "#footer",
      "header",
      "head",
      "navbar",
      "footer",
      "aside",
      "nav",
      "form"
    ],
    "heading_remove_strings": ["Advertisement", "Sponsored"],
    "ignore_sitemap": true,
    "include_tags": [],
    "interval": "daily",
    "limit": 50,
    "site_url": "nedzo.ai"
  }
}
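
As a rough illustration of how the headers and body described above fit together, the following Python sketch builds the request with the standard library's `urllib.request` without sending it. The helper name `build_update_crawl_request` is hypothetical and not part of any SDK; the URL, header names, and the nested `crawl_options` wrapper are copied from the curl example on this page.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.trieve.ai/api/crawl"


def build_update_crawl_request(api_key, dataset_id, crawl_id, crawl_options):
    """Build (but do not send) a PUT /api/crawl request.

    Hypothetical helper for illustration only.
    """
    payload = {
        "crawl_id": crawl_id,
        # The documented example wraps the settings in a nested
        # "crawl_options" key, so the same shape is reproduced here.
        "crawl_options": {"crawl_options": crawl_options},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
            "TR-Dataset": dataset_id,
        },
    )


request = build_update_crawl_request(
    api_key="<api-key>",
    dataset_id="<tr-dataset>",
    crawl_id="3c90c3cc-0d44-4b50-8888-8dd25736052a",
    crawl_options={"interval": "daily", "limit": 50, "site_url": "nedzo.ai"},
)
```

Passing the resulting object to `urllib.request.urlopen(request)` would send it. Keeping construction separate from sending makes the payload easy to inspect before it reaches the API.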

Response

200
application/json
Crawl updated successfully
attempt_number
integer
required
crawl_options
object
required

Options for setting up the crawl which will populate the dataset.

Example:
{
  "crawl_options": {
    "allow_external_links": false,
    "boost_titles": true,
    "exclude_tags": [
      "#ad",
      "#footer",
      "header",
      "head",
      "navbar",
      "footer",
      "aside",
      "nav",
      "form"
    ],
    "heading_remove_strings": ["Advertisement", "Sponsored"],
    "ignore_sitemap": true,
    "include_tags": [],
    "interval": "daily",
    "limit": 50,
    "site_url": "nedzo.ai"
  }
}
crawl_type
enum<string>
required
Available options: firecrawl, openapi, shopify, youtube
created_at
string
required
dataset_id
string
required
id
string
required
scrape_id
string
required
status
enum<string>
required
Available options: Pending
url
string
required
interval
string | null
next_crawl_at
string | null
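
The response fields listed above could be checked and converted on the client side roughly as follows. This is a minimal sketch, not an official client: `CrawlRequest`, `parse_crawl_response`, and `parse_timestamp` are hypothetical names, and the timestamp handling assumes the ISO 8601 form shown in the example response (`2023-11-07T05:31:56Z`).

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Values listed under "Available options" for crawl_type.
CRAWL_TYPES = {"firecrawl", "openapi", "shopify", "youtube"}

# Fields marked "required" in the 200 response.
REQUIRED_FIELDS = (
    "attempt_number", "crawl_options", "crawl_type", "created_at",
    "dataset_id", "id", "scrape_id", "status", "url",
)


def parse_timestamp(value: str) -> datetime:
    # Assumption: timestamps look like the example response. A trailing "Z"
    # is rewritten because datetime.fromisoformat only accepts it on 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class CrawlRequest:
    id: str
    dataset_id: str
    scrape_id: str
    url: str
    status: str
    crawl_type: str
    attempt_number: int
    crawl_options: dict
    created_at: datetime
    interval: Optional[str] = None              # documented as string | null
    next_crawl_at: Optional[datetime] = None    # documented as string | null


def parse_crawl_response(raw: str) -> CrawlRequest:
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if data["crawl_type"] not in CRAWL_TYPES:
        raise ValueError(f"unexpected crawl_type: {data['crawl_type']!r}")
    next_crawl_at = data.get("next_crawl_at")
    return CrawlRequest(
        id=data["id"],
        dataset_id=data["dataset_id"],
        scrape_id=data["scrape_id"],
        url=data["url"],
        status=data["status"],
        crawl_type=data["crawl_type"],
        attempt_number=data["attempt_number"],
        crawl_options=data["crawl_options"],
        created_at=parse_timestamp(data["created_at"]),
        interval=data.get("interval"),
        next_crawl_at=parse_timestamp(next_crawl_at) if next_crawl_at else None,
    )


# Trimmed-down version of the example response, with a null next_crawl_at.
sample = json.dumps({
    "attempt_number": 123,
    "crawl_options": {"crawl_options": {"site_url": "nedzo.ai", "limit": 50}},
    "crawl_type": "firecrawl",
    "created_at": "2023-11-07T05:31:56Z",
    "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "interval": "daily",
    "next_crawl_at": None,
    "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "Pending",
    "url": "nedzo.ai",
})
crawl = parse_crawl_response(sample)
```

Treating `interval` and `next_crawl_at` as optional mirrors their `string | null` types, while a missing required field fails early instead of surfacing later as a `KeyError`.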