GET /api/crawl
curl --request GET \
  --url https://api.trieve.ai/api/crawl \
  --header 'Authorization: <api-key>' \
  --header 'TR-Dataset: <tr-dataset>'
[
  {
    "attempt_number": 123,
    "crawl_options": {
      "body_remove_strings": [
        "Edit on github"
      ],
      "boost_titles": true,
      "exclude_paths": [
        "https://example.com/exclude*"
      ],
      "exclude_tags": [
        "#ad",
        "#footer"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "include_paths": [
        "https://example.com/include*"
      ],
      "include_tags": [
        "h1",
        "p",
        "a",
        ".main-content"
      ],
      "interval": "daily",
      "limit": 1000,
      "site_url": "https://example.com"
    },
    "crawl_type": "firecrawl",
    "created_at": "2023-11-07T05:31:56Z",
    "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "interval": "<string>",
    "next_crawl_at": "2023-11-07T05:31:56Z",
    "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "Pending",
    "url": "<string>"
  }
]

Authorizations

Authorization
string
header
required

Headers

TR-Dataset
string
required

The ID of the dataset to use for the request

Query Parameters

page
integer | null

The page number to retrieve

limit
integer | null

The number of items to retrieve per page
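As a rough illustration of how the two query parameters combine with the required headers, the sketch below builds (but does not send) a request for one page of crawl requests. The function name `build_crawl_list_request` and its keyword arguments are hypothetical helpers for this example, not part of the API. Because both parameters are nullable, the sketch simply leaves out any that are not supplied.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.trieve.ai/api/crawl"


def build_crawl_list_request(api_key, dataset_id, page=None, limit=None):
    """Build a GET request for the crawl list (hypothetical helper).

    `page` and `limit` are optional; omitted values are not sent at all.
    """
    # Keep only the query parameters the caller actually provided.
    params = {
        name: value
        for name, value in (("page", page), ("limit", limit))
        if value is not None
    }
    url = BASE_URL + ("?" + urlencode(params) if params else "")

    # Both headers are marked as required in this reference.
    headers = {"Authorization": api_key, "TR-Dataset": dataset_id}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_crawl_list_request("<api-key>", "<tr-dataset>", page=2, limit=10)
    print(req.get_method(), req.full_url)
```

To actually send the request, the returned object can be passed to `urllib.request.urlopen`. This page does not say whether `page` starts at 0 or 1, nor what defaults apply when the parameters are omitted, so check that against a live response before relying on it.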

Response

200
application/json
Crawl requests retrieved successfully

attempt_number
integer
required

crawl_options
object
required

Options for setting up the crawl which will populate the dataset.

crawl_type
enum<string>
required

Available options: firecrawl, openapi, shopify, youtube

created_at
string
required

dataset_id
string
required

id
string
required

interval
string
required

next_crawl_at
string
required

scrape_id
string
required

status
required

Available options: Pending

url
string
required
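The response fields listed above can be mapped onto a typed record on the client side. The following is a minimal sketch, not an official client: the `CrawlRequest` dataclass and `parse_crawl_requests` function are hypothetical names, and it assumes that `created_at` and `next_crawl_at` arrive as ISO 8601 strings like those in the example response (the schema itself only says `string`).

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Values listed under "Available options" for crawl_type.
CRAWL_TYPES = {"firecrawl", "openapi", "shopify", "youtube"}


@dataclass
class CrawlRequest:
    """Client-side view of one item in the response array (hypothetical)."""
    id: str
    dataset_id: str
    scrape_id: str
    url: str
    status: str
    crawl_type: str
    interval: str
    attempt_number: int
    created_at: datetime
    next_crawl_at: datetime
    crawl_options: dict


def _parse_timestamp(value):
    # Assumption: timestamps are ISO 8601. Rewriting a trailing "Z" keeps
    # this working on Python versions older than 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_crawl_requests(body):
    """Parse the JSON array returned by GET /api/crawl into records."""
    records = []
    for item in json.loads(body):
        if item["crawl_type"] not in CRAWL_TYPES:
            raise ValueError(f"unexpected crawl_type: {item['crawl_type']!r}")
        records.append(
            CrawlRequest(
                id=item["id"],
                dataset_id=item["dataset_id"],
                scrape_id=item["scrape_id"],
                url=item["url"],
                status=item["status"],
                crawl_type=item["crawl_type"],
                interval=item["interval"],
                attempt_number=item["attempt_number"],
                created_at=_parse_timestamp(item["created_at"]),
                next_crawl_at=_parse_timestamp(item["next_crawl_at"]),
                crawl_options=item["crawl_options"],
            )
        )
    return records
```

`crawl_options` is kept as a plain dictionary here because this page does not give the full schema of that object. Every field is read with direct indexing since all of them are marked required, so a missing key surfaces as a `KeyError` rather than being silently defaulted.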