This endpoint creates a new crawl request for a dataset. The request payload contains the crawl options to apply to the crawl.
Example request:
curl --request POST \
  --url https://api.trieve.ai/api/crawl \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Dataset: <tr-dataset>' \
  --data '{
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  }
}'
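The curl call above can also be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper name `build_crawl_request` and its parameters are hypothetical, and the payload simply mirrors the example request, including its nested `crawl_options` key. The function only builds the request object, so nothing is sent until you choose to.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.trieve.ai/api/crawl"


def build_crawl_request(api_key: str, dataset_id: str,
                        site_url: str, limit: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper for illustration; option values are copied from
    the example payload and are not recommended defaults.
    """
    payload = {
        "crawl_options": {
            # The example payload nests a second "crawl_options" object.
            "crawl_options": {
                "allow_external_links": False,
                "boost_titles": True,
                "exclude_tags": [
                    "#ad", "#footer", "header", "head", "navbar",
                    "footer", "aside", "nav", "form",
                ],
                "heading_remove_strings": ["Advertisement", "Sponsored"],
                "ignore_sitemap": True,
                "include_tags": [],
                "interval": "daily",
                "limit": limit,
                "site_url": site_url,
            }
        }
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
            "TR-Dataset": dataset_id,
        },
        method="POST",
    )
```

To actually submit the crawl, the returned object can be passed to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_crawl_request("<api-key>", "<tr-dataset>", "nedzo.ai"))`. Keeping construction separate from sending makes it easy to inspect the headers and body first.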
Example response:
{
  "attempt_number": 123,
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  },
  "crawl_type": "firecrawl",
  "created_at": "2023-11-07T05:31:56Z",
  "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "interval": "<string>",
  "next_crawl_at": "2023-11-07T05:31:56Z",
  "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "Pending",
  "url": "<string>"
}
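A response shaped like the example above could be read as in the following sketch. The helper names `parse_timestamp` and `summarize_crawl` are hypothetical, and the code assumes that `id`, `status`, and `next_crawl_at` are always present and that timestamps use the `...Z` form shown in the example; a real response may differ.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp like "2023-11-07T05:31:56Z" into an aware datetime.

    A trailing "Z" is rewritten to "+00:00" so that fromisoformat also
    accepts it on Python versions older than 3.11.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize_crawl(response_text: str) -> dict:
    """Pick out a few fields from a crawl response body (hypothetical helper)."""
    body = json.loads(response_text)
    return {
        "id": body["id"],
        "status": body["status"],
        "next_crawl_at": parse_timestamp(body["next_crawl_at"]),
    }
```

For instance, `summarize_crawl(raw_body)["status"]` gives the crawl status string, which is "Pending" in the example response.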
TR-Dataset header: The dataset id to use for the request.
Request body: JSON request payload to create a new crawl. The body is of type object.
Response: Crawl created successfully. The response is of type object.