Create a new crawl request
This endpoint creates a new crawl request for a dataset. The request body should contain the crawl options to use for the crawl.
POST /api/crawl
curl --request POST \
  --url https://api.trieve.ai/api/crawl \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --header 'TR-Dataset: <tr-dataset>' \
  --data '{
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  }
}'
Example response (200):
{
  "attempt_number": 123,
  "crawl_options": {
    "crawl_options": {
      "allow_external_links": false,
      "boost_titles": true,
      "exclude_tags": [
        "#ad",
        "#footer",
        "header",
        "head",
        "navbar",
        "footer",
        "aside",
        "nav",
        "form"
      ],
      "heading_remove_strings": [
        "Advertisement",
        "Sponsored"
      ],
      "ignore_sitemap": true,
      "include_tags": [],
      "interval": "daily",
      "limit": 50,
      "site_url": "nedzo.ai"
    }
  },
  "crawl_type": "firecrawl",
  "created_at": "2023-11-07T05:31:56Z",
  "dataset_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "interval": "<string>",
  "next_crawl_at": "2023-11-07T05:31:56Z",
  "scrape_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "Pending",
  "url": "<string>"
}
Authorizations
Authorization (header): your API key, shown as <api-key> in the example request.
Headers
TR-Dataset: the ID of the dataset to use for the request.
Body
application/json: the JSON request payload for the new crawl. The body is an object.
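The same request can also be assembled with Python's standard library. This is a minimal sketch, not an official Trieve client: build_crawl_request is a hypothetical helper name, and the URL and the three headers are copied from the curl example. It only builds the request object; passing the result to urllib.request.urlopen would actually send it.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example on this page.
CRAWL_URL = "https://api.trieve.ai/api/crawl"


def build_crawl_request(api_key: str, dataset_id: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a POST /api/crawl request.

    build_crawl_request is a hypothetical helper. `payload` is the full JSON
    body, shaped like the --data argument of the curl example.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CRAWL_URL,
        data=body,
        method="POST",
        headers={
            # Same three headers as the curl example. urllib stores header
            # names via str.capitalize(), e.g. "TR-Dataset" -> "Tr-dataset".
            "Authorization": api_key,
            "Content-Type": "application/json",
            "TR-Dataset": dataset_id,
        },
    )
```

HTTP header names are case-insensitive, so urllib's normalization of TR-Dataset to Tr-dataset does not change what the server receives.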
Response
200 (application/json): crawl created successfully. The response body is an object.
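A 200 response can be reduced to the few fields a caller usually tracks after creating a crawl. This is a minimal sketch that assumes the field names shown in the example response; summarize_crawl_response is a hypothetical helper, and treating next_crawl_at as possibly absent is an assumption, not documented behavior.

```python
import json
from datetime import datetime


def summarize_crawl_response(raw: str) -> dict:
    """Extract id, dataset_id, status and next_crawl_at from a 200 response body.

    summarize_crawl_response is a hypothetical helper; field names follow the
    example response on this page.
    """
    data = json.loads(raw)
    next_crawl = data.get("next_crawl_at")  # assumed optional here
    if next_crawl is not None:
        # Example timestamps end in "Z" (UTC). datetime.fromisoformat only
        # accepts a trailing "Z" from Python 3.11, so normalize it first.
        next_crawl = datetime.fromisoformat(next_crawl.replace("Z", "+00:00"))
    return {
        "id": data["id"],
        "dataset_id": data["dataset_id"],
        "status": data["status"],
        "next_crawl_at": next_crawl,
    }
```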