Quickstart Guide

This guide is going to step you through the current dashboard and give python example code using our sdk to upload your first few chunks into a Dataset.

The steps are laid out as follows:

  1. Create Account / Organization.
  2. Create dataset.
  3. Create an api key
  4. Create your first chunk
  5. Search and all your options
  6. Links

1. Create Account / Organization

Head over to https://dashboard.trieve.ai and create an account. After you're logged in you will see the dashboard homepage. Go ahead and create a new Dataset Dashboard Homepage

2. Create Dataset

Create Dataset menu

At the time of this article we support 2 types of embedding models, we self-host jina-base-en in house and also allow for openai embeddings (currently using ada-002). The api is more flexible we just need to add the ui for better customization of embedding model source. Once you choose a model it is not able to be changed.

Dataset and Organization view

Take note of the dataset and organization id's. It will be useful for when uploading chunks in the next step.

3. Create API key

In Organization Settings, there should be an option to generate a new token. Create a read + write api token. Save this also for the next step. The api keys are put at userscope so you can have multiple keys for different purposes. And each user that is added into the organization can have different keys.

Organization Settings, create api key

4. Create your first chunk(s)

pip install git+https://github.com/devflowinc/trieve_client_py

# just for the example of loading API_KEY and DATASET_ID
pip install dotenv

To create chunks using the python sdk refer to this code snippet. we support arbitrary metadata to be uploaded via the metadata field and support for external unique id's via the tracking_id field. In this example we simply upload 10 chunks.

from trieve_client import AuthenticatedClient
from trieve_client.api.chunk import create_chunk
from trieve_client.models import CreateChunkData, ReturnCreatedChunk
import os
from trieve_client.models.error_response_body import ErrorResponseBody

## Load environment variables
api_key = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"
organization_id = "YOUR_ORGANIZATION_ID"

# Here we create a client with the api key, dataset id, and organization id
client = AuthenticatedClient(base_url="https://api.trieve.ai",
	prefix="",
	token=api_key
).with_headers({
	"TR-Dataset": dataset_id,
	"TR-Organization": organization_id,
});

with client as client:
	for i in range(10):
		id = f"example-chunk-{id}"
		chunk = CreateChunkData(
			# We accept html inputs, the html tags are NOT embedded.
			# HTML does help for us to highlight results in the response.
			chunk_html=f"<h1>Hello, World! chunk {id}</h1>",
			# If you have a link that relates to your chunk
			link=f"https://{id}.com",
			# Add as many tags as needed
			tag_set=["example", "test", id],
			# Since we queue writes, tracking id helps to prevent duplicates
			# This can be any internal id system that you have or simply left
			# blank
			tracking_id=id,
			# You can put a timestamp as when it was made, this helps with the
			# timerange filter that we have
			time_stamp="2021-01-01T00:00:00Z",
			# You can place a list of group ids that the chunk will automatically
			# be added to. Chunks can also be added to groups after.
			group_ids=[],
			# Similar thing with GroupID just less flexible, adding this chunk to a fileID
			file_id=None,
			# Chunk Vector is an alternative to chunk_html.
			# Chunk Vector may be used if you already embedded the data
			chunk_vector=None,
			weight=None,
			# We allow for arbitrary metadata
			metadata={
				"anykey": "any-data",
				"id": id,
			},
		)

		# Send the request to trieve
		chunk_response = create_chunk.sync(
			tr_dataset=dataset_id,
			client=client,
			body=chunk
		)

		# All responses are typed, so you can check the type of responses
		# Errors are of type ErrorResponseBody and contain a message.
		if type(chunk_response) == ReturnCreatedChunk:
			print(f"queue'd pos: {chunk_response.pos_in_queue}")
		elif type(chunk_response) == ErrorResponseBody:
			print(f"Failed {chunk_response.message}")
			exit(1)

This is the syntax to search for a chunk. We support 3 different types of searches, semantic, hybrid, and fulltext.

from trieve_client.api.chunk import search_chunk
from trieve_client.models import SearchChunkData, SearchChunkQueryResponseBody
from trieve_client.models.error_response_body import ErrorResponseBody

## Load environment variables
api_key = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"
organization_id = "YOUR_ORGANIZATION_ID"

# Here we create a client with the api key, dataset id, and organization id
client = AuthenticatedClient(base_url="https://api.trieve.ai",
	prefix="",
	token=api_key
).with_headers({
	"TR-Dataset": dataset_id,
	"TR-Organization": organization_id,
});


with client as client:
	# Conduct an example search
	search_data = SearchChunkData(
		# The query that you want to search for
		query="example",
		# The type of search that you want to conduct
		search_type="semantic",
		# Bias search results that are more recent
		date_bias=False,
		# Filters are based on metadata keys that you inserted
		filters={
			"anykey": "any-data",
		},
		# Rather or not to fetch collisions, this is a
		# more advanced feature that is not used often
		get_collisions=False,
		# We highlight relevant parts of the search highlight_results
		# the delimeters are what characters to split on. By default
		# we split on sentence end. (., !, ?)
		highlight_delimiters=None,
		# Rather or not to highlight results
		highlight_results=True,
		# Require that the search results have a links that fuzzy match
		link=["example"],
		# What page of results to fetch
		page=1,
		# Only fetch results that are in a specified tag group
		tag_set=["test"],
		# A tuple of two strings, the start and end of the time range
		time_range=None,
	)

	search_response = search_chunk.sync(tr_dataset=dataset_id, client=client, body=search_data)
	if type(search_response) == SearchChunkQueryResponseBody:
			print(f"Got {search_response.total_chunk_pages} pages of results. Search results: {search_response}")
	elif type(search_response) == ErrorResponseBody:
			print(f"Failed to search body {search_response.message}")
			exit(1)

It could be difficult to visualize the search response but we have a default ui at https://search.trieve.ai/ to help visualize the results. This uses the sames authentication credentials that the dashbaord uses if you need to relogin. Make sure to set the dataset to your current dataset.

Search UI homepage

The search ui also displays all metadata objects that you inserted into the chunks. This is useful for debugging and understanding the data that you have inserted.

Search UI results displayed

Whats next?

Great, you're now set up with an API client and have made your first request to the API. Here are a few links that might be handy as you venture further:

happy hacking 🚀🚀🚀