This guide takes ~30 minutes to complete. Expect ~20 minutes of this to be EKS spinning up.
Installation Requirements:
- eksctl >= 0.171 (eksctl installation guide)
- aws >= 2.15 (aws installation guide)
- kubectl >= 1.28 (kubectl installation guide)
- helm >= 3.14 (helm installation guide)
- A Trieve Vector Inference License
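To verify that the installed tools meet these minimums, you can print their versions (a quick sanity check, assuming the standard CLIs are on your PATH):

```bash
# Print installed versions to compare against the minimums above.
eksctl version            # expect >= 0.171
aws --version             # expect aws-cli/2.15 or newer
kubectl version --client  # expect client >= 1.28
helm version --short      # expect v3.14 or newer
```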
IAM Policy Minimum Requirements
You need to have an IAM policy that allows you to use the eksctl CLI. The most up-to-date guide is located here. You are able to use the root account; however, AWS does not recommend doing this.

Getting your license

Contact us:
- Email us at humans@trieve.ai
- Book a meeting
- Call us @ 628-222-4090
- AWS Marketplace Subscription
Check AWS Quota
Ensure you have quotas for both GPUs and load balancers.
- At least 4 vCPUs for On-Demand G and VT instances in the region of choice.
- You will need 1 load balancer for each model you want to serve.
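If you prefer to check these quotas from the CLI, here is a sketch using Service Quotas. The quota codes (L-DB2E81BA for Running On-Demand G and VT instances, L-53DA6B97 for Application Load Balancers) are my assumption of the relevant codes; verify them in the Service Quotas console:

```bash
# Check the vCPU quota for Running On-Demand G and VT instances (EC2).
# Quota code L-DB2E81BA is an assumption; confirm in the console.
aws service-quotas get-service-quota \
  --service-code ec2 --quota-code L-DB2E81BA \
  --query "Quota.Value" --region "$AWS_REGION"

# Check Application Load Balancers per region (ELB).
# Quota code L-53DA6B97 is an assumption; confirm in the console.
aws service-quotas get-service-quota \
  --service-code elasticloadbalancing --quota-code L-53DA6B97 \
  --query "Quota.Value" --region "$AWS_REGION"
```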
Deploying the Cluster
Setting up environment variables
You will need:
- Your AWS Account ID.
- A GPU instance type (GPU_INSTANCE). We recommend g4dn.xlarge, as it is the cheapest on AWS. A single small node is needed for extra utility.
- An AWS region. TVI supports all regions that have the chosen GPU_INSTANCE available.
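A minimal sketch of the exports (GPU_INSTANCE comes from this guide; the other variable names and values are illustrative assumptions, so use the names the bootstrap script expects):

```bash
# Sketch: environment used in the following steps. GPU_INSTANCE is from the
# docs; the other variable names are assumptions for illustration.
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export AWS_REGION=us-east-2                  # any region where GPU_INSTANCE is available
export GPU_INSTANCE=g4dn.xlarge              # cheapest GPU instance on AWS
export CLUSTER_NAME=trieve-vector-inference  # hypothetical cluster name
```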
Create your cluster
Create the EKS cluster and install the needed plugins. The bootstrap-eks.sh script will create the EKS cluster, install the AWS Load Balancer Controller, and install the NVIDIA Device Plugin. It will also manage any IAM permissions that are needed for the plugins to work.
1. Download the bootstrap-eks.sh script.
2. Run bootstrap-eks.sh with bash.
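A sketch of both steps; $BOOTSTRAP_SCRIPT_URL is a placeholder, not a documented variable, so substitute the download link from this guide:

```bash
# Download the bootstrap script (substitute the real URL from this guide),
# then run it. The environment variables from the previous step must be set.
curl -fsSL -o bootstrap-eks.sh "$BOOTSTRAP_SCRIPT_URL"  # placeholder URL
bash bootstrap-eks.sh
```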
Install Trieve Vector Inference
Configure embedding_models.yaml
First, download the example configuration file, embedding_models.yaml. This defines all of the models that you want to use.
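As a sketch only, a file like the one below maps each model you want to serve to a Hugging Face model. The keys shown are illustrative assumptions, not the chart's documented schema; start from the downloaded example file:

```bash
# Write an illustrative embedding_models.yaml. The keys are assumptions
# for illustration; the downloaded example file is authoritative.
cat > embedding_models.yaml <<'EOF'
models:
  bge-m3:
    modelName: BAAI/bge-m3   # Hugging Face model to serve
    replicas: 1              # each model gets its own ALB-backed endpoint
EOF
```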
Install the helm chart
This helm chart will only work if you subscribe to the AWS Marketplace Listing.
Contact us at humans@trieve.ai if you do not have access to the AWS Marketplace or cannot use it.
1. Log in to the AWS ECR repository.
2. Install the helm chart from the Marketplace ECR repository.
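A sketch of both steps. The registry host and chart path are assumptions based on AWS Marketplace's shared ECR registry; copy the exact commands from your Marketplace listing:

```bash
# 1. Authenticate helm to the Marketplace ECR registry (host is an assumption).
aws ecr get-login-password --region us-east-1 \
  | helm registry login 709825985650.dkr.ecr.us-east-1.amazonaws.com \
      --username AWS --password-stdin

# 2. Install the chart with your model configuration (chart path is an assumption).
helm upgrade --install vector-inference \
  oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/trieve/trieve-embeddings \
  -f embedding_models.yaml
```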
Get your model endpoints
The Address field is the endpoint where you can make dense embedding, sparse embedding, or reranker calls, depending on the models you chose.
To ensure everything is working, make a request to the model endpoint provided.
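For example, you can list the ingresses and then hit one endpoint. The /embed route and payload shape below are assumed from the text-embeddings-inference API style; substitute the Address shown by kubectl:

```bash
# List each model's ingress; the ADDRESS column is the model endpoint.
kubectl get ingress

# Smoke-test a dense embedding call. The /embed route and JSON shape are
# assumptions based on the text-embeddings-inference API.
curl -s "http://<your-model-address>/embed" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "test input"}'
```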
Using Trieve Vector Inference
Each ingress point uses its own Application Load Balancer within AWS. The Address provided is the model's endpoint, where you can make dense embedding, sparse embedding, or reranker calls, depending on the models you chose.
Check out the guides for more information on configuration.
Using SPLADE Models
How to set up a dedicated instance for the sparse SPLADE embedding model
Using Custom Models
How to use private or gated Hugging Face models, or any model that you want
OpenAI compatibility
Trieve Vector Inference has OpenAI-compatible routes
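For example, assuming the OpenAI-style embeddings route is exposed at /v1/embeddings (the path and payload follow the OpenAI embeddings API; confirm against the OpenAI compatibility guide):

```bash
# Sketch of an OpenAI-style embeddings request against a model endpoint.
# <your-model-address> is the ingress Address from the steps above.
curl -s "http://<your-model-address>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "model": "<model-name>"}'
```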