AWS Installation
Install Trieve Vector Inference in your own AWS account
Installation Requirements:
- eksctl >= 0.171 (eksctl installation guide)
- aws >= 2.15 (aws installation guide)
- kubectl >= 1.28 (kubectl installation guide)
- helm >= 3.14 (helm installation guide)
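Before proceeding, you can confirm the tools above are on your PATH. A minimal sketch (it only checks presence, not the minimum versions listed above):

```shell
# Check that each required CLI tool is installed; reports any that are missing.
missing=""
for tool in eksctl aws kubectl helm; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All required tools are installed"
fi
```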
You’ll also need a license to run TVI.
Getting your license
Contact us:
- Email us at humans@trieve.ai
- Book a meeting
- Call us @ 628-222-4090
- AWS Marketplace Subscription
Our pricing is here
Check AWS Quota
Ensure you have quotas for both GPUs and load balancers.
- At least 4 vCPUs for On-Demand G and VT instances in the region of choice.
Check quota here
- One load balancer for each model you want to deploy.
Check quota here
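The vCPU quota can also be inspected from the CLI. A sketch assuming the AWS CLI is configured; quota code `L-DB2E81BA` is the standard code for "Running On-Demand G and VT instances", but verify it in the Service Quotas console:

```shell
# Query the On-Demand G and VT instance vCPU quota (requires AWS credentials).
if command -v aws >/dev/null 2>&1; then
  quota=$(aws service-quotas get-service-quota \
    --service-code ec2 --quota-code L-DB2E81BA \
    --query 'Quota.Value' --output text 2>/dev/null) || quota="unavailable"
else
  quota="aws CLI not installed"
fi
echo "G and VT vCPU quota: $quota"
```

A value of 4 or more is enough for a single g4dn.xlarge node.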
Deploying the Cluster
Setting up environment variables
Create EKS cluster and install needed plugins
Your AWS Account ID:
Your AWS Region:
Your Kubernetes cluster name:
Your machine types (GPU_INSTANCE); we recommend g4dn.xlarge, as it is the cheapest GPU instance on AWS. A single small node is also needed for extra utility:
Disable AWS CLI pagination (optional):
To use our recommended defaults:
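Taken together, the recommended defaults might look like the following sketch. The values are placeholders, and apart from GPU_INSTANCE (named above) the variable names are assumptions to be checked against the bootstrap script:

```shell
# Placeholder values: replace with your own account ID, region, and cluster name.
export AWS_ACCOUNT_ID="123456789012"   # assumed variable name
export AWS_REGION="us-east-1"          # assumed variable name
export CLUSTER_NAME="trieve-gpu"       # assumed variable name
export GPU_INSTANCE="g4dn.xlarge"      # recommended: cheapest GPU instance on AWS
export AWS_PAGER=""                    # disables AWS CLI pagination
echo "Creating $CLUSTER_NAME in $AWS_REGION with $GPU_INSTANCE nodes"
```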
Create your cluster
Download the bootstrap-eks.sh script
Run bootstrap-eks.sh with bash
This will take ~25 minutes to complete.
Install Trieve Vector Inference
Configure embedding_models.yaml
First, download the example configuration file:
Now you can modify your embedding_models.yaml. This defines all the models that you will want to use:
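As a purely hypothetical sketch of what an entry might look like; the downloaded example file is the authoritative schema, and every key below is an assumption:

```yaml
# Hypothetical structure; consult the downloaded example for the real field names.
models:
  bge-m3:
    modelName: BAAI/bge-m3   # Hugging Face model ID (assumed key)
    revision: main           # model revision (assumed key)
  splade-doc:
    modelName: naver/efficient-splade-VI-BT-large-doc
    revision: main
```

Each model defined here gets its own dedicated endpoint after installation.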
Install the helm chart
This Helm chart will only work if you are subscribed to the AWS Marketplace listing.
Log in to the AWS ECR repository
Install the Helm chart from the Marketplace ECR repository
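The two steps above might look like the following; the registry account and chart path here are assumptions, so use the exact values from your Marketplace subscription page:

```shell
# Log in to the Marketplace ECR registry and install the chart. Requires AWS
# credentials and a Marketplace subscription; registry and chart path are
# placeholders, not confirmed values.
REGISTRY="709825985650.dkr.ecr.us-east-1.amazonaws.com"  # assumed Marketplace registry
if command -v aws >/dev/null 2>&1 && command -v helm >/dev/null 2>&1; then
  aws ecr get-login-password --region us-east-1 \
    | helm registry login --username AWS --password-stdin "$REGISTRY" \
    && helm upgrade --install vector-inference \
        "oci://$REGISTRY/trieve/trieve-vector-inference" \
        -f embedding_models.yaml \
    || echo "login or install failed (check credentials and subscription)"
else
  echo "aws and helm are required"
fi
```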
Get your model endpoints
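Assuming the chart creates one ingress per model, the endpoints can be listed with standard kubectl:

```shell
# List ingresses; the ADDRESS column holds each model's load-balancer hostname.
endpoints=$(kubectl get ingress 2>/dev/null) || endpoints="cluster not reachable"
echo "${endpoints:-no ingresses found}"
```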
The output looks something like this:
Using Trieve Vector Inference
Each ingress uses its own Application Load Balancer within AWS. The Address provided is the model's endpoint, which accepts dense embedding, sparse embedding, or reranker calls depending on the models you chose.
Check out the guides for more information on configuration.
Using SPLADE Models
How to set up a dedicated instance for the sparse SPLADE embedding model
Using Custom Models
How to use private or gated Hugging Face models, or any other models you want
OpenAI compatibility
Trieve Vector Inference exposes OpenAI-compatible routes
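As a sketch, an embeddings request against a deployed endpoint might look like this; the hostname is a placeholder for your ingress Address, and the exact route path should be verified in the OpenAI-compatibility guide:

```shell
# Placeholder endpoint: substitute the Address of your model's ingress.
ENDPOINT="http://your-model-ingress-address"
payload='{"input": "Hello world"}'
# The request will only succeed against a live deployment:
curl -s -X POST "$ENDPOINT/embeddings" \
  -H "Content-Type: application/json" \
  -d "$payload" || echo "endpoint not reachable yet"
```

The response shape follows the OpenAI embeddings API, so existing OpenAI client code can point at this base URL.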