AWS Installation
Install Trieve Vector Inference in your own AWS account
Installation Requirements:
- eksctl >= 0.171 (eksctl installation guide)
- aws >= 2.15 (aws installation guide)
- kubectl >= 1.28 (kubectl installation guide)
- helm >= 3.14 (helm installation guide)
- A Trieve Vector Inference License
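You can confirm the installed versions with each tool's standard version command, for example:

```bash
# Print the installed version of each required CLI tool.
eksctl version
aws --version
kubectl version --client
helm version
```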
Getting your license
Contact us:
- Email us at humans@trieve.ai
- Book a meeting
- Call us @ 628-222-4090
You will also need an AWS Marketplace subscription. Our pricing is here.
Check AWS Quota
Ensure you have quotas for both GPUs and load balancers.
- At least 4 vCPUs for On-Demand G and VT instances in your region of choice (check quota here).
- One load balancer for each model you want to serve (check quota here).
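You can also check both limits from the CLI with the Service Quotas API. A sketch, assuming the quota names contain the substrings matched below (verify the names in your account):

```bash
# Current vCPU limit for On-Demand G and VT instances (use your region of choice).
aws service-quotas list-service-quotas --service-code ec2 --region us-east-1 \
  --query "Quotas[?contains(QuotaName, 'G and VT')].{Name:QuotaName,Value:Value}" \
  --output table

# Current Application Load Balancer limit for the region.
aws service-quotas list-service-quotas --service-code elasticloadbalancing --region us-east-1 \
  --query "Quotas[?contains(QuotaName, 'Application Load Balancers')].{Name:QuotaName,Value:Value}" \
  --output table
```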
Deploying the Cluster
Setting up environment variables
Your AWS Account ID:
Your AWS Region:
Your Kubernetes cluster name:
Your machine types: we recommend g4dn.xlarge, as it is the cheapest GPU instance on AWS. A single small node is also needed for extra utility:
Disable AWS CLI pagination (optional):
To use our recommended defaults:
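A minimal sketch of the exports, assuming variable names of AWS_ACCOUNT_ID, AWS_REGION, CLUSTER_NAME, and CPU_INSTANCE (only GPU_INSTANCE is named in this guide; check bootstrap-eks.sh for the exact names it expects):

```bash
# Assumed variable names; verify them against bootstrap-eks.sh before running it.
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export AWS_REGION="us-east-1"          # region with GPU and load balancer quota
export CLUSTER_NAME="trieve-gpu"       # hypothetical cluster name
export CPU_INSTANCE="t3.small"         # single small utility node
export GPU_INSTANCE="g4dn.xlarge"      # recommended GPU node type
export AWS_PAGER=""                    # optional: disable AWS CLI pagination
```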
The cluster's GPU nodes will use the GPU_INSTANCE type that is chosen above.
Create your cluster
Create EKS cluster and install needed plugins
The bootstrap-eks.sh script will create the EKS cluster, install the AWS Load Balancer Controller, and install the NVIDIA Device Plugin. It will also manage any IAM permissions that are needed for the plugins to work.
Download the bootstrap-eks.sh script
Run bootstrap-eks.sh with bash
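As a sketch of both steps (the download URL below is a placeholder, not the script's real location; substitute the link provided with this guide):

```bash
# Placeholder URL: replace with the bootstrap-eks.sh link provided by Trieve.
curl -fsSL -o bootstrap-eks.sh "https://example.com/bootstrap-eks.sh"

# The script reads the environment variables exported above and provisions the cluster.
bash bootstrap-eks.sh
```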
This will take ~25 minutes to complete.
Install Trieve Vector Inference
Configure embedding_models.yaml
First, download the example configuration file:
Now you can modify your embedding_models.yaml. This defines all the models that you will want to use:
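The authoritative schema is in the example file you just downloaded; the snippet below is only a hypothetical illustration of the idea (the keys, model names, and file contents are assumptions, not the chart's documented format):

```bash
# Hypothetical sketch only; copy the real structure from the downloaded example file.
cat > embedding_models.yaml <<'EOF'
models:
  bge-large-en:
    modelId: BAAI/bge-large-en-v1.5                  # dense embedding model (assumed key)
  splade-doc:
    modelId: naver/efficient-splade-VI-BT-large-doc  # sparse SPLADE model
  bge-reranker:
    modelId: BAAI/bge-reranker-large                 # reranker model
EOF
```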
Install the helm chart
This helm chart will only work if you subscribe to the AWS Marketplace Listing.
Contact us at humans@trieve.ai if you do not have access to, or cannot use, the AWS Marketplace.
Log in to the AWS ECR repository
Install the helm chart from the Marketplace ECR repository
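A sketch of both steps, assuming the chart is pulled from the AWS Marketplace container registry (the registry account 709825985650 is the usual Marketplace ECR host; the chart path and release name below are placeholders):

```bash
# Authenticate Helm against the AWS Marketplace ECR registry (account ID assumed).
aws ecr get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin \
  709825985650.dkr.ecr.us-east-1.amazonaws.com

# Install the chart with your model configuration (chart path is a placeholder).
helm install vector-inference \
  oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/trieve/vector-inference \
  -f embedding_models.yaml
```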
Get your model endpoints
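Each model is exposed through a Kubernetes ingress, so listing the ingresses shows the endpoints; a sketch (add a namespace flag if the chart was installed into a specific namespace):

```bash
# List the ingresses created by the chart; the ADDRESS column holds each model's endpoint.
kubectl get ingress
```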
The output looks something like this:
The Address field is the endpoint where you can make dense embedding, sparse embedding, or reranker calls, depending on the models you chose.
To ensure everything is working, make a request to the model endpoint provided.
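For example, assuming the model serves a text-embeddings-inference style /embed route (the route and payload here are assumptions; adjust for sparse or reranker models):

```bash
# Replace the host with the Address value from the ingress output.
curl -X POST "http://<model-address>/embed" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Trieve Vector Inference test input"}'
```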
The output should look something like this:
Using Trieve Vector Inference
Each ingress point uses its own Application Load Balancer within AWS. The Address provided is the model's endpoint, where you can make dense embedding, sparse embedding, or reranker calls depending on the models you chose.
Check out the guides for more information on configuration.
Using SPLADE Models
How to set up a dedicated instance for the sparse SPLADE embedding model
Using Custom Models
How to use private or gated Hugging Face models, or any other models you want
OpenAI compatibility
Trieve Vector Inference has OpenAI-compatible routes