This guide takes ~30 minutes to complete. Expect ~20 minutes of this to be EKS spinning up.

Installation Requirements:

You'll need the aws, eksctl, kubectl, and helm CLIs installed and configured; all of them are used below. You'll also need a license to run TVI.

Getting your license

Contact us:

Our pricing is here

Check AWS Quota

Ensure you have quotas for both GPUs and load balancers.

  1. At least 4 vCPUs for On-Demand G and VT instances in the region of choice.

Check quota here

  2. You will need 1 load balancer for each model you want to deploy.

Check quota here
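If you prefer the CLI, you can check both quotas with `aws service-quotas`. The quota codes below are assumptions based on the current AWS console labels ("Running On-Demand G and VT instances" and "Application Load Balancers per Region"); verify them in the Service Quotas console before relying on them:

```shell
# GPU vCPU quota (quota code assumed: L-DB2E81BA,
# "Running On-Demand G and VT instances") -- value is a vCPU count.
aws service-quotas get-service-quota \
    --service-code ec2 \
    --quota-code L-DB2E81BA \
    --query 'Quota.Value' --output text

# ALB quota (quota code assumed: L-53DA6B97,
# "Application Load Balancers per Region").
aws service-quotas get-service-quota \
    --service-code elasticloadbalancing \
    --quota-code L-53DA6B97 \
    --query 'Quota.Value' --output text
```

Run these in the same region you plan to deploy in (`--region` or `AWS_REGION`), since quotas are per-region.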

Deploying the Cluster

Setting up environment variables

Create EKS cluster and install needed plugins

Your AWS Account ID:

export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)"

Your AWS Region:

TVI supports any region that offers your chosen GPU instance type:
export AWS_REGION=us-east-2

Your Kubernetes cluster name:

export CLUSTER_NAME=trieve-gpu

Your machine types. We recommend g4dn.xlarge, as it is the cheapest GPU instance on AWS. A single small CPU node is also needed for utility workloads:

export CPU_INSTANCE_TYPE=t3.small
export GPU_INSTANCE_TYPE=g4dn.xlarge
export GPU_COUNT=1

Disable AWS CLI pagination (optional):

export AWS_PAGER=""

To use our recommended defaults:

export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)"
export AWS_REGION=us-east-2
export CLUSTER_NAME=trieve-gpu
export CPU_INSTANCE_TYPE=t3.small
export GPU_INSTANCE_TYPE=g4dn.xlarge
export GPU_COUNT=1
export AWS_PAGER=""
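Before creating the cluster, it is worth confirming that every variable is actually set in your shell. This small helper is a convenience sketch, not part of TVI:

```shell
# Hypothetical helper (not part of TVI): verify every required
# variable is non-empty before running bootstrap-eks.sh.
check_env() {
  missing=0
  for var in AWS_ACCOUNT_ID AWS_REGION CLUSTER_NAME \
             CPU_INSTANCE_TYPE GPU_INSTANCE_TYPE GPU_COUNT; do
    eval "val=\${$var}"
    if [ -z "$val" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  return "$missing"
}

check_env && echo "environment looks good" || echo "set the missing variables first"
```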

Create your cluster

Download the bootstrap-eks.sh script

wget https://cdn.trieve.ai/bootstrap-eks.sh

Run bootstrap-eks.sh with bash

bash bootstrap-eks.sh

This will take ~25 minutes to complete.
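Once the script finishes, you can point kubectl at the new cluster (if your kubeconfig isn't already set up) and confirm the nodes registered. `update-kubeconfig` is the same command used in the teardown section below:

```shell
# Point kubectl at the new cluster, then list its nodes;
# you should see both the CPU node and the GPU node Ready.
aws eks update-kubeconfig --region "${AWS_REGION}" --name "${CLUSTER_NAME}"
kubectl get nodes
```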

Install Trieve Vector Inference

Configure embedding_models.yaml

First, download the example configuration file:

wget https://cdn.trieve.ai/embedding_models.yaml

Now you can modify your embedding_models.yaml. This defines all the models that you will want to use:

embedding_models.yaml
models:
  # ...
  # myEmbeddingModel:
  #   replicas: 1
  #   revision: main # The huggingface model revision
  #   hfToken: # The huggingface token, if the repo is private
  #   modelName: BAAI/bge-m3 # The end of the URL https://huggingface.co/BAAI/bge-m3
  bgeM3:
    replicas: 2
    revision: main
    modelName: BAAI/bge-m3 # The end of the URL https://huggingface.co/BAAI/bge-m3
    hfToken: "" # If you have a private hugging face repo
  spladeDoc:
    replicas: 2
    modelName: naver/efficient-splade-VI-BT-large-doc # The end of the URL https://huggingface.co/naver/efficient-splade-VI-BT-large-doc
    isSplade: true
  spladeQuery:
    replicas: 2
    modelName: naver/efficient-splade-VI-BT-large-query # The end of the URL https://huggingface.co/naver/efficient-splade-VI-BT-large-query
    isSplade: true
  bge-reranker:
    replicas: 2
    modelName: BAAI/bge-reranker-large
    isSplade: false
  # ...
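After editing, a quick sanity check is to list the model names the file defines, since each one becomes its own deployment and ingress. This sed one-liner is just a convenience sketch; it relies on the file's two-space indentation:

```shell
# List the model names in embedding_models.yaml: they are the
# two-space-indented keys under `models:`.
list_models() {
  sed -n 's/^  \([A-Za-z0-9_-]*\):.*/\1/p' "$1"
}

[ -f embedding_models.yaml ] && list_models embedding_models.yaml || true
```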

Install the helm chart

This helm chart only works if you are subscribed to the AWS Marketplace listing.

1. Log in to the AWS ECR registry:

aws ecr get-login-password \
    --region us-east-1 | helm registry login \
    --username AWS \
    --password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com
2. Install the helm chart from the Marketplace ECR repository:

helm upgrade -i vector-inference \
    oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/trieve/trieve-embeddings \
    -f embedding_models.yaml

Get your model endpoints

kubectl get ingress

The output looks something like this:

NAME                                              CLASS   HOSTS   ADDRESS                                                                  PORTS   AGE
vector-inference-embedding-bge-reranker-ingress   alb     *       k8s-default-vectorin-18b7ade77a-2040086997.us-east-2.elb.amazonaws.com   80      73s
vector-inference-embedding-bgem3-ingress          alb     *       k8s-default-vectorin-25e84e25f0-1362792264.us-east-2.elb.amazonaws.com   80      73s
vector-inference-embedding-spladedoc-ingress      alb     *       k8s-default-vectorin-8af81ad2bd-192706382.us-east-2.elb.amazonaws.com    80      72s
vector-inference-embedding-spladequery-ingress    alb     *       k8s-default-vectorin-10404abaee-1617952667.us-east-2.elb.amazonaws.com   80      3m20s
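To use an endpoint from scripts, you can pull the hostname out of the ingress status instead of copying it by hand. The ingress name below matches the bgeM3 example output above; adjust it for your models:

```shell
# Grab the ALB hostname for the bgeM3 model's ingress from the
# ingress status (the same value shown in the ADDRESS column).
BGEM3_ENDPOINT="$(kubectl get ingress vector-inference-embedding-bgem3-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"

echo "http://${BGEM3_ENDPOINT}"
```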

Using Trieve Vector Inference

Each ingress uses its own Application Load Balancer within AWS. The ADDRESS shown is that model's endpoint; depending on the model you chose, it serves dense embedding, sparse embedding, or reranking calls.
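As a sketch of what a call looks like, assuming the endpoints follow the Hugging Face text-embeddings-inference conventions (an /embed route for embedding models, /rerank for rerankers; see the guides for the exact routes your deployment exposes):

```shell
# Dense embedding request. ENDPOINT is the ADDRESS column from
# `kubectl get ingress` (example value shown; yours will differ).
ENDPOINT="k8s-default-vectorin-25e84e25f0-1362792264.us-east-2.elb.amazonaws.com"

curl -s "http://${ENDPOINT}/embed" \
    -H "Content-Type: application/json" \
    -d '{"inputs": "a sentence to embed"}'

# Reranker request against the bge-reranker endpoint's ADDRESS.
curl -s "http://${RERANKER_ENDPOINT}/rerank" \
    -H "Content-Type: application/json" \
    -d '{"query": "what is vector search?", "texts": ["doc one", "doc two"]}'
```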

Check out the guides for more information on configuration.

Optional: Delete the cluster

CLUSTER_NAME=trieve-gpu
REGION=us-east-2

aws eks update-kubeconfig --region ${REGION} --name ${CLUSTER_NAME}

helm uninstall vector-inference
helm uninstall nvdp -n kube-system
helm uninstall aws-load-balancer-controller -n kube-system
eksctl delete cluster --region=${REGION} --name=${CLUSTER_NAME}