The bootstrap-eks.sh script creates the EKS cluster, installs the AWS Load Balancer Controller and the NVIDIA Device Plugin, and sets up the IAM permissions those add-ons need to work.
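Before moving on, you can check that both add-ons are in place. The sketch below assumes kubectl is already configured for the new cluster and that the controller runs under the Helm chart's default Deployment name, `aws-load-balancer-controller`, in `kube-system`. That name is an assumption, so adjust it if bootstrap-eks.sh uses a different one.

```bash
# Sketch: confirm the add-ons installed by bootstrap-eks.sh are present.
# Assumptions: kubectl targets the new cluster, and the controller Deployment
# uses the Helm chart default name "aws-load-balancer-controller" in kube-system.

check_lb_controller() {
  # Shows READY / UP-TO-DATE / AVAILABLE for the controller Deployment.
  kubectl get deployment aws-load-balancer-controller --namespace kube-system
}

check_gpu_nodes() {
  # The NVIDIA Device Plugin advertises GPUs as the extended resource
  # nvidia.com/gpu; nodes without it show <none> in the GPUS column.
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

Run `check_lb_controller` and `check_gpu_nodes` after sourcing the functions; every GPU node should report a non-empty GPU count.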
Next, edit embedding_models.yaml, which defines every model you want to deploy:
embedding_models.yaml
```yaml
models:
  # ...
  # myEmbeddingModel:
  #   # The number of replicas you want
  #   replicas: 1
  #   # The huggingface revision
  #   revision: main
  #   # Your huggingface token if you have a private repo
  #   hfToken:
  #   # The end of the URL https://huggingface.co/BAAI/bge-m3
  #   modelName: BAAI/bge-m3
  bgeM3:
    replicas: 2
    revision: main
    # The end of the URL https://huggingface.co/BAAI/bge-m3
    modelName: BAAI/bge-m3
    # If you have a private hugging face repo
    hfToken: ""
  spladeDoc:
    replicas: 2
    # The end of the URL https://huggingface.co/naver/efficient-splade-VI-BT-large-doc
    modelName: naver/efficient-splade-VI-BT-large-doc
    isSplade: true
  spladeQuery:
    replicas: 2
    # The end of the URL https://huggingface.co/naver/efficient-splade-VI-BT-large-query
    modelName: naver/efficient-splade-VI-BT-large-query
    isSplade: true
  bge-reranker:
    replicas: 2
    modelName: BAAI/bge-reranker-large
    isSplade: false
  # ...
```
To confirm that everything works, send a test request to the model's endpoint.
```bash
# Replace the endpoint with the one you got from the previous step
export ENDPOINT=k8s-default-vectorin-18b7ade77a-2040086997.us-east-2.elb.amazonaws.com

# Single quotes around the -w format keep the shell from expanding "!"
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "test input"}' \
  --url "http://$ENDPOINT/embed" \
  -w '\n\nInference took %{time_total} seconds!\n'
```
The output should look something like this:
```
# The vector
[[ 0.038483415, -0.00076982786, -0.020039458 ... ], [ 0.04496114, -0.039057795, -0.022400795, ... ]]

Inference took 0.067066 seconds!
```
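The test request sends a single string. To embed several texts in one call, the sketch below sends a JSON array instead. It assumes the pods serve Hugging Face Text Embeddings Inference (TEI), whose `/embed` route accepts either a string or an array of strings in `inputs` and returns one vector per input. The `embed_batch` helper is hypothetical, and because it builds the JSON by plain string concatenation, the texts must not contain double quotes or backslashes.

```bash
# Sketch: embed several texts in one request.
# Assumption: the backend is Hugging Face Text Embeddings Inference, which
# returns one vector per element of the "inputs" array.
# embed_batch is a hypothetical helper; texts are not JSON-escaped.

embed_batch() {
  local endpoint="$1"; shift
  local items=() text joined
  for text in "$@"; do
    items+=("\"$text\"")            # wrap each text in JSON quotes
  done
  joined=$(IFS=,; echo "${items[*]}")  # join elements with commas
  curl -sS -X POST \
    -H "Content-Type: application/json" \
    -d "{\"inputs\": [${joined}]}" \
    --url "http://${endpoint}/embed"
}
```

For example, `embed_batch "$ENDPOINT" "first text" "second text"` should return a list of two vectors.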
Each ingress uses its own Application Load Balancer in AWS. The ingress address is the model's endpoint: depending on which model it serves, you call it for dense embeddings, sparse embeddings, or reranking.
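Since every model sits behind its own load balancer, you look up the right ingress address and then call the route that matches the model type. The sketch below reads the address that `kubectl get ingress` shows in its ADDRESS column. It also assumes the pods serve Hugging Face Text Embeddings Inference, which exposes `/embed_sparse` for SPLADE models and `/rerank` for rerankers. The function names and any ingress names you pass in are hypothetical, and texts are not JSON-escaped.

```bash
# Sketch: find a model's endpoint and call the route for its type.
# Assumptions: kubectl targets the cluster; the backend is Hugging Face Text
# Embeddings Inference; helper names are hypothetical; texts must not contain
# double quotes or backslashes (no JSON escaping is done).

ingress_host() {
  # $1 = ingress name as listed by `kubectl get ingress`.
  # Prints the ALB DNS name from the ingress status.
  kubectl get ingress "$1" \
    -o 'jsonpath={.status.loadBalancer.ingress[0].hostname}'
}

embed_sparse() {
  # $1 = endpoint of a model deployed with isSplade: true, $2 = text
  curl -sS -X POST \
    -H "Content-Type: application/json" \
    -d "{\"inputs\": \"$2\"}" \
    --url "http://$1/embed_sparse"
}

rerank() {
  # $1 = reranker endpoint, $2 = query, remaining args = candidate texts
  local endpoint="$1" query="$2"; shift 2
  local texts=() t joined
  for t in "$@"; do texts+=("\"$t\""); done
  joined=$(IFS=,; echo "${texts[*]}")
  curl -sS -X POST \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"$query\", \"texts\": [${joined}]}" \
    --url "http://${endpoint}/rerank"
}
```

For example, `rerank "$(ingress_host my-reranker-ingress)" "what is bge-m3" "first passage" "second passage"` scores both passages against the query, where `my-reranker-ingress` is a placeholder for the name your cluster actually lists.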
Check out the guides for more information on configuration.