
Inspiration

SaaS offerings for text embeddings have two major issues:
  1. They have higher latency because requests are batch processed.
  2. They impose heavy rate limits.
Trieve Vector Inference was created so you can host dedicated embedding servers in your own cloud.

Performance Difference

Benchmarks were run with wrk2 for 30 seconds using 12 threads and 40 active connections. The test machine was an m5.large instance in us-west-1.
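| Metric          | OpenAI Cloud | Jina AI Cloud* | Jina (SageMaker)** | TVI Jina | TVI BGE-M3 | TVI Nomic |
|-----------------|--------------|----------------|--------------------|----------|------------|-----------|
| P50 latency     | 193.15 ms    | 179.33 ms      | 185.21 ms          | 19.06 ms | 14.69 ms   | 21.36 ms  |
| P90 latency     | 261.25 ms    | 271.87 ms      | 296.19 ms          | 23.09 ms | 16.90 ms   | 29.81 ms  |
| P99 latency     | 621.05 ms    | 402.43 ms      | 306.94 ms          | 24.27 ms | 18.80 ms   | 30.29 ms  |
| Requests made   | 324          | 324            | 324                | 324      | 324        | 324       |
| Requests failed | 0            | 0              | 3                  | 0        | 0          | 0         |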
* Failed requests occurred when rate limiting was hit (the Jina AI rate limit is 60 RPM, or 300 RPM on the premium plan).
** jina-embeddings-v2-base-en on SageMaker, running on an ml.g4dn.xlarge instance.
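The table reports latency as percentiles. As a rough illustration of what P50, P90 and P99 mean, here is a minimal Python sketch that computes them from a list of per-request latencies using the nearest-rank method. This is not how wrk2 computes its numbers: wrk2 records latencies in an HdrHistogram and corrects for coordinated omission. The sample latencies in the demo are made up and are not benchmark data.

```python
import math
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Compute P50/P90/P99 latencies, matching the rows of the benchmark table."""
    return {f"P{p}": percentile(samples_ms, p) for p in (50, 90, 99)}


if __name__ == "__main__":
    # Hypothetical per-request latencies in milliseconds: 1.0 .. 100.0.
    samples = [float(ms) for ms in range(1, 101)]
    print(summarize(samples))  # {'P50': 50.0, 'P90': 90.0, 'P99': 99.0}
```

Percentiles matter here because the cloud APIs differ most in the tail: a single slow batch pushes P99 up even when the median looks similar.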


AWS On-Prem Installation

Adding Trieve Vector Inference to your AWS account

Creating Embeddings

Using the /embed route
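A minimal sketch of calling the /embed route from Python with only the standard library. Several details are assumptions, not confirmed by this page: the base URL `http://localhost:8080` is a hypothetical placeholder, the `{"inputs": [...]}` request body and list-of-vectors response are modeled on Hugging Face text-embeddings-inference, and any authentication your deployment requires is omitted. Check the API Reference for the exact schema.

```python
import json
import urllib.request
from typing import List

# Hypothetical address of a TVI deployment; replace with your own endpoint.
BASE_URL = "http://localhost:8080"


def build_embed_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request to /embed.

    The {"inputs": [...]} body shape is an assumption modeled on
    text-embeddings-inference; confirm it against the API Reference.
    """
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], base_url: str = BASE_URL) -> List[List[float]]:
    """Send the request and decode the (assumed) list-of-vectors JSON response."""
    req = build_embed_request(base_url, texts)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running deployment, `embed(["hello world", "self-hosted embeddings"])` would be expected to return one vector per input string. Building the request separately from sending it keeps the payload easy to inspect and test without a server.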

Custom Embedding Models

Check out the API Reference to see all of the available endpoints for Trieve Vector Inference

SPLADE V2
