# Introduction

Trieve Vector Inference is an on-prem solution for fast vector inference.
## Inspiration
SaaS offerings for text embeddings have two major issues:
- They have higher latency due to batch processing.
- They have heavy rate limits.
Trieve Vector Inference was created so you can host dedicated embedding servers in your own cloud.
## Performance Difference
Benchmarks were run using wrk2 for 30 seconds with 12 threads and 40 active connections. The test machine was an m5.large instance in us-west-1.
| | OpenAI Cloud | Jina AI Cloud* | Jina (SageMaker)** | TVI Jina | TVI BGE-M3 | TVI Nomic |
|---|---|---|---|---|---|---|
| P50 Latency | 193.15 ms | 179.33 ms | 185.21 ms | 19.06 ms | 14.69 ms | 21.36 ms |
| P90 Latency | 261.25 ms | 271.87 ms | 296.19 ms | 23.09 ms | 16.90 ms | 29.81 ms |
| P99 Latency | 621.05 ms | 402.43 ms | 306.94 ms | 24.27 ms | 18.80 ms | 30.29 ms |
| Requests Made | 324 | 324 | 324 | 324 | 324 | 324 |
| Requests Failed | 0 | 0 | 3 | 0 | 0 | 0 |
\* Failed requests occurred when rate limiting was hit (the Jina AI rate limit is 60 RPM, or 300 RPM on the premium plan).
\** `jina-embeddings-v2-base-en` on SageMaker with an ml.g4dn.xlarge instance.
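For readers who want to approximate this measurement without wrk2, here is a minimal closed-loop sketch in Python: 40 concurrent connections issuing requests for 30 seconds, with p50/p90/p99 latencies computed from the results. The endpoint URL and request body are placeholders, the request schema is an assumption, and wrk2's fixed-rate pacing (its `-R` flag) and coordinated-omission correction are not replicated here.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint and payload; substitute your deployment's URL and
# the request format documented in the API Reference.
URL = "http://localhost:8000/embed"
PAYLOAD = {"inputs": "benchmark sentence"}

DURATION_S = 30   # matches the 30-second benchmark window
CONNECTIONS = 40  # matches the 40 active connections


def worker() -> list[float]:
    """Send requests back-to-back until the window closes; record latencies in ms."""
    latencies = []
    deadline = time.monotonic() + DURATION_S
    with requests.Session() as session:
        while time.monotonic() < deadline:
            start = time.monotonic()
            session.post(URL, json=PAYLOAD, timeout=10)
            latencies.append((time.monotonic() - start) * 1000.0)
    return latencies


with ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
    futures = [pool.submit(worker) for _ in range(CONNECTIONS)]
    latencies = [ms for f in futures for ms in f.result()]

# quantiles(n=100) returns the 1st..99th percentile cut points.
pcts = statistics.quantiles(latencies, n=100)
print(f"p50={pcts[49]:.2f} ms  p90={pcts[89]:.2f} ms  p99={pcts[98]:.2f} ms")
```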
## See more
- **AWS On-Prem Installation**: Adding Trieve Vector Inference into your AWS account
- **Creating Embeddings**: Using the `/embed` route (a minimal request sketch follows this list)
- **Custom Embedding Models**: Check out the API Reference to see all of the available endpoints for Trieve Vector Inference
- **Splade V2**: Check out the API Reference to see all of the available endpoints for Trieve Vector Inference
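As a quick orientation, here is a minimal sketch of calling the `/embed` route. The endpoint URL and the request/response shapes are assumptions for illustration; the Creating Embeddings guide and the API Reference are authoritative.

```python
import requests

# Hypothetical deployment URL; replace with your TVI endpoint.
TVI_URL = "http://localhost:8000"

# The "inputs" field is an assumed request schema, not confirmed by this page;
# check the API Reference for the exact format.
resp = requests.post(f"{TVI_URL}/embed", json={"inputs": "Hello, world!"})
resp.raise_for_status()

# Assumes the response is a list of embedding vectors, one per input.
embeddings = resp.json()
print(len(embeddings[0]))  # dimensionality of the first embedding
```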