Trieve Vector Inference is an on-prem solution for fast vector inference.
Benchmark environment: `m5.large` in `us-west-1`.
Metric | OPENAI Cloud | JINA AI Cloud* | JINA (SageMaker)** | TVI Jina | TVI BGE-M3 | TVI Nomic |
---|---|---|---|---|---|---|
P50 Latency | 193.15 ms | 179.33 ms | 185.21 ms | 19.06 ms | 14.69 ms | 21.36 ms |
P90 Latency | 261.25 ms | 271.87 ms | 296.19 ms | 23.09 ms | 16.90 ms | 29.81 ms |
P99 Latency | 621.05 ms | 402.43 ms | 306.94 ms | 24.27 ms | 18.80 ms | 30.29 ms |
Requests Made | 324 | 324 | 324 | 324 | 324 | 324 |
Requests Failed | 0 | 0 | 3 | 0 | 0 | 0 |
** `jina-embeddings-v2-base-en` on SageMaker with `ml.g4dn.xlarge`
`/embed` route
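
To make the gap in the benchmark table easier to read, the latencies can be turned into speedup ratios. The snippet below is a minimal sketch: the numbers are copied from the table, while the `latency_ms` dictionary, the `speedup` helper, and the choice of which columns to compare are illustrative assumptions, not part of Trieve's tooling.

```python
# Latencies in milliseconds, copied from the benchmark table above.
# The dictionary layout and the helper below are hypothetical, for illustration only.
latency_ms = {
    "OPENAI Cloud": {"p50": 193.15, "p90": 261.25, "p99": 621.05},
    "JINA AI Cloud": {"p50": 179.33, "p90": 271.87, "p99": 402.43},
    "JINA (SageMaker)": {"p50": 185.21, "p90": 296.19, "p99": 306.94},
    "TVI Jina": {"p50": 19.06, "p90": 23.09, "p99": 24.27},
    "TVI BGE-M3": {"p50": 14.69, "p90": 16.90, "p99": 18.80},
    "TVI Nomic": {"p50": 21.36, "p90": 29.81, "p99": 30.29},
}


def speedup(baseline: str, candidate: str, percentile: str) -> float:
    """Return how many times lower the candidate's latency is than the baseline's."""
    return latency_ms[baseline][percentile] / latency_ms[candidate][percentile]


if __name__ == "__main__":
    # Compare each hosted option against TVI Jina at every reported percentile.
    for baseline in ("OPENAI Cloud", "JINA AI Cloud", "JINA (SageMaker)"):
        for percentile in ("p50", "p90", "p99"):
            ratio = speedup(baseline, "TVI Jina", percentile)
            print(f"{baseline} vs TVI Jina at {percentile}: {ratio:.1f}x")
```

On these figures, TVI Jina comes out at roughly 10x lower P50 latency than OPENAI Cloud. The ratios only describe this one benchmark run and say nothing about other hardware, regions, or payload sizes.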