Introduction

Inspiration

SaaS offerings for text embeddings have two major issues:

Trieve Vector Inference (TVI) was created so you can host dedicated embedding servers in your own cloud.

Benchmarks were run with wrk2 for 30 seconds using 12 threads and 40 active connections. The test machine was an m5.large in us-west-1.

	OPENAI Cloud	JINA AI Cloud*	JINA (SageMaker)**	TVI Jina	TVI BGE-M3	TVI Nomic
P50 Latency	193.15 ms	179.33 ms	185.21 ms	19.06 ms	14.69 ms	21.36 ms
P90 Latency	261.25 ms	271.87 ms	296.19 ms	23.09 ms	16.90 ms	29.81 ms
P99 Latency	621.05 ms	402.43 ms	306.94 ms	24.27 ms	18.80 ms	30.29 ms
Requests Made	324	324	324	324	324	324
Requests Failed	0	0	3	0	0	0

* Failed requests occurred when rate limiting kicked in (the Jina AI rate limit is 60 RPM, or 300 RPM on the premium plan)

** jina-embeddings-v2-base-en on SageMaker with ml.g4dn.xlarge
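
To make the gap in the table easier to read, the latency figures can be turned into ratios. The following is a minimal Python sketch that uses only the numbers reported above. The helper names (`latency_ratio`, `requests_per_second`) are hypothetical and not part of any Trieve tooling, and the throughput figure assumes the "Requests Made" row counts all requests sent during the 30-second run.

```python
# Latency figures (milliseconds) copied from the benchmark table above.
LATENCY_MS = {
    "OPENAI Cloud":     {"p50": 193.15, "p90": 261.25, "p99": 621.05},
    "JINA AI Cloud":    {"p50": 179.33, "p90": 271.87, "p99": 402.43},
    "JINA (SageMaker)": {"p50": 185.21, "p90": 296.19, "p99": 306.94},
    "TVI Jina":         {"p50": 19.06, "p90": 23.09, "p99": 24.27},
    "TVI BGE-M3":       {"p50": 14.69, "p90": 16.90, "p99": 18.80},
    "TVI Nomic":        {"p50": 21.36, "p90": 29.81, "p99": 30.29},
}

REQUESTS_MADE = 324  # "Requests Made" row, identical for every column
DURATION_S = 30      # benchmark duration stated in the text


def latency_ratio(baseline, candidate, percentile):
    """Return how many times higher the baseline latency is than the candidate's."""
    return LATENCY_MS[baseline][percentile] / LATENCY_MS[candidate][percentile]


def requests_per_second(requests=REQUESTS_MADE, duration_s=DURATION_S):
    """Average request rate, assuming `requests` covers the whole run (an assumption)."""
    return requests / duration_s


if __name__ == "__main__":
    for p in ("p50", "p90", "p99"):
        ratio = latency_ratio("OPENAI Cloud", "TVI BGE-M3", p)
        print(f"{p}: OPENAI Cloud / TVI BGE-M3 = {ratio:.1f}x")
    print(f"average rate: {requests_per_second():.1f} requests/s")
```

Any pair of columns can be compared the same way, for example `latency_ratio("JINA (SageMaker)", "TVI Jina", "p90")` for the same model family hosted two different ways.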