> ## Documentation Index
> Fetch the complete documentation index at: https://docs.trieve.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Introduction

> Trieve Vector Inference is an on-prem solution for fast vector inference

## Inspiration

SaaS offerings for text embeddings have 2 major issues:

1. They have higher latency, due to batch processing.
2. They have heavy rate limits.

Trieve Vector Inference was created so you can host dedicated embedding servers in your own cloud.

## Performance Difference

Benchmarks were run using [wrk2](https://github.com/giltene/wrk2) over 30 seconds on 12 threads and 40 active connections.

The machine used for testing was an `m5.large` in `us-west-1`.

|                 | OpenAI Cloud | Jina AI Cloud\* | Jina (SageMaker)\*\* | TVI Jina | TVI BGE-M3 | TVI Nomic |
| --------------- | ------------ | --------------- | -------------------- | -------- | ---------- | --------- |
| P50 Latency     | 193.15 ms    | 179.33 ms       | 185.21 ms            | 19.06 ms | 14.69 ms   | 21.36 ms  |
| P90 Latency     | 261.25 ms    | 271.87 ms       | 296.19 ms            | 23.09 ms | 16.90 ms   | 29.81 ms  |
| P99 Latency     | 621.05 ms    | 402.43 ms       | 306.94 ms            | 24.27 ms | 18.80 ms   | 30.29 ms  |
| Requests Made   | 324          | 324             | 324                  | 324      | 324        | 324       |
| Requests Failed | 0            | 0               | 3                    | 0        | 0          | 0         |

|                 | OpenAI Cloud | Jina AI Cloud\* | Jina (SageMaker)\*\* | TVI Jina | TVI BGE-M3 | TVI Nomic |
| --------------- | ------------ | --------------- | -------------------- | -------- | ---------- | --------- |
| P50 Latency     | 180.74 ms    | 182.62 ms       | 515.84 ms            | 16.48 ms | 14.35 ms   | 23.22 ms  |
| P90 Latency     | 222.34 ms    | 262.65 ms       | 654.85 ms            | 20.70 ms | 16.15 ms   | 29.71 ms  |
| P99 Latency     | 1.11 sec     | 363.01 ms       | 724.48 ms            | 22.82 ms | 19.82 ms   | 31.07 ms  |
| Requests Made   | 2,991        | 2,991           | 2,963                | 3,015    | 3,024      | 3,024     |
| Requests Failed | 0            | 2,986           | 0                    | 0        | 0          | 0         |

|                 | OpenAI Cloud | Jina AI Cloud\* | Jina (SageMaker)\*\* | TVI Jina | TVI BGE-M3 | TVI Nomic |
| --------------- | ------------ | --------------- | -------------------- | -------- | ---------- | --------- |
| P50 Latency     | 15.70 sec    | 15.82 sec       | 17.97 sec            | 24.40 ms | 14.86 ms   | 23.74 ms  |
| P90 Latency     | 22.01 sec    | 21.91 sec       | 25.30 sec            | 25.14 ms | 17.81 ms   | 31.74 ms  |
| P99 Latency     | 23.59 sec    | 23.12 sec       | 27.03 sec            | 27.61 ms | 19.52 ms   | 34.11 ms  |
| Requests Made   | 6,234        | 6,771           | 2,963                | 30,002   | 30,002     | 30,001    |
| Requests Failed | 0            | 6,711           | 0                    | 0        | 0          | 0         |

\* Failed requests occurred when rate limiting kicked in (the Jina AI rate limit is 60 RPM, or 300 RPM on the premium plan).

\*\* `jina-embeddings-v2-base-en` on SageMaker with `ml.g4dn.xlarge`.

## See more

- Adding Trieve Vector Inference into your AWS account
- Using the `/embed` route
- Check out the API Reference to see all of the available endpoints for Trieve Vector Inference
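
The P50, P90, and P99 rows in the benchmark tables are latency percentiles. As a minimal sketch of how such figures can be derived from raw per-request latencies, the Python snippet below uses the nearest-rank method. It is an illustration only and not how wrk2 produces its report; the function names and the sample data are hypothetical.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    # Rank is 1-based, so subtract 1 when indexing into the sorted list.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the three percentiles reported in the benchmark tables."""
    return {f"P{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}


# Hypothetical sample: 100 requests with latencies of 1 ms, 2 ms, ..., 100 ms.
samples = [float(v) for v in range(1, 101)]
summary = summarize(samples)
print(summary)
```

A tail percentile such as P99 is dominated by the slowest few requests, which is why it can diverge sharply from P50 once rate limits or request queueing start to apply.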