LLM Gateway

The TrueFoundry LLM Gateway provides a single API through which you can call any LLM provider for inference. Supported providers include OpenAI, Anthropic, AWS Bedrock, Perplexity, and many more, including self-hosted models. The following features are supported:

  1. Unified API to access all LLMs from multiple providers, including your own self-hosted models (see the sketch after this list)
  2. Centralised key management
  3. Authentication and attribution per user, per model
  4. Cost attribution and control
  5. Fallbacks, retries, and rate limiting (Coming Soon)
  6. Guardrails integration (Coming Soon)
  7. Caching and semantic caching (Coming Soon)
  8. Support for vision and multimodal models (Coming Soon)
  9. Run evaluations on your data (Coming Soon)
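
To illustrate the unified API, here is a minimal sketch of calling the gateway through an OpenAI-compatible client. The base URL, environment variable, and model identifier below are placeholder assumptions, not actual TrueFoundry values; consult the gateway's own configuration for the real endpoint and key.

```python
import os

from openai import OpenAI

# Hypothetical gateway endpoint and virtual API key -- placeholders,
# not actual TrueFoundry values.
client = OpenAI(
    base_url="https://llm-gateway.example.com/api/inference/openai",
    api_key=os.environ["LLM_GATEWAY_API_KEY"],
)

# The same client call can target any configured provider; the model
# string (assumed "provider/model" form here) selects the backend.
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because every provider sits behind the same API shape, switching from OpenAI to, say, a self-hosted model is just a change to the model string, not a code rewrite.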

LLM Gateway Architecture when deployed in your own cloud environment

The LLM Gateway is itself a very lightweight and fast microservice that sits next to the services calling the gateway API. You can run multiple copies of llm-gateway across multiple regions to avoid adding extra routing latency to your calls. The gateway microservice reads its configuration from a database via a pub-sub mechanism and, if configured, logs the desired prompts and responses to a ClickHouse database via a queue. The architecture diagram looks something like this:
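
The document does not name the pub-sub mechanism or queue, so here is a minimal sketch of the pattern using Redis purely as a stand-in broker: the gateway subscribes for configuration updates instead of polling the database on every request, and prompt/response logs are enqueued for an asynchronous consumer to flush into ClickHouse. The channel, queue, and function names are all hypothetical.

```python
import json

import redis  # illustrative broker; the actual pub-sub/queue technology is not specified

r = redis.Redis(host="localhost", port=6379)

def watch_config(channel: str = "gateway-config") -> None:
    """Apply config updates pushed over pub-sub, avoiding a DB read per request."""
    pubsub = r.pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] == "message":
            config = json.loads(message["data"])
            print("reloaded config:", config)

def log_request(prompt: str, response: str, queue: str = "gateway-logs") -> None:
    """Enqueue a prompt/response pair; a separate consumer writes the queue to ClickHouse."""
    r.lpush(queue, json.dumps({"prompt": prompt, "response": response}))
```

The key design point is that logging is a fire-and-forget enqueue rather than a synchronous database write, which keeps the hot request path fast.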

This allows you to place multiple copies of the LLM Gateway close to your services or models when deploying across regions. It also allows us to keep the LLM Gateway extremely lightweight and fast. The gateway scales with incoming requests, allowing it to handle very high traffic to LLM models.