AI Gateway
A unified interface to all your LLMs - OpenAI, Anthropic, Bedrock, self-hosted etc.
The TrueFoundry LLM Gateway provides a single API using which you can call any LLM provider for inference. The supported providers include OpenAI, Anthropic, AWS Bedrock, Perplexity and many more including self-hosted models. The following features are supported:
- Unified API to access all LLMs from multiple providers including your own self hosted models.
- Centralised Key Management
- Authentication and attribution per user, per model.
- Cost Attribution and Control
- Fallback, retries and rate-limiting support (Coming Soon)
- Guardrails Integration (Coming Soon)
- Caching and Semantic Caching (Coming Soon)
- Support for Vision and Multimodal models (Coming Soon)
- Run Evaluations on your data (Coming Soon)
LLM Gateway Architecture when deployed in your own cloud environment
LLM Gateway is itself a very lightweight and fast micro-service that sits next your service calling the gateway API. You can have multiple copies of llm-gateway in multiple regions to avoid adding extra routing latency to your calls. The gateway micro=service reads its configuration from a DB via pub-sub mechanism and if configured, logs the desired prompts and responses to a Clickhouse database via a queue. The architecture diagram looks something like this:
This allows you to place multiple copies of the LLM gateway closer to your services or models in case you are deploying across regions. It also allows us to keep the LLM Gateway extremely lightweight and fast. The gateway can scale based on the incoming requests, hence allowing it to handle very high traffic to LLM models.
Add multiple providers to the gateway
The AI Gateway allows adding multiple model provider accounts such as OpenAI, Cohere, Bedrock, Google, TogetherAI, Perplexity and Grok. You can add and enable various model per provider account. In case you have multiple keys per provider, like two different OpenAI API keys, you can create multiple provider accounts. Then through authorization configuration, you can also decide who gets access to what.
Centralised Key Management
Distributing your core OpenAI or other provider keys to all developers is a big concern from a security standpoint. The AI Gateway allows you to add all the keys centrally and each developer/product gets their own API key to interact with the models. This keeps complete accountability of who is using the models without sacrificing the security of the root keys. The gateway can read the keys from your Secret Manager like AWS SSM, Google Secret Store or Azure Vault. You can also revoke permissions dynamically from users or products without affecting other users since everyone gets their own API keys.
In case you don't want to handover the API keys to developers, you can use a simple client side library to make the calls to the Gateway that automatically handles authentication for you using OAuth/OIDC connect without you having to manually copy paste keys. This provides enhanced security by issuing short lived tokens and automatically refreshing them.
Authentication and Authorization
Truefoundry AI Gateway provides a concept of ProviderAccounts which allows us to add different providers and enable or disable some models within them. You can add users or services to have access to certain provider accounts. For example, in the picture below, ProviderAccount4 and ProviderAccount5 are both AzureOpenAI providers but ProviderAccount4 is for dev environment and ProviderAccount5 is for production. We can grant User1, User2 access to ProviderAccount4 and LLMApp1 access to ProviderAccount5.
Authorization Configuration
The authorization configuration is a YAML with the following shape:
authz_rules:
- models:
- truefoundry-self-hosted/llama-2-7b-chat-hf-10707
- openai-main/text-embedding-ada-002
users:
- username1
- username2
The models property is an array of Fully Qualified Names (FQNs) of models. Each FQN comprises the provider account and the model name, separated by a /
.
The users property is an array of usernames. This will decide which users can access the models defined in the models property.
You can have multiple rules.
Call any provider using the unified API
The Gateway understands the schema expected by the supported providers and can translate your standard request to any of the providers. Because of this, users don't need to rewrite their code while switching models - they can use the standard OpenAI library/langchain/REST API call to interact with any model by simple switching the model name.
View Metrics for All LLM Calls
Interacting with the AI Gateway
Updated about 1 month ago