Explore the architecture of the TrueFoundry AI Gateway, designed for high availability, low latency, and scalable LLM integration in production environments.
TrueFoundry AI Gateway Architecture
User makes a request to the AI Gateway
Gateway validates the request
Gateway checks for budget-limiting and rate-limiting rules
Gateway selects the model based on the load-balancing rules
Router/Adapter translates the request into the format expected by the selected model
Gateway handles the response from the model
Gateway logs the request and response
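
The request flow above can be sketched as a small in-process pipeline. This is an illustrative sketch only, not TrueFoundry's implementation: the `Gateway`, `Request`, and `GatewayError` names, the per-user request counter standing in for budget and rate limits, the round-robin load balancing, and the echo-style adapter are all assumptions made for the example.

```python
import itertools
from dataclasses import dataclass, field


class GatewayError(Exception):
    """Raised when a request is rejected before reaching a model."""


@dataclass
class Request:
    user: str
    model_group: str  # hypothetical logical name that maps to one or more models
    prompt: str


@dataclass
class Gateway:
    # Hypothetical config: max requests per user, and candidate models per group.
    rate_limits: dict
    targets: dict
    usage: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)
    _cursors: dict = field(default_factory=dict)

    def validate(self, req: Request) -> None:
        # Validation step: reject empty prompts and unknown model groups.
        if not req.prompt or req.model_group not in self.targets:
            raise GatewayError("invalid request")

    def check_limits(self, req: Request) -> None:
        # Limit-check step, simplified to a lifetime request counter per user
        # (a real rate limit would be scoped to a time window).
        used = self.usage.get(req.user, 0)
        if used >= self.rate_limits.get(req.user, 0):
            raise GatewayError("rate limit exceeded")
        self.usage[req.user] = used + 1

    def select_model(self, req: Request) -> str:
        # Load-balancing step: round-robin over the models in the group.
        cursor = self._cursors.setdefault(
            req.model_group, itertools.cycle(self.targets[req.model_group])
        )
        return next(cursor)

    def call_model(self, model: str, req: Request) -> str:
        # Adapter step: a stand-in that echoes the prompt instead of
        # translating the request and calling a real provider.
        return f"[{model}] echo: {req.prompt}"

    def handle(self, req: Request) -> str:
        self.validate(req)
        self.check_limits(req)
        model = self.select_model(req)
        response = self.call_model(model, req)
        # Logging step: record the request and response together.
        self.logs.append(
            {"user": req.user, "model": model, "prompt": req.prompt, "response": response}
        )
        return response


gateway = Gateway(rate_limits={"alice": 2}, targets={"chat": ["model-a", "model-b"]})
print(gateway.handle(Request("alice", "chat", "hello")))
```

Each method corresponds to one step in the flow, so a rejected request (invalid or over its limit) never reaches model selection and is never logged as a completed call in this sketch.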
What role does NATS play in the architecture?
How does the Gateway enforce rate-limiting?