- Performance Metrics: Track key latency metrics like Request Latency, Time to First Token (TTFT), and Inter-Token Latency (ITL) with P99, P90, and P50 percentiles.
- Cost and Token Usage: Gain visibility into your application’s costs with detailed breakdowns of input/output tokens and the associated expenses for each model.
- Usage Patterns: Understand how your application is being used with detailed analytics on user activity, model distribution, and team-based usage.
- Error Analysis: Quickly identify and diagnose issues with a view of error rates and error code information.
- Configuration Impact: Evaluate the effectiveness of your gateway configurations by monitoring how often rate limiting, load balancing, fallbacks, guardrails, and budget limits are triggered.
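As a rough illustration of the percentile breakdowns mentioned above, the widely used nearest-rank method computes P50/P90/P99 from raw latency samples as follows. This is a sketch only; the dashboard's exact interpolation method may differ.

```python
# Sketch: computing the P50/P90/P99 latency percentiles the dashboard reports,
# assuming you have raw per-request latency samples (e.g. from a CSV export).
# Nearest-rank is one common convention; other tools interpolate differently.
import math

def percentile(samples, p):
    """Nearest-rank percentile: p is in [0, 100]."""
    ordered = sorted(samples)
    # Rank of the p-th percentile (1-indexed), clamped to at least 1.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 340, 210, 180, 150, 400, 130, 160, 220]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}
```

Note how P99 is dominated by the slowest requests: with only ten samples it simply returns the worst observed latency, which is why tail percentiles need large sample counts to be meaningful.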
Key Views
The dashboard is organized into several views, each designed to provide a specific perspective on your data.
Analyze Performance by Model
This view provides a model-centric overview of your application’s performance. All graphs and metrics are grouped by `model_name`, allowing you to directly compare how different models perform under real-world load.
Use this view to:
- Compare response times and streaming latency (TTFT & ITL) to pinpoint the models that are underperforming or causing latency issues.
- Monitor cost and token consumption to keep your budget in check.
- Track requests per second and error rates to identify performance or reliability issues with specific models.

Key metrics in this view:
- Request Latency: The total time taken to process a request.
- Time to First Token (TTFT): The time elapsed until the first token of a response is received (for streaming responses).
- Inter-Token Latency (ITL): The average time between consecutive tokens in a response (for streaming responses).
- Cost Per Model: The total cost incurred by each model.
- Input/Output Tokens: The number of tokens processed by each model.
- Error Codes: A breakdown of errors by type for each model.
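The two streaming metrics above, time to first token and inter-token latency, can be derived from per-token arrival timestamps. A minimal sketch, assuming timestamps measured in seconds from the start of the request (the function and variable names are illustrative, not a gateway API):

```python
# Sketch: deriving time-to-first-token and average inter-token latency from
# a streaming response, assuming each token's arrival time is recorded in
# seconds relative to when the request was sent.

def streaming_latency(token_timestamps):
    """Return (ttft, avg_itl) for a list of token arrival times in seconds."""
    ttft = token_timestamps[0]  # time until the first token arrives
    # Gaps between consecutive tokens; their mean is the inter-token latency.
    gaps = [b - a for a, b in zip(token_timestamps, token_timestamps[1:])]
    avg_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, avg_itl

ttft, itl = streaming_latency([0.42, 0.50, 0.58, 0.66, 0.74])
```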
Analyze Usage by User
This view pivots the dashboard to show metrics based on who is making the requests, grouping all data by `username`. This is useful for understanding the usage patterns of individual applications or users.

Use this view to:
- Identify your most active users or applications.
- Compare latency and token costs for different users.
- Debug user-specific issues by filtering for their traffic.
Analyze Usage by Team
Similar to the user view, this dashboard groups metrics by `team_name`, helping you understand the usage patterns of different teams.

Use this view to:
- Track costs per team for internal chargebacks or budget management.
- Identify which teams are the heaviest users of the LLM gateway.
Evaluate Configuration Usage
This view reveals how your gateway configurations are impacting your requests by grouping them by the `ruleId` that was triggered. It helps you see which policies are having the most impact.

Use this view to monitor:
- Rate Limiting: How often requests are being throttled.
- Load Balancing: How traffic is being distributed across different models or deployments.
- Fallbacks: When and why fallback models are being used.
- Guardrails: How often content policies are being enforced.
- Budget Configs: When budget limits are being reached.
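To see which policies fire most often in an offline export, the triggered rule IDs can be tallied directly. A sketch assuming exported request records carry a `ruleId` field; the field and rule names here are illustrative, not the gateway's actual schema:

```python
# Sketch: counting how often each gateway configuration rule was triggered,
# assuming exported request records include the triggered ruleId (None when
# no rule fired). Field and rule names are illustrative.
from collections import Counter

requests = [
    {"ruleId": "rate-limit-free-tier"},
    {"ruleId": "fallback-gpt-backup"},
    {"ruleId": "rate-limit-free-tier"},
    {"ruleId": None},  # request passed through with no rule triggered
]

rule_counts = Counter(r["ruleId"] for r in requests if r["ruleId"])
most_common = rule_counts.most_common(1)[0]  # the highest-impact policy
```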
Create Custom Views with Metadata
For more specific analysis, the Metrics Dashboard allows you to group data by custom metadata keys sent in the request headers. For example, you can group by a `tenant_name` to analyze metrics for each of your customers.
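A request carrying such metadata might be assembled as below. This is a hedged sketch: the `x-metadata` header name and the JSON encoding are assumptions for illustration only, so check your gateway's documentation for the exact header convention.

```python
# Sketch: attaching custom metadata to a request so the dashboard can group
# by it later. The "x-metadata" header name and JSON shape are assumptions
# for illustration, not a documented API.
import json

metadata = {"tenant_name": "acme-corp", "environment": "production"}
headers = {
    "Authorization": "Bearer <your-api-key>",
    "x-metadata": json.dumps(metadata),  # hypothetical metadata header
}
```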

Filtering and Drill-Down
The dashboard also includes filters that allow you to narrow down your analysis to specific models, users, teams, or custom metadata fields. This makes it easier to investigate specific patterns or issues.
Downloading Metrics Data
In addition to viewing metrics in the dashboard, you can also download the complete raw data in CSV format. This is useful for offline analysis, creating custom visualizations, or integrating with other data analysis tools.
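Once downloaded, the CSV lends itself to analysis with standard tooling. A minimal sketch using only the Python standard library, assuming illustrative column names (`model_name`, `latency_ms`, `cost`) that you should match to the actual export's header row:

```python
# Sketch: offline analysis of the downloaded metrics CSV. Column names
# ("model_name", "latency_ms", "cost") are assumptions -- check the header
# row of the actual export. The inline string stands in for the file.
import csv
import io
from collections import defaultdict

raw = """model_name,latency_ms,cost
gpt-4o,320,0.0021
gpt-4o,280,0.0018
claude-3,410,0.0030
"""

totals = defaultdict(lambda: {"requests": 0, "latency_ms": 0.0, "cost": 0.0})
for row in csv.DictReader(io.StringIO(raw)):
    t = totals[row["model_name"]]
    t["requests"] += 1
    t["latency_ms"] += float(row["latency_ms"])
    t["cost"] += float(row["cost"])

# Per-model average latency and total spend
report = {
    model: {
        "avg_latency_ms": t["latency_ms"] / t["requests"],
        "total_cost": round(t["cost"], 4),
    }
    for model, t in totals.items()
}
```

With a real export, replace `io.StringIO(raw)` with `open("metrics.csv")`; the same aggregation pattern extends to per-user or per-team rollups.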