The Metrics Dashboard gives you a complete picture of your LLM application’s health and efficiency. Pinpoint performance bottlenecks, track spending in real-time, and see exactly how your application is being used, all from a single, interactive interface. With the Metrics Dashboard, you can monitor and analyze:
  • Performance Metrics: Track key latency metrics like Request Latency, Time to First Token (TTFS), and Inter-Token Latency (ITL) with P99, P90, and P50 percentiles.
  • Cost and Token Usage: Gain visibility into your application’s costs with detailed breakdowns of input/output tokens and the associated expenses for each model.
  • Usage Patterns: Understand how your application is being used with detailed analytics on user activity, model distribution, and team-based usage.
  • Error Analysis: Quickly identify and diagnose issues with a view of error rates and error code information.
  • Configuration Impact: Evaluate the effectiveness of your gateway configurations by monitoring how often rate limiting, load balancing, fallbacks, guardrails, and budget limits are triggered.

Key Views

The dashboard is organized into several views, each designed to provide a specific perspective on your data.

Analyze Performance by Model

This view provides a model-centric overview of your application’s performance. All graphs and metrics are grouped by model_name, allowing you to directly compare how different models perform under real-world load. Use this view to:
  • Compare response times and streaming latency (TTFS & ITL) and pin point the models which are underperfoming or causing latency issues.
  • Monitor cost and token consumption to keep your budget in check.
  • Track requests per second and error rates to identify performance or reliability issues with specific models.
Performance metrics dashboard showing model comparison with latency and request statistics Cost and token usage breakdown by model with bar charts and trend lines This view includes graphs for:
  • Request Latency: The total time taken to process a request.
  • Time to First Token (TTFS): The time elapsed until the first token of a response is received (for streaming responses).
  • Inter-Token Latency (ITL): The average time between consecutive tokens in a response (for streaming responses).
  • Cost Per Model: The total cost incurred by each model.
  • Input/Output Tokens: The number of tokens processed by each model.
  • Error Codes: A breakdown of errors by type for each model.

Analyze Usage by User

This view pivots the dashboard to show metrics based on who is making the requests, grouping all data by username. This is useful for understanding the usage patterns of individual applications or users. User-based analytics dashboard showing request patterns and costs per user Use this view to:
  • Identify your most active users or applications.
  • Compare latency and token costs for different users.
  • Debug user-specific issues by filtering for their traffic.

Analyze Usage by Team

Similar to the user view, this dashboard groups metrics by team_name, helping you understand the usage patterns of different teams. Team-based analytics dashboard showing usage metrics and costs across different teams Use this view to:
  • Track costs per team for internal chargebacks or budget management.
  • Identify which teams are the heaviest users of the LLM gateway.

Evaluate Configuration Usage

This view reveals how your gateway configurations are impacting your requests by grouping them by the ruleId that was triggered. It helps you see which policies are having the most impact. Configuration impact dashboard showing triggered rules and their frequency This view helps you understand the impact of:
  • Rate Limiting: How often requests are being throttled.
  • Load Balancing: How traffic is being distributed across different models or deployments.
  • Fallbacks: When and why fallback models are being used.
  • Guardrails: How often content policies are being enforced.
  • Budget Configs: When budget limits are being reached.

Create Custom Views with Metadata

For more specific analysis, the Metrics Dashboard allows you to group data by custom metadata keys sent in the request headers. For example, you can group by a tenant_name to analyze metrics for each of your customers. Custom metadata view showing metrics grouped by user-defined categories

Filtering and Drill-Down

The dashboard also includes filters that allow you to narrow down your analysis to specific models, users, teams, or custom metadata fields. This makes it easier to investigate specific patterns or issues. Filter interface showing options to narrow down analytics by various parameters

Downloading Metrics Data

In addition to viewing metrics in the dashboard, you can also download the complete raw data in CSV format. This is useful for offline analysis, creating custom visualizations, or integrating with other data analysis tools. Data export interface showing CSV download option for metrics data