Measure token generation throughput, Time to First Token (TTFT), and Inter-Token Latency (ITL) of LLMs via the chat completions API
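As a minimal sketch of how these three metrics relate, the helper below derives them from the arrival times of streamed tokens. The names `StreamMetrics` and `compute_stream_metrics` are hypothetical and not part of any benchmarking tool. It assumes each timestamp corresponds to one generated token, which may not hold if the API batches several tokens into one streamed chunk. It also assumes throughput is defined as tokens divided by total elapsed time since the request was sent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float        # Time to First Token, in seconds
    mean_itl_s: float    # mean Inter-Token Latency, in seconds
    tokens_per_s: float  # token generation throughput


def compute_stream_metrics(request_start: float,
                           token_times: Sequence[float]) -> StreamMetrics:
    """Derive latency metrics from token arrival timestamps.

    request_start and token_times must come from the same monotonic clock
    (for example time.perf_counter()), with token_times in arrival order.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start

    # ITL: gaps between consecutive tokens; a single token has no gaps.
    gaps = [later - earlier
            for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0

    # Throughput (assumed definition): tokens over total elapsed time.
    elapsed = token_times[-1] - request_start
    throughput = len(token_times) / elapsed if elapsed > 0 else float("inf")

    return StreamMetrics(ttft_s=ttft, mean_itl_s=mean_itl,
                         tokens_per_s=throughput)
```

In practice, `request_start` would be recorded just before the streaming request is sent and one timestamp appended to `token_times` as each chunk arrives. Some tools instead compute throughput over the decode phase only (excluding TTFT), so compare numbers only between runs that use the same definition.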
Application Catalog
env section for MODEL_NAME
ports section
model_id under artifacts_download.artifacts
gpt-4, gpt-4o
Quivr/gpt-4o
</> Code button to view the API integration code
https://truefoundry.tech/api/llm/api/inference/openai/chat/completions
"model": "openai-main/gpt-4o"
Quivr/gpt-4o
Benchmarking Results