Supported Providers

  • OpenAI
  • Azure OpenAI
  • Groq

The Transcriptions API converts spoken audio into written text using speech recognition models. It accepts audio in several formats and returns a transcription in the same language as the input audio. You can customize the input and output as follows:
  • Models: whisper-1, gpt-4o-mini-transcribe, or gpt-4o-transcribe
  • Input formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Output formats:
    • whisper-1: json, text, srt, verbose_json, vtt
    • gpt-4o-mini-transcribe & gpt-4o-transcribe: json, text
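
The format constraints above can be checked on the client before an upload is attempted. The following is a minimal sketch, not part of the TrueFoundry or OpenAI SDKs: INPUT_FORMATS, OUTPUT_FORMATS, and check_request are hypothetical names, and it assumes the model name is the part of the model ID after the last "/" (as in openai-main/gpt-4o-transcribe).

```python
from pathlib import Path

# Values copied from the lists above; update them if provider support changes.
INPUT_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
OUTPUT_FORMATS = {
    "whisper-1": {"json", "text", "srt", "verbose_json", "vtt"},
    "gpt-4o-mini-transcribe": {"json", "text"},
    "gpt-4o-transcribe": {"json", "text"},
}


def check_request(model_id: str, audio_path: str, response_format: str = "json") -> None:
    """Raise ValueError if the file type or output format is not listed for the model."""
    # Assumption: IDs look like "<provider-account>/<model>"; keep only the model part.
    model = model_id.rsplit("/", 1)[-1]
    if model not in OUTPUT_FORMATS:
        raise ValueError(f"unknown transcription model: {model}")

    # Compare the file extension (without the dot, lowercased) to the accepted inputs.
    extension = Path(audio_path).suffix.lower().lstrip(".")
    if extension not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {extension or '(none)'}")

    if response_format not in OUTPUT_FORMATS[model]:
        raise ValueError(f"{model} does not support response_format={response_format!r}")
```

For example, check_request("openai-main/whisper-1", "talk.mp3", "srt") returns without error, while requesting srt from gpt-4o-transcribe raises ValueError. The chosen format can then be passed to client.audio.transcriptions.create through the OpenAI SDK's response_format parameter.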

Code Snippet

from openai import OpenAI

BASE_URL = "https://{controlPlaneUrl}/api/llm"
API_KEY = "your-truefoundry-api-key"

# Configure OpenAI client with TrueFoundry settings
client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL,
)

# Open the audio file in binary mode; the context manager closes it after the upload
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="openai-main/gpt-4o-transcribe",
        file=audio_file,
    )

print(transcription.text)

By default, the response format is json, and the transcribed text is returned in the text field:
{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger."
}
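
Because the shape of the result depends on the requested output format, downstream code may want one way to obtain the transcript. The following is a minimal sketch, assuming the OpenAI Python SDK returns a plain string for text-style formats (text, srt, vtt) and an object with a text attribute otherwise; extract_text is a hypothetical helper, not part of any SDK.

```python
def extract_text(result) -> str:
    """Return the transcript from a plain string, a parsed JSON dict, or an SDK object."""
    if isinstance(result, str):
        # Text-style formats are assumed to arrive as the transcript itself.
        return result
    if isinstance(result, dict):
        # A JSON body parsed with json.loads, shaped like the example response above.
        return result["text"]
    # Otherwise assume an SDK response object exposing a .text attribute.
    return result.text
```

A raw JSON response body can be handled by parsing it first, for example extract_text(json.loads(body)).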