Send Requests to the Deployed LLM


What you'll learn

  • Send a request to a deployed model via the API

Step 1: Navigate to the Deployment Page of the Model

Click on Deployments and find your deployed model under Services. Selecting it opens the model's dashboard.

Step 2: Send API Request

To send an API request to the deployed LLM, go to the OpenAPI section, click on Text Generation Inference, and then on Generate Tokens.

A sample request body is already populated. Click on Send API Request to send a POST request to the deployed model.

Copy the corresponding Python code for the request.

Here is the generated code snippet:

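import requests

# Endpoint of the deployed model; replace {your-org-domain} with your own domain.
url = "https://llama-2-7b-llm-demo.{your-org-domain}/generate"

# Request body: the prompt plus the generation parameters.
payload = {
    "inputs": "My name is Olivier and I",
    "parameters": {
        "max_new_tokens": 50,
        "repetition_penalty": 1.03,
        "return_full_text": False,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95
    }
}

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json"
}

# Send the POST request to the deployed model.
response = requests.post(url, json=payload, headers=headers)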
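
Once the request is sent, you will usually want to check that it succeeded and pull the generated text out of the response. The following is a minimal sketch, not part of the generated snippet. It assumes the endpoint returns a JSON object with a "generated_text" field, as Text Generation Inference's generate route normally does. The helper names build_payload, extract_generated_text, and generate are hypothetical, chosen for illustration only.

```python
def build_payload(prompt, max_new_tokens=50, temperature=0.5):
    """Build a request body in the same shape as the sample request."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "return_full_text": False,
        },
    }


def extract_generated_text(body):
    """Return the generated text from a decoded JSON response body.

    Assumption: the body is a JSON object with a "generated_text" key.
    """
    if not isinstance(body, dict) or "generated_text" not in body:
        raise ValueError(f"Unexpected response body: {body!r}")
    return body["generated_text"]


def generate(url, prompt, timeout=30):
    """POST the prompt to the endpoint and return the generated text."""
    # Imported here so the two helpers above work without the
    # third-party requests package installed.
    import requests

    response = requests.post(
        url,
        json=build_payload(prompt),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        timeout=timeout,
    )
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error body.
    response.raise_for_status()
    return extract_generated_text(response.json())
```

Calling generate(url, "My name is Olivier and I") with your own endpoint URL returns only the model's continuation, because return_full_text is set to False. The timeout keeps the call from hanging indefinitely if the deployment is unreachable.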


What's Next