Integration With Langchain

Truefoundry provides a convenient way to interact with Truefoundry's LLM model deployments and integrate them with Langchain. The integration is built on top of the langchain library.

Prerequisites

Before using these classes, make sure you have the langchain library installed. You can install it using pip:

pip install langchain
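The examples below also import from the servicefoundry package, so make sure it is installed as well (assuming it is published on PyPI under that name):

pip install servicefoundry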

Usage for Truefoundry Hosted Models

Here's an example of how to use the TruefoundryLLM class and integrate it with langchain:

from servicefoundry.langchain import TruefoundryLLM

# Specify the endpoint URL for the model
endpoint_url = "<https://pythia-70m-model-model-catalogue.demo2.truefoundry.tech>"

# Create an instance of TruefoundryLLM

model = TruefoundryLLM(  
    endpoint_url=endpoint_url,  
    parameters={  
        "max_new_tokens": 100,  
        "temperature": 0.7,  
        "top_k": 5,  
        "top_p": 0.9  
    }  
)

model.predict("Tell me a joke")

In the example above, we create an instance of the TruefoundryLLM class by providing the endpoint_url of the Truefoundry model deployment and setting the desired parameters. The parameters control how the model generates new tokens.

Afterwards, we call the model's predict method with the prompt "Tell me a joke". The model generates text based on the prompt and returns it as a string.
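Because TruefoundryLLM is designed as a Langchain-compatible LLM, it should also plug into standard Langchain constructs. Below is a minimal sketch that wires the model into an LLMChain with a PromptTemplate; the prompt text and variable names are ours, added purely for illustration:

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Build a reusable prompt with a single input variable
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a joke about {topic}."
)

# Chain the prompt together with the Truefoundry-hosted model
chain = LLMChain(llm=model, prompt=prompt)

print(chain.run(topic="penguins"))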

Reference (TruefoundryLLM)

TruefoundryLLM(endpoint_url, model_name, parameters)

The TruefoundryLLM class constructor takes three arguments:

  • endpoint_url (str): The endpoint URL of the deployed Truefoundry model.
  • model_name (str, optional): The name of the deployed model. If not specified, the class will automatically select the first available model from the model server.
  • parameters (dict): A dictionary of parameters used to configure the model. The available parameters depend on the specific model being used; two example presets are sketched after this list.
    • max_new_tokens (int): The maximum number of new tokens to generate.
    • temperature (float): The temperature parameter for controlling the randomness of token generation. Higher values result in more random outputs.
    • top_k (int): The k parameter for the top-k sampling method. It specifies the number of top tokens to consider for sampling.
    • top_p (float): The p parameter for the top-p (nucleus) sampling method. It specifies the cumulative probability threshold for selecting tokens.
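As a rough illustration of how these sampling options interact, here are two hypothetical parameter presets, one biased toward deterministic output and one toward more varied output. The specific values are illustrative, not recommendations, and endpoint_url is reused from the example above:

# More deterministic: low temperature and a narrow sampling pool
precise_params = {
    "max_new_tokens": 100,
    "temperature": 0.1,
    "top_k": 1,
    "top_p": 0.5
}

# More varied: higher temperature and a wider sampling pool
creative_params = {
    "max_new_tokens": 100,
    "temperature": 1.0,
    "top_k": 50,
    "top_p": 0.95
}

model = TruefoundryLLM(endpoint_url=endpoint_url, parameters=creative_params)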

Usage for Truefoundry LLM Playground Models

Here is an example of how you can query models from Truefoundry's LLM Playground using our langchain integration:

from servicefoundry.langchain import TruefoundryPlaygroundLLM
import os

# Note: log in first using `servicefoundry login --host https://example-domain.com`
model = TruefoundryPlaygroundLLM(
    model_name="vicuna-13b",
    parameters={
        "maximumLength": 100,
        "temperature": 0.7,
        "topP": 0.9,
        "repetitionPenalty": 1
    }
)
response = model.predict("Enter the prompt here")
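Like TruefoundryLLM, the playground class exposes a predict method, so you can combine it with a Langchain PromptTemplate to fill in prompts before querying. A small sketch, assuming the model instance from above (the template text is ours):

from langchain.prompts import PromptTemplate

# Format a prompt from a template, then send it to the playground model
template = PromptTemplate.from_template("Summarize the following text:\n{text}")
prompt = template.format(text="Langchain is a framework for building applications with LLMs.")

print(model.predict(prompt))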

Reference (TruefoundryPlaygroundLLM)

TruefoundryPlaygroundLLM(model_name, provider, parameters)

The TruefoundryPlaygroundLLM class constructor takes three arguments:

  • model_name (str): The name of the deployed model.
  • provider (str, optional): The name of the provider. Defaults to truefoundry-public.
  • parameters (dict): A dictionary containing the parameters to configure the model. The available parameters depend on the specific model being used.
    • maximumLength (int): The maximum number of new tokens to generate.
    • temperature (float): The temperature parameter for controlling the randomness of token generation. Higher values result in more random outputs.
    • repetitionPenalty (float): The parameter for repetition penalty. A value of 1.0 means no penalty.
    • topP (float): The p parameter for the top-p (nucleus) sampling method. It specifies the cumulative probability threshold for selecting tokens.