Some models might not be supported or might need additional inputs since the HuggingFace tags might be missing or incorrect.
Deploying private/gated HuggingFace models
In some cases, you will need a HuggingFace token to access the model: when downloading your own private models, or when using gated models like Llama that require you to accept a license and terms of use. In these cases, you will need to create a secret with your HuggingFace token. You can create a secret by following the guide here; you will be asked to provide the secret name as input.
Key Tasks (Model Types) supported
Here is a list of the most commonly used task types supported in TrueFoundry and how to run inference against the model endpoint. We support other tasks on a best-effort basis; you can find their input formats here.
LLMs (text-generation) and VLMs (image-text-to-text)
We use vLLM, SGLang, or TRT-LLM to deploy these models. To read more about deploying LLMs, please refer to the LLM Deployment guide.
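These servers expose an OpenAI-compatible API, so the deployed endpoint can be called like any chat-completions API. A minimal sketch using only the standard library — the endpoint URL, model name, and API key are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholders: substitute your deployed endpoint, model name, and API key.
ENDPOINT = "https://<your-endpoint>/v1/chat/completions"
API_KEY = "<your-api-key>"

def build_chat_request(prompt: str, model: str = "<model-name>") -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

# Uncomment to call a live endpoint:
# req = urllib.request.Request(
#     ENDPOINT,
#     data=json.dumps(build_chat_request("Hello!")).encode(),
#     headers={"Authorization": f"Bearer {API_KEY}",
#              "Content-Type": "application/json"},
# )
# reply = json.load(urllib.request.urlopen(req))
# print(reply["choices"][0]["message"]["content"])
```

The same request body works with the official OpenAI SDK by pointing its `base_url` at your endpoint.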
Embedding (sentence-similarity and feature-extraction)
If the model is trained using the sentence-transformers library, we use Text Embeddings Inference or Infinity to deploy these models. You can use the OpenAI SDK to generate embeddings from the model.
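Both servers accept OpenAI-compatible embedding requests. A sketch of building the request body and extracting the vectors from the response — the endpoint URL and model name are placeholders, and the response shape assumed here is the standard OpenAI embeddings format:

```python
import json
import urllib.request

ENDPOINT = "https://<your-endpoint>/v1/embeddings"  # placeholder

def build_embedding_request(texts: list[str], model: str = "<model-name>") -> dict:
    """Build an OpenAI-compatible embeddings request body."""
    return {"model": model, "input": texts}

# Uncomment to call a live endpoint:
# req = urllib.request.Request(
#     ENDPOINT,
#     data=json.dumps(build_embedding_request(["hello world"])).encode(),
#     headers={"Content-Type": "application/json"},
# )
# vectors = [item["embedding"] for item in json.load(urllib.request.urlopen(req))["data"]]
```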
Ranking (sentence-similarity and feature-extraction)
If the model is trained using the sentence-transformers library, we use Text Embeddings Inference or Infinity to deploy these models. The output will be an array of relevance scores, one per query-document pair.
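Text Embeddings Inference exposes a rerank route that scores documents against a query. A sketch of building its request body — the schema (`query` plus `texts`) is the TEI convention, but verify it against your deployment's input formats:

```python
def build_rerank_request(query: str, documents: list[str]) -> dict:
    """Build a rerank request body in the TEI style: one query
    scored against a list of candidate documents."""
    return {"query": query, "texts": documents}

# Example body for reranking two passages against a question:
body = build_rerank_request(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is in Germany."],
)
# POST this as JSON to the endpoint's /rerank route; the response is a
# list of {"index": ..., "score": ...} entries, highest score = most relevant.
```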
Text Classification (text-classification)
We use a modified Multi-Model Server to deploy these models. The output will be an array of classes with their scores.
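Since the response is an array of classes with scores, a small helper can pick the winning class. The response shape shown in the docstring is an assumption based on the common HuggingFace text-classification format:

```python
def top_class(response: list[dict]) -> dict:
    """Pick the highest-scoring class from a response like
    [{"label": "POSITIVE", "score": 0.98}, {"label": "NEGATIVE", "score": 0.02}]
    (assumed shape -- check your endpoint's actual output)."""
    return max(response, key=lambda item: item["score"])
```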
Zero-Shot Classification (zero-shot-classification)
Token Classification (token-classification)
We use a modified Multi-Model Server to deploy these models. The output format depends on the model; for NER classification it would look like this.
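For NER output, a common post-processing step is keeping only confident entities. The per-token response shape in the docstring is an assumption based on the usual HuggingFace token-classification format:

```python
def entities_above(response: list[dict], threshold: float = 0.5) -> list[str]:
    """Keep entity words whose score clears the threshold, from a response like
    [{"entity": "B-PER", "score": 0.99, "word": "Alice"}, ...]
    (assumed shape -- check your endpoint's actual output)."""
    return [token["word"] for token in response if token["score"] >= threshold]
```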
Fill Mask (fill-mask)
We use a modified Multi-Model Server to deploy these models. The output will be an array of candidate words with the probability that each word can fill and replace [MASK].
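To pick the single most likely replacement for [MASK], take the candidate with the highest score. The candidate shape in the docstring is an assumption based on the usual HuggingFace fill-mask format:

```python
def best_fill(response: list[dict]) -> str:
    """From a fill-mask response like
    [{"token_str": "paris", "score": 0.6}, {"token_str": "london", "score": 0.3}]
    (assumed shape), return the most likely replacement for [MASK]."""
    return max(response, key=lambda candidate: candidate["score"])["token_str"]
```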
Summarization (summarization)
Translation (translation)
We use a modified Multi-Model Server to deploy these models. The output will contain the translated text.
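A sketch of building the request body and pulling the translated text out of the response. The `inputs` field and the `translation_text` key are assumptions based on the common HuggingFace translation format; verify against your endpoint's input formats:

```python
def build_translation_request(text: str) -> dict:
    """Build a JSON request body with the text to translate (assumed schema)."""
    return {"inputs": text}

def extract_translation(response: list[dict]) -> str:
    """Pull the translated string from a response like
    [{"translation_text": "Bonjour"}] (assumed shape)."""
    return response[0]["translation_text"]
```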
Image Generation (text-to-image)
For models like Flux and Stable Diffusion, we use NVIDIA PyTriton to deploy these models. Calling the API returns the generated image, which you can then save, for example as image.png in the root directory.
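If the endpoint returns the image base64-encoded in its JSON response (an assumption; some deployments return raw bytes instead), saving it locally looks like this:

```python
import base64

def save_image(b64_image: str, path: str = "image.png") -> None:
    """Decode a base64-encoded image from the response and write it to disk.
    Assumes the endpoint returns the image as a base64 string in its JSON body."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_image))
```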
Image Classification (image-classification)
We use a modified Multi-Model Server to deploy these models. The response will be an array of objects, each containing a prediction category and its confidence score.
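Images are commonly sent base64-encoded inside the JSON body. The `inputs` field name is an assumption; check your endpoint's input formats:

```python
import base64

def build_image_request(path: str) -> dict:
    """Base64-encode a local image file for the JSON request body
    (assumed schema -- check your endpoint's actual input format)."""
    with open(path, "rb") as f:
        return {"inputs": base64.b64encode(f.read()).decode()}
```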
Object Detection (object-detection)
We use a modified Multi-Model Server to deploy these models. The output will contain the labels along with the coordinates of the bounding boxes.
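A helper to flatten the detections into simple tuples. The detection shape in the docstring is an assumption based on the common HuggingFace object-detection format:

```python
def extract_boxes(response: list[dict]) -> list[tuple]:
    """From detections like
    [{"label": "cat", "score": 0.97,
      "box": {"xmin": 1, "ymin": 2, "xmax": 3, "ymax": 4}}]
    (assumed shape), return (label, xmin, ymin, xmax, ymax) tuples."""
    return [
        (d["label"], d["box"]["xmin"], d["box"]["ymin"],
         d["box"]["xmax"], d["box"]["ymax"])
        for d in response
    ]
```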
Image to Text (image-to-text)
Automatic Speech Recognition (automatic-speech-recognition)
For Whisper models, we use NVIDIA PyTriton to deploy these models. The output will be the transcription of the audio file.
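A sketch of preparing the request: the audio file is base64-encoded into the JSON body, and the transcript read back from the response. Both the `inputs` field and the `text` response key are assumptions; check your endpoint's input formats:

```python
import base64

def build_asr_request(audio_path: str) -> dict:
    """Base64-encode an audio file for the JSON request body (assumed schema)."""
    with open(audio_path, "rb") as f:
        return {"inputs": base64.b64encode(f.read()).decode()}

def extract_transcription(response: dict) -> str:
    """Pull the transcript from a response like {"text": "hello world"}
    (assumed shape)."""
    return response["text"]
```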