Overview
There are several ways to deploy a model as an API, depending on the framework and the type of model.
Model Deployment Options
TrueFoundry doesn’t provide any client-side framework for model deployment. We believe there are already many great open-source frameworks for building inference services, and we don’t want to build yet another one. This also avoids vendor lock-in with TrueFoundry: you don’t need to change your code to deploy models on TrueFoundry or to migrate to another platform.
TrueFoundry supports deploying models from different frameworks. You can deploy models from HuggingFace, models that you have logged in the TrueFoundry model registry, or your own custom inference code written in any framework. All model API deployments in TrueFoundry are abstractions on top of the Service Deployment feature, so it’s highly recommended to get familiar with service deployment first.
Bring your own inference code and model in any framework
TrueFoundry can deploy model inference code in any framework that you are using. Here are a few examples of deploying models with the most commonly used frameworks and model servers:
HuggingFace
Deploy Transformers / Diffusers models with vLLM, SGLang, Nvidia Triton, etc.
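As a rough illustration, the sketch below assumes a HuggingFace model is already being served through vLLM’s OpenAI-compatible server (for example, started with `vllm serve <model-id> --port 8000`); the model ID, port, and base URL are placeholders, not part of any TrueFoundry-specific API.

```python
# Minimal sketch: query a HuggingFace model served by vLLM's OpenAI-compatible API.
# Assumes a server is already running, e.g. `vllm serve <model-id> --port 8000`.
# The model ID and base URL below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model-id>",  # must match the model the server was started with
    messages=[{"role": "user", "content": "What does an inference server do?"}],
)
print(response.choices[0].message.content)
```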
Scikit Learn & XGBoost
Deploy Scikit Learn and XGBoost models with FastAPI or Nvidia PyTriton.
FastAPI
Most flexible option that can wrap any inference code.
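For instance, a minimal FastAPI wrapper around a scikit-learn model might look like the sketch below; the model file name and request schema are illustrative and would be adapted to your own model.

```python
# Minimal sketch: wrap a scikit-learn model with FastAPI.
# "model.joblib" and the request schema are illustrative placeholders.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # load the serialized model once at startup


class PredictRequest(BaseModel):
    features: List[List[float]]  # one row of features per prediction


@app.post("/predict")
def predict(request: PredictRequest):
    predictions = model.predict(request.features)
    return {"predictions": predictions.tolist()}
```

Such a service can be started locally with `uvicorn app:app --host 0.0.0.0 --port 8000` and then deployed like any other TrueFoundry Service.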
LitServe
Wrap any model with LitServe and optionally enable features such as dynamic batching.
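As a rough sketch of what a LitServe wrapper looks like (the stand-in model and port are placeholders; see the LitServe docs for batching and other options):

```python
# Minimal sketch of a LitServe API; the "model" here is a stand-in for real inference code.
import litserve as ls


class InferenceAPI(ls.LitAPI):
    def setup(self, device):
        # Load the real model here; a trivial function stands in for it.
        self.model = lambda x: x * 2

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        return self.model(x)

    def encode_response(self, output):
        return {"output": output}


if __name__ == "__main__":
    # Dynamic batching can be enabled by passing max_batch_size to LitServer.
    server = ls.LitServer(InferenceAPI(), accelerator="auto")
    server.run(port=8000)
```

A request like `POST /predict` with body `{"input": 4}` would then return `{"output": 8}`.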
AWS Multi Model Server
Deploy models with AWS Multi Model Server.
TorchServe
Deploy PyTorch models with TorchServe.
TensorFlow Serve
Deploy TensorFlow models with TensorFlow Serve.
MLflow Serve
Deploy MLflow models with MLflow Serve.