Self-hosted Models

TrueFoundry offers a solution to deploy open-source or private models and integrate them directly into the Gateway. This capability also extends to fine-tuned models and embedding models.

Refer to our docs on Deploying LLMs From Model Catalog for how to deploy LLMs from our model catalogue in a few clicks.

Tryout Self-hosted Model

Follow these steps once you have your model deployed:

Save costs on Self-hosted Model

GPUs are costly, and running GPU-powered deployments can quickly drive up expenses. TrueFoundry offers built-in solutions to help reduce deployment costs.

You can choose from two cost-saving options:

  1. Auto-shutdown based on inactivity: Set up automatic shutdown of deployment pods if no new requests are received within a specified time frame.
  2. Manual pause/resume: You can manually pause deployments that you anticipate won't be used for a certain period.

In both scenarios, the model status will be displayed as "paused" on the Gateway, as shown below: