Deploying NVIDIA NIM Models
Deploy optimized TensorRT-LLM Engines using NIM Containers
Supported Model Types
Currently, we list NIM models of the following types:
- Large Language Models (LLMs)
- Vision Language Models (VLMs)
- Embedding Models
- Reranking Models
Adding nvcr.io Docker Registry
- Generate an API Key from https://org.ngc.nvidia.com/setup/api-keys. Make sure to give it access to NGC Catalog.
- Add a Custom Docker Registry to the Platform with the following details (an optional credential check is sketched after this list):
  - Registry URL: nvcr.io
  - Username: $oauthtoken
  - Password: The API Key from the previous step
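If you want to sanity-check the credentials before adding them to the Platform, you can try a `docker login` against nvcr.io locally. This is a minimal sketch, assuming Docker is installed and the API Key is exported as the `NGC_API_KEY` environment variable on your machine:

```python
import os
import subprocess

# Read the NGC API Key from the environment (assumes you exported it beforehand).
api_key = os.environ["NGC_API_KEY"]

# Log in to nvcr.io using the literal username "$oauthtoken" and the API Key as the password.
result = subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=api_key,
    text=True,
    capture_output=True,
)

print(result.stdout or result.stderr)
```

A "Login Succeeded" message indicates the same Registry URL, Username, and Password will work when configured on the Platform.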
Adding NGC API Key to Secrets
- Add the same API Key as a Secret on the Platform. We are calling the secret NGC_API_KEY.
Deploying a NIM Model
- From the New Deployment page, select NVIDIA NIM.
- Select the workspace you want to deploy to
- Select the NVCR Model Registry Integration we created in the previous step
- Select the NGC API Key Secret we created in the previous step
- Select the model you want to deploy
- Click Next. You will be presented with optimized profiles (for latency or throughput) across different precision and GPU options for which TRT-LLM engines are prebuilt and available. You can select any of the profiles and Continue to Deployment.
(Optional) Caching NIM Model to External Volume
Recommended for Large Models and Production Environments
To avoid re-downloading the model on every restart, you can create a Volume and mount it at /opt/nim/.cache.
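Once the Volume is mounted, you can confirm that engine and model files are actually being reused across restarts by inspecting the cache from a shell inside the running container. This is a minimal sketch; the exact directory layout under /opt/nim/.cache depends on the model and NIM version:

```python
import os

# Walk the NIM cache directory and report its total size.
# Run this from inside the NIM container after the first successful startup.
cache_dir = "/opt/nim/.cache"

total_bytes = 0
for root, _dirs, files in os.walk(cache_dir):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))

print(f"{cache_dir}: {total_bytes / 1e9:.2f} GB cached")
```

If the cache is non-empty after a restart, subsequent startups should skip the download step.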
Running Inferences
You can now run inferences via the OpenAPI tab. You can also add the model to the LLM Gateway using the button at the top.
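NIM LLM containers expose an OpenAI-compatible API, so you can also call the deployment programmatically. The snippet below is a minimal sketch; the endpoint URL, model name, and any auth header are placeholders that you should replace with the values shown on your deployment's OpenAPI tab:

```python
import requests

# Placeholder values: copy the real endpoint and model name from the OpenAPI tab.
ENDPOINT = "https://<your-deployment-endpoint>/v1/chat/completions"
MODEL = "<model-name>"

response = requests.post(
    ENDPOINT,
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 128,
    },
    # Add an Authorization header here if your deployment requires one.
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```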