Supported Model Types

Currently we list NIM models of the following types:
  • Large Language Models (LLMs)
  • Vision Language Models (VLMs)
  • Embedding Models
  • Reranking Models

Adding nvcr.io Docker Registry

  1. Generate an API Key at https://org.ngc.nvidia.com/setup/api-keys. Make sure to give it access to the NGC Catalog.
  2. Add a Custom Docker Registry to the Platform
    • Registry URL: nvcr.io
    • Username: $oauthtoken
    • Password: The API Key from the previous step
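To sanity-check these credentials before relying on the Platform integration, you can log in to nvcr.io from any machine with Docker installed. This is an optional local check, not a Platform step:

```bash
# Optional local check: confirm the API Key works against nvcr.io.
# The username is the literal string $oauthtoken (single-quoted so the
# shell does not expand it); the password is the NGC API Key.
export NGC_API_KEY="<your API Key>"
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```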

Adding NGC API Key to Secrets

  1. Add the same API Key as a Secret on the Platform. We are calling the secret NGC_API_KEY.
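NIM containers read this key from the NGC_API_KEY environment variable at startup to download model artifacts from NGC, which is why the secret uses that name. As a rough local sketch of what the Platform injects at deploy time (the image name is an example):

```bash
# Rough local equivalent of a deployment: the container reads
# NGC_API_KEY at startup and uses it to pull model weights from NGC.
# nvcr.io/nim/meta/llama-3.1-8b-instruct is one example NIM image.
docker run --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```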

Deploying a NIM Model

  1. From the New Deployment page, select NVIDIA NIM.
    • Select the workspace you want to deploy to
    • Select the NVCR Model Registry Integration we created in the previous step
    • Select the NGC API Key Secret we created in the previous step
    • Select the model you want to deploy
  2. Click Next. You will be presented with optimized profiles (for latency or throughput) across the precision and GPU options for which TRT-LLM engines are prebuilt and available. Select any of the profiles and Continue to Deployment.
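If you want to inspect the same profile list outside the Platform, NIM images ship a list-model-profiles utility that prints the available profiles and flags the ones compatible with the detected GPUs (the image name below is an example):

```bash
# List the optimized profiles (precision, GPU, latency vs. throughput)
# built into this NIM image; compatibility with local GPUs is reported.
docker run --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest \
  list-model-profiles
```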

(Optional) Caching NIM Model to External Volume

Recommended for Large Models and Production Environments
To avoid re-downloading the model on every restart, you can create a Volume and mount it at /opt/nim/.cache.
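As a local sketch of what the mount achieves (volume and image names are illustrative): the first start downloads the model into the volume, and later restarts find it there and skip the download.

```bash
# Persist downloaded model artifacts across container restarts by
# mounting a volume at the NIM cache path /opt/nim/.cache.
docker volume create nim-cache
docker run --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v nim-cache:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```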

Running Inferences

You can now run inferences via the OpenAPI tab. You can also add the model to the LLM Gateway using the button at the top.
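NIM serves an OpenAI-compatible API, so a deployed LLM can also be queried directly over HTTP. The host, port, and model name below are placeholders for your deployment's endpoint:

```bash
# Example chat completion against the OpenAI-compatible endpoint.
curl http://<deployment-host>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```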