Nvidia MIG and Nvidia TimeSlicing
Fractional GPUs let us schedule multiple workloads on a single GPU. This is useful when individual workloads do not need the full compute or memory of a GPU, for example lightweight inference services or small experimentation jobs.
There are two ways to use fractional GPUs: TimeSlicing and MIG (Multi-Instance GPU).
A brief comparison chart between TimeSlicing and MIG is as follows:
| Feature | TimeSlicing | MIG (Multi-Instance GPU) |
|---|---|---|
| GPU Support | Works on most NVIDIA GPUs. | Only supported on newer data-center GPUs (Ampere architecture and later, e.g., NVIDIA A100, A30, H100). |
| Isolation | No real isolation. The user is responsible for memory management; a workload exceeding its allocated memory can crash others on the same GPU. | Strong isolation. Compute and memory are isolated between instances, with guaranteed resource allocation. |
| Resource Allocation | Divides the GPU into fractional parts (e.g., 0.3, 0.5, 0.2) that workloads can request. | Divides the GPU into pre-defined, discrete instance types (as per NVIDIA's configurations). Workloads are assigned entire instances. |
| VRAM Management | User-managed. VRAM allocation is not enforced by the hardware. | Hardware-enforced. Each instance has dedicated VRAM. |
| Compute Sharing | Compute is shared via context switching. A workload can potentially use the entire GPU when others are idle. | Compute is partitioned and isolated per instance. No sharing of compute resources beyond the instance's allocation. |
| Flexibility | More flexible resource fractions (e.g., 0.3, 0.5). | Limited to NVIDIA's pre-defined instance types; less flexible for fine-grained resource requests. |
Create a Nodepool with MIG enabled
We will need to create a separate nodepool for MIG-enabled GPUs. Each GPU model supports different MIG profiles, as described in the NVIDIA MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html. For example, here are the MIG profiles for the A100 GPU:
| GPU | Compute Fraction per Instance | Instances per GPU | GPU Memory per Instance | Configuration Name | GPU Instance Profile (for Azure) |
|---|---|---|---|---|---|
| A100 (40GB) | 1/7 | 7 | 5GB | 1g.5gb | MIG1g |
| A100 (40GB) | 2/7 | 3 | 10GB | 2g.10gb | MIG2g |
| A100 (40GB) | 3/7 | 2 | 20GB | 3g.20gb | MIG3g |
| A100 (80GB) | 1/7 | 7 | 10GB | 1g.10gb | MIG1g |
| A100 (80GB) | 2/7 | 3 | 20GB | 2g.20gb | MIG2g |
| A100 (80GB) | 3/7 | 2 | 40GB | 3g.40gb | MIG3g |
While creating the nodepool, we will need to select the MIG profile. Here are the steps to do it in different cloud providers:
Create a nodepool with MIG enabled using the `--gpu-instance-profile` argument of the Azure CLI.
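For example, a minimal sketch using the Azure CLI (resource group, cluster name, and VM size are placeholders; pick the GPU instance profile that matches your GPU from the table above):

```bash
# Illustrative only: resource group, cluster name, and VM size are placeholders.
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name miga100 \
  --node-count 1 \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --gpu-instance-profile MIG1g
```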
On GCP, create a nodepool and pass the MIG profile in the accelerator configuration by setting `gpu_partition_size=1g.5gb` (or one of the other allowed MIG profile values listed above).
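For reference, a minimal sketch using the gcloud CLI (cluster, zone, and machine type are placeholders; note that the gcloud flag is spelled `gpu-partition-size`):

```bash
# Illustrative only: cluster name, zone, and machine type are placeholders.
gcloud container node-pools create mig-a100-pool \
  --cluster <cluster-name> \
  --zone <zone> \
  --machine-type a2-highgpu-1g \
  --num-nodes 1 \
  --accelerator type=nvidia-tesla-a100,count=1,gpu-partition-size=1g.5gb
```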
Supporting MIG GPUs on AWS in a managed way is currently non-trivial. However, if you want to try the feature out, please refer to these docs.
Deploy your workload on the MIG nodepool
Once you have created the nodepool, you will be able to see the MIG nodepool among the available nodepools in the Resources section of the deployment form.

It might take up to 10 minutes for the newly created nodepool to become visible on the TrueFoundry UI. You can force-sync the nodepools by going to Platform -> Clusters -> Sync Cluster.

To deploy a workload that utilizes a fractional GPU, start deploying your service/job on TrueFoundry and, in the "Resources" section, select the nodepool selector. You can now see the fractional GPUs on the UI, which you can select (as shown below).
Using MIG GPU
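Under the hood, each MIG slice is exposed to Kubernetes as a schedulable resource. As a rough sketch, assuming the NVIDIA GPU Operator's "mixed" MIG strategy (which advertises each slice as a named resource such as `nvidia.com/mig-1g.5gb`), a pod requesting one 1g.5gb slice would look like the following; when deploying through TrueFoundry you simply pick the MIG nodepool instead of writing this YAML yourself:

```yaml
# Illustrative sketch, assuming the GPU Operator's "mixed" MIG strategy.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1
```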
Create a Nodepool with Timeslicing enabled
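The Azure and AWS steps below reference a time-slicing config for the NVIDIA device plugin / GPU Operator. This config typically lives in a ConfigMap that the GPU Operator is pointed at, and the node label `nvidia.com/device-plugin.config` selects which named config applies to a node. A minimal sketch of such a config (the replica count is only an example; it controls how many workloads share each physical GPU):

```yaml
# Minimal sketch of an NVIDIA device plugin time-slicing config.
# "replicas: 4" advertises each physical GPU as 4 schedulable nvidia.com/gpu units.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```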
Create a nodepool with `device-plugin.config` pointing to the correct time-slicing config using the Azure CLI.
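A minimal sketch using the Azure CLI, assuming the GPU Operator picks up per-node configs via the `nvidia.com/device-plugin.config` node label (resource group, cluster name, VM size, and config name are placeholders):

```bash
# Illustrative only: names, VM size, and the time-slicing config name are placeholders.
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name tsgpu \
  --node-count 1 \
  --node-vm-size Standard_NC6s_v3 \
  --labels "nvidia.com/device-plugin.config=<time-slicing-config-name>"
```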
On GCP, create a nodepool with time-slicing enabled.
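A minimal sketch using GKE's built-in GPU time-sharing (cluster, zone, machine type, and GPU type are placeholders; `max-shared-clients-per-gpu` controls how many workloads share each GPU):

```bash
# Illustrative only: cluster name, zone, machine type, and GPU type are placeholders.
gcloud container node-pools create ts-gpu-pool \
  --cluster <cluster-name> \
  --zone <zone> \
  --machine-type n1-standard-4 \
  --num-nodes 1 \
  --accelerator type=nvidia-tesla-t4,count=1,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=4
```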
Create a nodegroup on AWS EKS with a node label pointing to the time-slicing config.
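Assuming the same GPU Operator convention as on Azure, the label points at the time-slicing config. For example, with eksctl (cluster, region, instance type, and config name are placeholders):

```bash
# Illustrative only: cluster, region, instance type, and config name are placeholders.
eksctl create nodegroup \
  --cluster <cluster-name> \
  --region <region> \
  --name ts-gpu-nodegroup \
  --node-type g4dn.xlarge \
  --nodes 1 \
  --node-labels "nvidia.com/device-plugin.config=<time-slicing-config-name>"
```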
Deploy your workload on the Timeslicing nodepool
Once you have created the nodepool, you will be able to see the time-sliced nodepool among the available nodepools in the Resources section of the deployment form.

It might take up to 10 minutes for the newly created nodepool to become visible on the TrueFoundry UI. You can force-sync the nodepools by going to Platform -> Clusters -> Sync Cluster.

To deploy a workload that utilizes a fractional GPU, start deploying your service/job on TrueFoundry and, in the "Resources" section, select the nodepool selector. You can now see the time-sliced GPUs on the UI, which you can select (as shown below).
Using Timeslicing GPU
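As with MIG, the fractional GPU ultimately shows up as a Kubernetes resource request. A rough sketch, assuming the default time-slicing config shown earlier: the resource name stays `nvidia.com/gpu`, but each requested unit is a time-shared replica of a physical GPU rather than a dedicated GPU:

```yaml
# Illustrative sketch: with time-slicing, each nvidia.com/gpu unit is a time-shared replica.
apiVersion: v1
kind: Pod
metadata:
  name: timeslicing-example
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
```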