GPUs (Preview)

📘

Availability

GPUs are currently supported only on AWS cloud installations. The following instance families are supported:

  • P2 (Nvidia K80)
  • P3 (Nvidia V100)
  • G4dn (Nvidia T4)
  • G5 (Nvidia A10G)

These instances can be provisioned as spot or on-demand instances. GPU availability is subject to the quota and region limitations applied to your cloud account.

Support for Google Cloud and Microsoft Azure will be added in the coming weeks.

A Service or Job can now use Nvidia GPUs for accelerated machine learning training and inference workloads.

Note: Currently, only a single full GPU can be allotted to any Service or Job. Please reach out to us if you have fractional or multi-GPU requirements.

Generative AI Examples

  1. Stable Diffusion v2.1 Gradio app
  2. Running GPT-J-6B LLM in fp16 mode in a FastAPI app

Adding GPU to Service or Job

A GPU can be attached by passing servicefoundry.GPU to the Resources section of the Service or Job definition, as shown below:

import logging

from servicefoundry import (Build, Port, PythonBuild, Resources, Service, LocalSource)
from servicefoundry import GPU, GPUType

logging.basicConfig(level=logging.INFO, format=logging.BASIC_FORMAT)

service = Service(
    name="stable-diffusion-v21",
    image=Build(
        build_spec=PythonBuild(
            python_version="3.8",
            requirements_path="requirements.txt",
            command="python app.py"
        ),
    ),
    ports=[Port(port=8080)],
    resources=Resources(
        cpu_request=3.5,
        cpu_limit=3.5,
        memory_request=14500,
        memory_limit=14500,
        ephemeral_storage_request=50000,
        ephemeral_storage_limit=50000,
        gpu=GPU(type=GPUType.T4)
    )
)

service.deploy(workspace_fqn="...", wait=False)

The equivalent YAML spec for the same Service:

name: stable-diffusion-v21
type: service
image:
  type: build
  build_spec:
    type: tfy-python-buildpack
    command: python app.py
    python_version: '3.8'
    requirements_path: requirements.txt
    build_context_path: ./
  build_source:
    type: local
ports:
  - port: 8080
    expose: true
    protocol: TCP
replicas: 1
resources:
  gpu:
    type: T4
  cpu_limit: 3.5
  cpu_request: 3.5
  memory_limit: 14500
  memory_request: 14500
  ephemeral_storage_limit: 50000
  ephemeral_storage_request: 50000

Supported GPU types are:

  • GPUType.K80
  • GPUType.V100
  • GPUType.T4
  • GPUType.A10G
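
For example, to request an A10G or V100 instead of a T4, change the type passed to GPU. Below is a minimal sketch reusing the Resources fields from the example above; the CPU and memory values are illustrative, not recommendations:

from servicefoundry import GPU, GPUType, Resources

# Minimal sketch: request a single V100 instead of a T4.
# The CPU/memory values are illustrative, not recommendations.
resources = Resources(
    cpu_request=3.5,
    cpu_limit=3.5,
    memory_request=14500,
    memory_limit=14500,
    gpu=GPU(type=GPUType.V100),
)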

Adding CUDA Toolkit

Additionally, your application might need the CUDA Toolkit installed. If you are using PythonBuild, you can configure this by passing a cuda_version:

import logging

from servicefoundry import (Build, Port, PythonBuild, Resources, Service, LocalSource)
from servicefoundry import GPU, GPUType
from servicefoundry import CUDAVersion

logging.basicConfig(level=logging.INFO, format=logging.BASIC_FORMAT)

service = Service(
    name="stable-diffusion-v21",
    image=Build(
        build_spec=PythonBuild(
            python_version="3.8",
            cuda_version=CUDAVersion.CUDA_11_3_CUDNN8,
            requirements_path="requirements.txt",
            command="python app.py"
        ),
    ),
    ports=[Port(port=8080)],
    resources=Resources(
        cpu_request=3.5,
        cpu_limit=3.5,
        memory_request=14500,
        memory_limit=14500,
        ephemeral_storage_request=50000,
        ephemeral_storage_limit=50000,
        gpu=GPU(type=GPUType.T4)
    )
)

service.deploy(workspace_fqn="...", wait=False)

The equivalent YAML spec for the same Service:

name: stable-diffusion-v21
type: service
image:
  type: build
  build_spec:
    type: tfy-python-buildpack
    command: python app.py
    python_version: '3.8'
    requirements_path: requirements.txt
    build_context_path: ./
    cuda_version: 11.3-cudnn8
  build_source:
    type: local
ports:
  - port: 8080
    expose: true
    protocol: TCP
replicas: 1
resources:
  gpu:
    type: T4
  cpu_limit: 3.5
  cpu_request: 3.5
  memory_limit: 14500
  memory_request: 14500
  ephemeral_storage_limit: 50000
  ephemeral_storage_request: 50000

Check the servicefoundry.CUDAVersion enum for all available CUDA and cuDNN versions.
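
If you want to see the exact options exposed by your installed version, you can print the enum members. This is a minimal sketch, assuming CUDAVersion is a standard Python enum; the available members depend on your servicefoundry version:

from servicefoundry import CUDAVersion

# Print every CUDA/cuDNN combination exposed by the installed servicefoundry version.
for version in CUDAVersion:
    print(version.name, "->", version.value)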

Alternatively, you can bring your own Docker image or Dockerfile that has the CUDA Toolkit pre-installed.
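
As a rough sketch, assuming servicefoundry exposes a DockerFileBuild build spec (check your installed version for the exact class name and fields), deploying from your own Dockerfile whose base image already ships the CUDA Toolkit could look like this:

from servicefoundry import (Build, DockerFileBuild, GPU, GPUType, Port, Resources, Service)

# Assumption: DockerFileBuild and its dockerfile_path field exist in your servicefoundry version.
# The Dockerfile is expected to start from a CUDA base image,
# e.g. nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04, so the toolkit is already installed.
service = Service(
    name="stable-diffusion-v21",
    image=Build(build_spec=DockerFileBuild(dockerfile_path="Dockerfile")),
    ports=[Port(port=8080)],
    resources=Resources(gpu=GPU(type=GPUType.T4)),
)

service.deploy(workspace_fqn="...", wait=False)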

Configuring GPU and CUDA Version from UI

Select CUDA Version from the dropdown

Select GPU Type from the dropdown

📘

GPU Type Options

The GPU Type dropdown only shows GPUs available on the instance families allowed for the selected workspace (all are allowed by default).

You can restrict the GPU instance types by editing the respective workspace.

Monitoring GPU Metrics

GPU metrics are automatically captured and available in the Metrics section of your Application.