Resources

For all deployments, we need to specify resource constraints for the application so that it can be scheduled appropriately on the cluster. The essential resources to specify are:

CPU

CPU represents compute processing and is specified as a number. 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.
We can also specify fractional CPUs like 0.2 or even 0.02.

For CPU, we need to specify the cpu_request and cpu_limit params.

cpu_request specifies the amount of CPU that will always be reserved for the application. This also means that the minimum cost you incur for this application is the cost of cpu_request CPUs. If the application's CPU usage always stays below cpu_request, you can assume it will run in a healthy way.

cpu_limit specifies the upper limit on the CPU usage of the application, beyond which the application will be throttled and not allowed any more CPU. This safeguards other applications running on the same node, since one misbehaving application cannot hog resources and starve the others.

cpu_limit must always be greater than or equal to cpu_request.

📘

Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, cpu_request and cpu_limit must be equal, otherwise a validation error will be raised. Check this page for more information.

How do you set cpu_request and cpu_limit?

If your application takes, let's say, 0.5 CPU in steady state and goes up to 0.8 CPU during peak times, then the request should be 0.5 and the limit can be 0.9 (just to be safe). In general, cpu_request should be around the steady-state usage and the limit should account for the peak usage.
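
For example, the steady-state/peak guideline above translates into a minimal sketch like this (using the illustrative 0.5 / 0.9 numbers from the example):

from servicefoundry import Resources

resources = Resources(
    cpu_request=0.5,  # roughly the steady-state CPU usage; always reserved for the app
    cpu_limit=0.9,    # a bit above the observed peak (0.8), to leave some headroom
)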

Memory

Memory is defined as an integer and the unit is megabytes. So a value of 1 means 1 MB of memory and 1000 means 1 GB of memory.
Memory also has two fields: memory_request and memory_limit. memory_request defines the minimum amount of memory needed to run the application. If you think your app requires at least 256 MB of memory to operate, that is the request value.

memory_limit defines the max amount of memory that the application can use. If the application tries to use more memory, it will be killed and an OOM (Out Of Memory) error will appear on the pods.

If the memory usage of your application increases during peaks or because of other events, it is advisable to keep the memory limit around the peak memory usage.

Keeping the memory limit below the usual memory usage will result in the pods being OOM-killed.

memory_limit must always be greater than or equal to memory_request.

📘

Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, memory_request and memory_limit must be equal, otherwise a validation error will be raised. Check this page for more information.
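
For example, if the app needs at least 256 MB as above and you expect its peak usage to stay under roughly 512 MB (an assumed number, purely for illustration), a minimal sketch would be:

from servicefoundry import Resources

resources = Resources(
    memory_request=256,  # minimum memory (in MB) the app needs to operate
    memory_limit=512,    # assumed peak usage; exceeding this gets the pod OOM-killed
)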

Storage

Storage is defined as an integer and the unit is megabytes. A value of 1 means 1 MB of disk space and 1000 means 1 GB of disk space.

Storage has two fields: ephemeral_storage_request and ephemeral_storage_limit. ephemeral_storage_request defines the minimum amount of disk space your application needs to run. If your application requires 1 GB of disk space, then that is the request value.

ephemeral_storage_limit defines the max amount of disk space that the application will be allowed to use. Going beyond this limit will result in the application being killed and the pods being evicted.

The disk space allocated to the application is completely ephemeral and is intended to be used purely as temporary space.

ephemeral_storage_limit must always be greater than or equal to ephemeral_storage_request.

📘

Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, ephemeral_storage_request and ephemeral_storage_limit must be equal, otherwise a validation error will be raised. Check this page for more information.
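
Continuing the 1 GB example above, a minimal sketch could look like the following (the 2000 MB limit is just an assumed headroom value):

from servicefoundry import Resources

resources = Resources(
    ephemeral_storage_request=1000,  # 1 GB of temporary disk space reserved for the app
    ephemeral_storage_limit=2000,    # assumed upper bound; exceeding it gets the pods evicted
)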

GPU

See the GPUs page for more information.

Setting resources for Truefoundry applications

`Service` and `Job` both have a `resources` argument where you can either pass an instance of the `Resources` class or a `dict`.

import logging

from servicefoundry import Build, Service, DockerFileBuild, Resources

logging.basicConfig(level=logging.INFO)
service = Service(
    name="service",
    image=Build(build_spec=DockerFileBuild()),
    ports=[{"port": 8501}],
    resources=Resources( # You can use this argument in `Job` too.
        cpu_request=0.2,
        cpu_limit=0.5,
        memory_request=128,
        memory_limit=512,
    ),
)
service.deploy(workspace_fqn="YOUR_WORKSPACE_FQN")

You can also define the resource fields as key-value pairs under the `resources` field in the YAML spec.

name: service
components:
  - name: service
    type: service
    image:
      type: build
      build_source:
        type: local
      build_spec:
        type: dockerfile
    ports:
      - port: 8501
    resources: # You can use this block in `job` too.
      cpu_request: 0.2
      cpu_limit: 0.5
      memory_request: 128
      memory_limit: 512

We set the following defaults if you do not configure any resources field.

| Field                     | Default value | Unit |
| ------------------------- | ------------- | ---- |
| cpu_request               | 0.2           | -    |
| cpu_limit                 | 0.5           | -    |
| memory_request            | 200           | MB   |
| memory_limit              | 500           | MB   |
| ephemeral_storage_request | 1000          | MB   |
| ephemeral_storage_limit   | 2000          | MB   |
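
For reference, leaving the resources field unset is equivalent to passing these defaults explicitly, e.g. in Python:

from servicefoundry import Resources

# Written-out equivalent of the defaults listed above
resources = Resources(
    cpu_request=0.2,
    cpu_limit=0.5,
    memory_request=200,
    memory_limit=500,
    ephemeral_storage_request=1000,
    ephemeral_storage_limit=2000,
)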

Migrating to newer Resources spec

Starting 22nd May 2023, we have added support for using nodepools. The Resources spec is not entirely backwards compatible and requires changes if you are using any of the following fields: gpu, capacity_type, instance_family, nodepools.

Primarily, we have added an explicit distinction in how a node is picked for the application. A new field node has been added to Resources which takes one of two types:

  • NodeSelector - Allows defining constraints for dynamically and automatically provisioning a node
  • NodepoolSelector - Allows picking one or more predefined nodepools on the cluster

To use the new spec, install the servicefoundry>=0.9.0 package:

pip install servicefoundry>=0.9.0

Note: The same changes apply to the Service, Job and ModelDeployment specs.

  • The Resources.gpu field has been split into Resources.gpu_count and Resources.node.gpu_type
from servicefoundry import Service, Resources, GPUType
- from servicefoundry import GPU
+ from servicefoundry import NodeSelector


service = Service(
    ...,
    resources=Resources(
        ...,
-       gpu=GPU(type=GPUType.T4, count=2),
+       gpu_count=2,
+       node=NodeSelector(gpu_type=GPUType.T4),
    )
)
name: my-service
type: service
...
resources:
  ...
- gpu:
-   type: T4
-   count: 2
+ gpu_count: 2
+ node:
+   type: node_selector
+   gpu_type: T4

If you want to assign GPUs using nodepools, see the Adding GPUs page.

  • Resources.instance_family has been moved and renamed to Resources.node.instance_families
from servicefoundry import Service, Resources
+ from servicefoundry import NodeSelector


service = Service(
    ...,
    resources=Resources(
        ...,
-       instance_family=["c6i", "t3"],
+       node=NodeSelector(instance_families=["c6i", "t3"]),
    )
)
name: my-service
type: service
...
resources:
  ...
- instance_family:
-   - c6i
-   - t3
+ node:
+   type: node_selector
+   instance_families:
+     - c6i
+     - t3
  • Resources.capacity_type has been moved to Resources.node.capacity_type
from servicefoundry import Service, Resources, CapacityType
+ from servicefoundry import NodeSelector 


service = Service(
    ...,
    resources=Resources(
        ...,
-       capacity_type=CapacityType.spot_fallback_on_demand,
+       node=NodeSelector(capacity_type=CapacityType.spot_fallback_on_demand),
    )
)
name: my-service
type: service
...
resources:
  ...
- capacity_type: spot_fallback_on_demand
+ node:
+   type: node_selector
+   capacity_type: spot_fallback_on_demand
  • Resources.nodepools has been moved to Resources.node.nodepools
from servicefoundry import Service, Resources
+ from servicefoundry import NodepoolSelector 


service = Service(
    ...,
    resources=Resources(
        ...,
-       nodepools=["my-nodepool-1", "my-nodepool-2"]
+       node=NodepoolSelector(nodepools=["my-nodepool-1", "my-nodepool-2"]),
    )
)
name: my-service
type: service
...
resources:
  ...
- nodepools:
-   - my-nodepool-1
-   - my-nodepool-2
+ node:
+   type: nodepool_selector
+   nodepools:
+     - my-nodepool-1
+     - my-nodepool-2
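
Putting the pieces together, here is a sketch of a full new-style spec that combines these fields. The GPU type, instance family ("g4dn") and capacity type used here are illustrative values only; pick ones that exist in your cluster:

from servicefoundry import (
    Build,
    CapacityType,
    DockerFileBuild,
    GPUType,
    NodeSelector,
    Resources,
    Service,
)

service = Service(
    name="my-service",
    image=Build(build_spec=DockerFileBuild()),
    ports=[{"port": 8501}],
    resources=Resources(
        cpu_request=0.5,
        cpu_limit=1.0,
        memory_request=500,
        memory_limit=1000,
        gpu_count=1,                     # number of GPUs for the application
        node=NodeSelector(               # constraints for dynamically provisioning a node
            gpu_type=GPUType.T4,
            instance_families=["g4dn"],  # illustrative instance family
            capacity_type=CapacityType.spot_fallback_on_demand,
        ),
    ),
)
service.deploy(workspace_fqn="YOUR_WORKSPACE_FQN")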