Resources
For all deployments, we need to specify the resource constraints for the application so that it can be deployed accordingly on the cluster. The essential resources to be specified are:
CPU
CPU represents compute processing and is specified as a number. 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.
We can also specify fractional CPUs like 0.2 or even 0.02.
For CPU, we need to specify the cpu_request and cpu_limit params.

cpu_request specifies the amount of CPU that will always be reserved for the application. This also means that the minimum cost you incur for this application is the cost of cpu_request number of CPUs. If the application's CPU usage always stays below cpu_request, you can assume it is running in a healthy way.

cpu_limit specifies the upper limit on CPU usage of the application, beyond which the application will be throttled and not allowed any more CPU. This safeguards the other applications running on the same node, since one misbehaving application cannot interfere and reduce the resources available to them.

cpu_limit always has to be greater than or equal to cpu_request.
Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, cpu_request and cpu_limit must be equal, otherwise a validation error will be raised. Check this page for more information.
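For example, a minimal sketch of a CPU spec that passes this validation on Autopilot (the 0.5 value is just an illustration):

from servicefoundry import Resources

# GKE Autopilot: request and limit must be set to the same value.
resources = Resources(cpu_request=0.5, cpu_limit=0.5)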
How do you set cpu_request and cpu_limit?

If your application takes, let's say, 0.5 CPU in steady state and goes up to 0.8 CPU during peak times, then the request should be 0.5 and the limit can be 0.9 (just to be safe). In general, cpu_request should be around the steady-state usage and the limit should account for the peak usage.
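Expressed as a minimal sketch using the Resources class shown later on this page, that guidance looks like:

from servicefoundry import Resources

# Steady state is ~0.5 CPU and peaks reach ~0.8 CPU, so reserve the
# steady-state amount and leave some headroom in the limit.
resources = Resources(
    cpu_request=0.5,
    cpu_limit=0.9,
)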
Memory
Memory is defined as an integer and the unit is megabytes. So a value of 1 means 1 MB of memory and 1000 means 1 GB of memory.

Memory also has two fields: memory_request and memory_limit. The memory request defines the minimum amount of memory needed to run the application. If you think your app requires at least 256 MB of memory to operate, this is the request value.

memory_limit defines the maximum amount of memory the application can use. If the application tries to use more, it will be killed and an OOM (Out of Memory) error will appear on the pods.

If the memory usage of your application increases during peaks or because of other events, it is advisable to keep the memory limit around the peak memory usage. Keeping the memory limit below the usual memory usage will result in OOM killing of the pods.

memory_limit always has to be greater than or equal to memory_request.
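For instance, the 256 MB example above, with an assumed peak usage of around 512 MB, would be a sketch like:

from servicefoundry import Resources

# The app needs at least 256 MB to operate; the limit sits near the
# assumed peak so the pods are not OOM-killed under normal load.
resources = Resources(
    memory_request=256,
    memory_limit=512,
)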
Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, memory_request and memory_limit must be equal, otherwise a validation error will be raised. Check this page for more information.
Storage
Storage is defined as an integer and the unit is megabytes. A value of 1 means 1 MB of disk space and 1000 means 1 GB of disk space.

Storage has two fields: ephemeral_storage_request and ephemeral_storage_limit. The ephemeral storage request defines the minimum amount of disk space your application needs to run. If your application requires 1 GB of disk space, then that is the request value.

ephemeral_storage_limit defines the maximum amount of disk space the application will be allowed to use. Going beyond this limit will result in the application being killed and the pods being evicted.

The disk space allocated to the application is completely ephemeral and is intended to be used purely as temporary space.

ephemeral_storage_limit always has to be greater than or equal to ephemeral_storage_request.
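Continuing the 1 GB example above (the 2 GB limit is an assumed value), a minimal sketch:

from servicefoundry import Resources

# 1 GB (1000 MB) of temporary disk space is reserved; the pods get
# evicted if usage grows beyond the 2 GB limit.
resources = Resources(
    ephemeral_storage_request=1000,
    ephemeral_storage_limit=2000,
)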
Google Cloud GKE Autopilot

For Google Cloud GKE Autopilot, ephemeral_storage_request and ephemeral_storage_limit must be equal, otherwise a validation error will be raised. Check this page for more information.
GPU
See the GPUs page for more information.
Setting resources for Truefoundry applications
# Both `Service` and `Job` have a `resources` argument where you can pass either an instance of the `Resources` class or a `dict`.
import logging
from servicefoundry import Build, Service, DockerFileBuild, Resources
logging.basicConfig(level=logging.INFO)
service = Service(
    name="service",
    image=Build(build_spec=DockerFileBuild()),
    ports=[{"port": 8501}],
    resources=Resources(  # You can use this argument in `Job` too.
        cpu_request=0.2,
        cpu_limit=0.5,
        memory_request=128,
        memory_limit=512,
    ),
)
service.deploy(workspace_fqn="YOUR_WORKSPACE_FQN")
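Since `resources` also accepts a plain `dict`, an equivalent sketch of the same deployment, assuming the dict uses the same keys as the YAML spec below, is:

# Same resources expressed as a plain dict instead of a `Resources` instance.
service = Service(
    name="service",
    image=Build(build_spec=DockerFileBuild()),
    ports=[{"port": 8501}],
    resources={
        "cpu_request": 0.2,
        "cpu_limit": 0.5,
        "memory_request": 128,
        "memory_limit": 512,
    },
)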
# You can define the resource fields as key-value pairs under the `resources` field.
name: service
components:
  - name: service
    type: service
    image:
      type: build
      build_source:
        type: local
      build_spec:
        type: dockerfile
    ports:
      - port: 8501
    resources: # You can use this block in `job` too.
      cpu_request: 0.2
      cpu_limit: 0.5
      memory_request: 128
      memory_limit: 512
We set the following defaults if you do not configure any resources field.
Field | Default value | Unit |
---|---|---|
cpu_request | 0.2 | - |
cpu_limit | 0.5 | - |
memory_request | 200 | MB |
memory_limit | 500 | MB |
ephemeral_storage_request | 1000 | MB |
ephemeral_storage_limit | 2000 | MB |
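In other words, not setting resources behaves like this explicit sketch:

from servicefoundry import Resources

# Explicit equivalent of the defaults listed in the table above.
default_resources = Resources(
    cpu_request=0.2,
    cpu_limit=0.5,
    memory_request=200,
    memory_limit=500,
    ephemeral_storage_request=1000,
    ephemeral_storage_limit=2000,
)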
Migrating to the newer Resources spec

Starting 22nd May 2023, we have added support for using nodepools. The Resources spec is not entirely backwards compatible and will require changes if you are using any of the following fields: gpu, capacity_type, instance_family, nodepools.
Primarily, we have added an explicit distinction in how a node is picked for the application. There is a new field node on Resources which takes one of two types (see the sketch after this list):

NodeSelector - Allows defining constraints for dynamically and automatically provisioning a node
NodepoolSelector - Allows picking one or more predefined nodepools on the cluster
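Here is a minimal sketch of both options, using only classes and fields from the migration examples below; the GPU type and nodepool names are placeholders:

from servicefoundry import Resources, NodeSelector, NodepoolSelector, GPUType

# Option 1: describe constraints and let a matching node be provisioned
# dynamically and automatically.
resources_dynamic = Resources(
    gpu_count=1,
    node=NodeSelector(gpu_type=GPUType.T4),
)

# Option 2: pick one or more predefined nodepools on the cluster.
resources_pinned = Resources(
    node=NodepoolSelector(nodepools=["my-nodepool-1", "my-nodepool-2"]),
)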
To use the new spec, install the servicefoundry>=0.9.0 package:

pip install "servicefoundry>=0.9.0"
Note: The same changes apply to the Service, Job and ModelDeployment specs.
The Resources.gpu field has been moved and split into Resources.gpu_count and Resources.node.gpu_type.
from servicefoundry import Service, Resources, GPUType
- from servicefoundry import GPU
+ from servicefoundry import NodeSelector

service = Service(
    ...,
    resources=Resources(
        ...,
-       gpu=GPU(type=GPUType.T4, count=2),
+       gpu_count=2,
+       node=NodeSelector(gpu_type=GPUType.T4),
    )
)
name: my-service
type: service
...
resources:
  ...
- gpu:
-   type: T4
-   count: 2
+ gpu_count: 2
+ node:
+   type: node_selector
+   gpu_type: T4
If you want to assign GPUs using nodepools, see the Adding GPUs page.
Resources.instance_family has been moved and renamed to Resources.node.instance_families.
from servicefoundry import Service, Resources
+ from servicefoundry import NodeSelector

service = Service(
    ...,
    resources=Resources(
        ...,
-       instance_family=["c6i", "t3"],
+       node=NodeSelector(instance_families=["c6i", "t3"]),
    )
)
name: my-service
type: service
...
resources:
  ...
- instance_family:
-   - c6i
-   - t3
+ node:
+   type: node_selector
+   instance_families:
+     - c6i
+     - t3
Resources.capacity_type has been moved to Resources.node.capacity_type.
from servicefoundry import Service, Resources, CapacityType
+ from servicefoundry import NodeSelector

service = Service(
    ...,
    resources=Resources(
        ...,
-       capacity_type=CapacityType.spot_fallback_on_demand,
+       node=NodeSelector(capacity_type=CapacityType.spot_fallback_on_demand),
    )
)
name: my-service
type: service
...
resources:
  ...
- capacity_type: spot_fallback_on_demand
+ node:
+   type: node_selector
+   capacity_type: spot_fallback_on_demand
Resources.nodepools has been moved to Resources.node.nodepools.
from servicefoundry import Service, Resources
+ from servicefoundry import NodepoolSelector

service = Service(
    ...,
    resources=Resources(
        ...,
-       nodepools=["my-nodepool-1", "my-nodepool-2"],
+       node=NodepoolSelector(nodepools=["my-nodepool-1", "my-nodepool-2"]),
    )
)
name: my-service
type: service
...
resources:
  ...
- nodepools:
-   - my-nodepool-1
-   - my-nodepool-2
+ node:
+   type: nodepool_selector
+   nodepools:
+     - my-nodepool-1
+     - my-nodepool-2