Deployments Additional Configuration

Note

When using the ServiceFoundry Python SDK, `type` is not a required field in any of the imported classes.
For deployments, the modules below can be used to add the necessary functionality.

Resources

Description

Describes the resource constraints for the application so that it can be deployed accordingly on the cluster
To learn more you can go here

Schema

{
    "cpu_request": 0.2,
    "cpu_limit": 0.5,
    "memory_request": 200,
    "memory_limit": 500,
    "ephemeral_storage_request": 1000,
    "ephemeral_storage_limit": 2000,
    "instance_family": ["string"]
}

Properties

Name	Type	Required	Description
cpu_request	number	true	Requested CPU, which determines the minimum cost incurred. CPU usage can exceed the requested amount, but not the value specified in the limit. 1 CPU means 1 CPU core. Fractional CPU can be requested, such as `0.5` or `0.05`.
cpu_limit	number	true	CPU limit, which usage cannot exceed. 1 CPU means 1 CPU core. Fractional CPU can be specified, such as `0.5`. The CPU limit should be >= the CPU request.
memory_request	number	true	Requested memory, which determines the minimum cost incurred. The unit is megabytes (MB), so 1 means 1 MB and 2000 means 2 GB.
memory_limit	number	true	Memory limit, beyond which the application is killed with an OOM error. The unit is megabytes (MB), so 1 means 1 MB and 2000 means 2 GB. The memory limit should be greater than the memory request.
ephemeral_storage_request	number	true	Requested disk storage. The unit is megabytes (MB). This is ephemeral storage and is wiped on pod restart or eviction.
ephemeral_storage_limit	number	true	Disk storage limit. The unit is megabytes (MB). Exceeding this limit results in eviction. It should be greater than the request. This is ephemeral storage and is wiped on pod restart or eviction.
instance_family	[string]	false	Instance family of the underlying machine to use. Multiple instance families can be supplied. The workload is guaranteed to be scheduled on one of them.

Python Examples

from servicefoundry import Service, Resources

service = Service(  # or Job or ModelDeployment
    # ... other arguments elided
    resources=Resources(
        cpu_request=1,
        memory_request=1000,  # in megabytes
        ephemeral_storage_request=1000,  # in megabytes
        cpu_limit=4,
        memory_limit=4000,  # in megabytes
        ephemeral_storage_limit=10000,  # in megabytes
        instance_family=["c6i", "t3", "m4"],
    ),
)
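
The request/limit rules from the Properties table can be checked before a deployment is submitted. The following is a minimal sketch, not part of the SDK: `ResourceSpec` and `check_resources` are hypothetical names that only mirror the schema above, and the check treats a limit equal to its request as acceptable for every resource, which is an assumption for memory and storage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceSpec:
    """Hypothetical stand-in mirroring the Resources schema (not the SDK class)."""
    cpu_request: float
    cpu_limit: float
    memory_request: int              # megabytes
    memory_limit: int                # megabytes
    ephemeral_storage_request: int   # megabytes
    ephemeral_storage_limit: int     # megabytes
    instance_family: List[str] = field(default_factory=list)


def check_resources(spec: ResourceSpec) -> List[str]:
    """Return a list of problems; an empty list means limits cover requests."""
    pairs = [
        ("cpu", spec.cpu_request, spec.cpu_limit),
        ("memory", spec.memory_request, spec.memory_limit),
        ("ephemeral_storage", spec.ephemeral_storage_request,
         spec.ephemeral_storage_limit),
    ]
    problems = []
    for name, request, limit in pairs:
        if limit < request:
            problems.append(
                f"{name}_limit ({limit}) is below {name}_request ({request})"
            )
    return problems


good = ResourceSpec(1, 4, 1000, 4000, 1000, 10000, ["c6i", "t3", "m4"])
bad = ResourceSpec(0.5, 0.2, 500, 200, 1000, 2000)

print(check_resources(good))
for problem in check_resources(bad):
    print(problem)
```

Running a check like this locally surfaces inverted request/limit pairs early, instead of at deploy time.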

FileMount

Description

Describes the configuration for FileMount

Schema

{
    "mount_dir": "string",
    "data": {
        "property1": "string",
        "property2": "string"
    }
}

Properties

Name	Type	Required	Description
mount_dir	string	true	Directory at which the data is mounted
data	object	true	Data to be mounted. Each key is a filename and each value is that file's content. Files are mounted under mount_dir.
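
To make the key-to-filename mapping concrete, here is a small sketch using a plain dictionary shaped like the schema above. The directory, filenames, and file contents are invented for illustration, and `mounted_paths` is a hypothetical helper, not an SDK function.

```python
import posixpath

# Plain dict mirroring the FileMount schema; all values here are made up.
file_mount = {
    "mount_dir": "/etc/app-config",
    "data": {
        "settings.yaml": "log_level: info\n",
        "feature_flags.json": '{"beta": false}\n',
    },
}


def mounted_paths(mount):
    """Map each filename in `data` to the path it would have under mount_dir."""
    return {
        name: posixpath.join(mount["mount_dir"], name)
        for name in mount["data"]
    }


for name, path in mounted_paths(file_mount).items():
    print(f"{name} -> {path}")
```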

Autoscaling

Description

Describes the configuration for Autoscaling

Schema

{
    "min_replicas": 1,
    "max_replicas": 1,
    "metrics": {},
    "polling_interval": 30,
    "cooldown_period": 300
}

Properties

Name	Type	Required	Restrictions	Description
min_replicas	integer	true	none	Minimum number of replicas to keep available
max_replicas	integer	true	none	Maximum number of replicas allowed for the component.
metrics	CPUUtilizationMetric | RPSMetric | CronMetric	true	none	Metric to use for the autoscaler
polling_interval	integer	true	none	Interval at which each trigger is checked.
cooldown_period	integer	true	none	The period to wait after the last trigger reported active before scaling the resource back to 0.
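
As an illustration of how `min_replicas` and `max_replicas` bound the replica count, the sketch below builds a configuration as a plain dictionary shaped like the schema above and clamps a desired count into that range. The values are examples only, and `clamp_replicas` is a hypothetical helper rather than part of the SDK.

```python
# Plain dict mirroring the Autoscaling schema; the values are illustrative.
autoscaling = {
    "min_replicas": 1,
    "max_replicas": 5,
    "metrics": {"type": "cpu_utilization", "value": 70},
    "polling_interval": 30,
    "cooldown_period": 300,
}


def clamp_replicas(desired, config):
    """Keep a desired replica count within [min_replicas, max_replicas]."""
    low, high = config["min_replicas"], config["max_replicas"]
    if low > high:
        raise ValueError("min_replicas must not exceed max_replicas")
    return max(low, min(desired, high))


print(clamp_replicas(0, autoscaling))
print(clamp_replicas(3, autoscaling))
print(clamp_replicas(9, autoscaling))
```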

CPUUtilizationMetric

Schema

{
    "type": "cpu_utilization",
    "value": 0
}

Properties

Name	Type	Required	Restrictions	Description
type	string	true	none	none
value	integer	true	none	Percentage of the CPU request, averaged over all replicas, that the autoscaler should try to maintain

RPSMetric

Schema

{
    "type": "rps",
    "value": 1
}

Properties

Name	Type	Required	Restrictions	Description
type	string	true	none	none
value	integer	true	none	Requests per second, averaged over all replicas, that the autoscaler should try to maintain
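
Because the target is an average per replica, a rough replica count for a given load follows from simple division. The sketch below shows only that arithmetic; it is an assumption about how such a target translates into replicas, not a description of the autoscaler's actual algorithm, and `replicas_for_load` is a hypothetical helper.

```python
import math

# Plain dict mirroring the RPSMetric schema; the target value is illustrative.
rps_metric = {"type": "rps", "value": 50}


def replicas_for_load(total_rps, metric):
    """Smallest replica count keeping average per-replica RPS at or below target."""
    target = metric["value"]
    if target <= 0:
        raise ValueError("target RPS must be positive")
    # At least one replica, even with no traffic (an assumption for this sketch).
    return max(1, math.ceil(total_rps / target))


print(replicas_for_load(420, rps_metric))  # 420 / 50 = 8.4, rounded up to 9
```

In practice the result would still be bounded by `min_replicas` and `max_replicas`.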

CronMetric

Schema

{
    "type": "cron",
    "desired_replicas": 1,
    "start": "string",
    "end": "string",
    "timezone": "UTC"
}

Properties

Name	Type	Required	Restrictions	Description
type	string	true	none	none
desired_replicas	integer	false	none	Desired number of replicas during the given interval. Default value is max_replicas.
start	string	true	none	Cron expression indicating the start of the cron schedule.
end	string	true	none	Cron expression indicating the end of the cron schedule.
timezone	string	true	none	Timezone against which the cron schedule is calculated, e.g. “Asia/Tokyo”. Default is the machine’s local time. Supported timezones: https://docs.truefoundry.com/docs/list-of-supported-timezones
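
A filled-in example may help here. The sketch below scales to three replicas during weekday working hours in Tokyo. It assumes standard five-field cron expressions (minute, hour, day of month, month, day of week), which this page does not confirm, and `check_cron_metric` is a hypothetical helper that only checks the shape of the configuration.

```python
# Plain dict mirroring the CronMetric schema; the schedule is illustrative.
cron_metric = {
    "type": "cron",
    "desired_replicas": 3,
    "start": "0 9 * * 1-5",   # 09:00, Monday to Friday (five-field cron assumed)
    "end": "0 18 * * 1-5",    # 18:00, Monday to Friday
    "timezone": "Asia/Tokyo",
}


def looks_like_cron(expression):
    """Loose shape check: exactly five whitespace-separated fields."""
    return len(expression.split()) == 5


def check_cron_metric(metric):
    """Return a list of problems found in a CronMetric-shaped dict."""
    problems = []
    for key in ("start", "end", "timezone"):
        if not metric.get(key):
            problems.append(f"{key} is required")
    for key in ("start", "end"):
        if metric.get(key) and not looks_like_cron(metric[key]):
            problems.append(f"{key} does not look like a five-field cron expression")
    return problems


print(check_cron_metric(cron_metric))
```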
