Architecture
Truefoundry is built around a central control plane to which you can connect multiple Kubernetes clusters. The Truefoundry architecture is as follows:
Truefoundry consists of a control plane that helps manage multiple Kubernetes clusters. These clusters can be in any cloud provider, across VPCs, etc. Data scientists and developers interact with a web UI hosted on the control plane, which manages all the deployments and machine learning metadata. A central authentication server is hosted on the Truefoundry cloud, and the control plane sends it the users' login and usage information.
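The topology described above can be sketched as a small data model. This is only an illustration, not Truefoundry code: the class names (Cluster, ControlPlane), the methods, the cluster names, the VPC labels and the auth server label are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Cluster:
    """A Kubernetes cluster attached to the control plane (hypothetical model)."""
    name: str
    cloud: str  # e.g. "aws", "gcp", "azure"
    vpc: str    # placeholder VPC label


@dataclass
class ControlPlane:
    """Central control plane that tracks every connected cluster."""
    auth_server: str  # placeholder label for the central authentication server
    clusters: Dict[str, Cluster] = field(default_factory=dict)

    def connect(self, cluster: Cluster) -> None:
        # One control plane can hold many clusters; names must be unique.
        if cluster.name in self.clusters:
            raise ValueError(f"cluster {cluster.name!r} is already connected")
        self.clusters[cluster.name] = cluster

    def clusters_by_cloud(self) -> Dict[str, List[str]]:
        # Group connected cluster names by cloud provider.
        grouped: Dict[str, List[str]] = {}
        for cluster in self.clusters.values():
            grouped.setdefault(cluster.cloud, []).append(cluster.name)
        return grouped


def build_example() -> ControlPlane:
    # Three made-up clusters across two clouds and three VPCs.
    plane = ControlPlane(auth_server="truefoundry-cloud-auth")
    plane.connect(Cluster("prod-us", "aws", "vpc-a"))
    plane.connect(Cluster("prod-eu", "aws", "vpc-b"))
    plane.connect(Cluster("research", "gcp", "vpc-c"))
    return plane
```

The point of the sketch is the one-to-many relationship: a single control plane, many clusters, with no restriction on which cloud or VPC each cluster lives in.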
Control Plane installation
The control plane is the heart of the Truefoundry SaaS platform. It manages all the clusters, deployments, models, registries, etc. There are two ways in which you can use the control plane:
Truefoundry Hosted Control plane
In this case, the control plane is hosted by Truefoundry and you connect your own Kubernetes clusters to it. Upgrades are managed by the Truefoundry team, and no other infrastructure is needed on your end to onboard onto Truefoundry.
Self hosted Control plane
In this case, the Truefoundry control plane is also deployed on your own cloud. All the compute, data and UI live in your own cloud account. The authentication server stays in the Truefoundry cloud, and the control plane sends license and authentication information to it.
Data flow
With a self-hosted control plane, no user or customer data flows out of your cloud.
Pros:
Complete protection of data and nothing flows out of your cloud.
Cons:
You incur the cost of hosting the control plane, and you have to upgrade the control plane manually.
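The difference between the two hosting options can be summarised as a lookup of where each component runs. This is a minimal sketch based only on the descriptions in this section. The names Mode, PLACEMENT, UPGRADES_BY and components_outside_customer_cloud are hypothetical and are not part of any Truefoundry library.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    HOSTED = "truefoundry-hosted"
    SELF_HOSTED = "self-hosted"


# Where each component runs in each mode, as described in the text.
# "customer" means your own cloud account; "truefoundry" means the Truefoundry cloud.
PLACEMENT = {
    Mode.HOSTED: {
        "control_plane": "truefoundry",
        "data_plane": "customer",
        "auth_server": "truefoundry",
    },
    Mode.SELF_HOSTED: {
        "control_plane": "customer",
        "data_plane": "customer",
        "auth_server": "truefoundry",
    },
}

# Who is responsible for upgrading the control plane in each mode.
UPGRADES_BY = {Mode.HOSTED: "truefoundry", Mode.SELF_HOSTED: "customer"}


def components_outside_customer_cloud(mode: Mode) -> List[str]:
    """Return the components that do not run in the customer's cloud."""
    return sorted(
        name for name, owner in PLACEMENT[mode].items() if owner != "customer"
    )
```

In the self-hosted mode only the authentication server remains outside your cloud, which matches the data flow described above.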
Data plane installation
The data plane is the Kubernetes cluster that resides entirely in the client's cloud or data center. Anything that gets deployed from the control plane is in turn deployed in this workload cluster, including service deployments, models, notebooks, etc.
On the data plane, Truefoundry supports:
- Bring your own Kubernetes cluster - You can bring your own cluster and attach it to the control plane. During this process, a few applications are installed in your cluster, including Truefoundry's agent, which is responsible for connecting your data plane to the control plane.
- Support for creating a new cluster in your favourite cloud - Truefoundry supports bootstrapping a cluster in AWS, GCP and Azure with the help of dedicated IaC (terragrunt/terraform) tools. During this process, we require a few inputs to bootstrap your cluster to the desired state.
Components
Truefoundry comprises the following functionalities:
Experiment Tracking and ML Metadata Store (MLFoundry)
MLFoundry is used to log models, datasets, metrics and params related to model training, and helps maintain lineage between runs and artifacts.
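To illustrate what "lineage between runs and artifacts" means, here is a minimal in-memory sketch. It is not the mlfoundry API. Run, MetadataStore and all of their methods are hypothetical names chosen for this example, and a real metadata store would persist this information rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One training run with its logged params, metrics and artifacts."""
    name: str
    params: Dict[str, object] = field(default_factory=dict)
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)

    def log_params(self, params: Dict[str, object]) -> None:
        self.params.update(params)

    def log_metric(self, key: str, value: float) -> None:
        # Keep the full history of each metric, in logging order.
        self.metrics.setdefault(key, []).append(value)


class MetadataStore:
    """Toy store that remembers which run produced which artifact."""

    def __init__(self) -> None:
        self.runs: Dict[str, Run] = {}
        self._produced_by: Dict[str, str] = {}  # artifact id -> run name

    def create_run(self, name: str) -> Run:
        run = Run(name)
        self.runs[name] = run
        return run

    def record_artifact(self, run: Run, artifact_id: str) -> None:
        # Register the artifact on the run and remember the reverse link.
        run.artifacts.append(artifact_id)
        self._produced_by[artifact_id] = run.name

    def lineage(self, artifact_id: str) -> Run:
        """Return the run that produced the given artifact."""
        return self.runs[self._produced_by[artifact_id]]
```

Given a model identifier, lineage() leads back to the run that produced it, and from there to the params and metrics logged during training.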
ML Training and Inference Deployment (ServiceFoundry)
ServiceFoundry helps data scientists and machine learning engineers deploy jobs and services on Kubernetes. It also provides an internal developer platform to view all the deployed services along with their cost, and to manage permissions and access control.
Truefoundry Dashboard:
This is a dashboard that lets you view the data from MLFoundry, ServiceFoundry and ML-Monitoring in one place.
Tfy-agent
This component sits on each workload cluster and communicates with the central control plane.
Client Libraries:
We have two client libraries for data scientists, engineers and DevOps to interact with the services mentioned above. The two libraries are:
- mlfoundry (pip install mlfoundry)
- servicefoundry (pip install servicefoundry)
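After running the pip commands above, a quick way to confirm that both libraries are available is to ask Python whether it can locate them. The sketch below uses only the standard library. It assumes that each import name matches its pip package name, and missing_packages is a hypothetical helper written for this example.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the import names are the same as the pip package names.
    missing = missing_packages(["mlfoundry", "servicefoundry"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both client libraries are installed.")
```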
All the above functionalities can be installed independently or together depending on your requirements.
Deploying on your own cloud
See the links below for documents on how to create your own cluster in specific cloud environments.
Installing Applications for your cloud
Once your cluster is connected to the control plane UI, applications must be installed so that features can be enabled. These features include autoscaling, file system support for volumes, GPU support, etc.
Because these features are cloud-specific, we have bundled the applications that are specific to each cloud provider. There are also vendor-neutral applications, which should be installed wherever mentioned. Below is the list of cloud providers with their sets of applications. These pages also include the vendor-neutral applications.
- Azure Applications
- AWS Applications
- GCP Applications