Truefoundry Docs

Async Service in TrueFoundry allows us to put input requests in a queue, process them asynchronously and then put the outputs in another queue. You can use an async service if one or more of the following scenarios applies to your use case:

Large payload size: If the input to your service / model is really large, it might be useful to put the payload in S3 or some blob storage and then trigger the async service to process the payload. The async service can then download the S3 object and do the processing.
Large processing time: If the time takes to respond to a request is high(from seconds to minutes), it almost becomes essential to use an async service since HTTP requests in a normal service can start timing out.
Scale to 0: You can scale an async service to 0 if there are no items in the queue.
High reliability in case of traffic surges: If you are using a simple HTTP service and traffic suddenly rises, your service will start throwing 5XX errors in case autoscaling is slow. Async service offers a higher reliability since the messages will be stored in the queue and processed when the service scales up.

To build an async service, we need to provision an input queue where we will be writing the messages. TrueFoundry supports the following queues for input queues. Each of the queue points to a page on how you can provision the queue.

AWS SQS
Nats queue
Kafka
Google AMQP

You can consume messages from the queue either via our async library or via a sidecar.

Use async library

Since an async service consumes messages from a queue, we will need the logic to pull and push items to the queue. While you can write this logic in your own code, it can be cumbersome and you will have to change your code if you ever decide to switch the queue system underneath. To overcome this, TrueFoundry provides a simple open-source Python library that integrates with different queue frameworks and you just need to implement the processing handler. You can follow the README in the Github repository to integrate it into your codebase.

Architecture:

For all other use cases, the approach provided below using a sidecar is recommended.

Use sidecar

In this case, the python library mentioned above is packaged into another docker image and runs alongside your main code in a separate container as a sidecar. You will need to write your processing code as HTTP service.

Architecture

The tfy-async-sidecar component consumes items from the queue and calls your HTTP service with the payload as a POST request. The tfy-async sidecar code is open source and available here. It has adapters to consume the messages from the different types of queue as mentioned above. If you need support for additional queues please feel free to reach out, or contribute the queue adapter on this repository. Once the sidecar pulls the message from the queue, it needs to deliver it to the HTTP service which is written by the user. There are no constraints on the HTTP service and you can write it using your own preferred framework like FastAPI, Flask, ExpressJS, etc. The service needs to expose an endpoint where it will accept the messages in the queue as the body in the POST request. You can provide that endpoint as an input in the async service deployment spec. After the HTTP service receives the message, it processes it and returns the response to the sidecar. If we configure an output queue, the sidecar can then go and write the data back to the output queue. You can also handle the writing in your service code and not add the output queue configuration if you want.

Acking Logic with the Input queue

The sidecar acks the message to the input queue once it has received the response from the HTTP service and written the response to the output queue. If there is a failure at any of the intermediate steps, the queue will redeliver the message to one of the replicas again for processing. This ensures that there is a high level of reliability for the input messages to be processed.

Getting Started

If you’re new to the TrueFoundry Async Service, follow the comprehensive guide given below on how to deploy your service as an asynchronous service. Please make sure you have deployed a simple service on TrueFoundry before embarking on this tutorial. You will need to have the following components ready before deploying an async service.

HTTP service
Provisioned Queue

You can follow Deploy your first Async Service guide to understand the basics of deploying Async Service. You can configure the settings further using the guides below:

Dockerize the code to be deployed.
Define the resources requirements for your service - Define Resources - While the documentation is specific for services, it’s the same as what is required for Async services.
Define Queue Configuration
[Optional] Defining environment variables and secrets to be injected into the code - Environment Variables and Secrets.
Update, Rollback, Promote your Async Service: While the documentation is specifically for Services, the Update, Rollback, and Promote process follows a similar flow for Jobs.
Setting up CI/CD for your Async Service: While the documentation is specifically for Services, the CI/CD setup process follows a similar flow for Jobs.

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

LLM Deployment

LLM Finetuning

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

LLM Tracing

Platform

Deploying On Your Own Cloud

Introduction to Async Service

Use async library

Architecture:

Use sidecar

Architecture

Acking Logic with the Input queue

Getting Started

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

LLM Deployment

LLM Finetuning

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

LLM Tracing

Platform

Deploying On Your Own Cloud

​Use async library

​Architecture:

​We recommend using the Python Library only if your processing code is in Python and the time to process is greater than 1 minute.

​Use sidecar

​Architecture

​Acking Logic with the Input queue

​Getting Started

Use async library

Architecture:

We recommend using the Python Library only if your processing code is in Python and the time to process is greater than 1 minute.

Use sidecar

Architecture

Acking Logic with the Input queue

Getting Started