Introduction to Async Service

Async Service in Truefoundry allows you to put input requests in a queue, process them asynchronously, and then put the outputs in another queue. You can use an async service if one or more of the following scenarios apply to your use case:

  1. Large payload size: If the input to your service / model is very large, it can be useful to put the payload in S3 or some other blob storage and then trigger the async service to process it. The async service can then download the S3 object and do the processing.
  2. Large processing time: If the time taken to respond to a request is high (from seconds to minutes), an async service becomes almost essential, since HTTP requests to a normal service can start timing out.
  3. Scale to 0: You can scale an async service to 0 if there are no items in the queue.
  4. High reliability in case of traffic surges: If you are using a simple HTTP service and traffic suddenly rises, your service will start throwing 5XX errors if autoscaling is slow. An async service offers higher reliability since the messages are stored in the queue and processed once the service scales up.

Async Service Architecture

We need to provision an input queue where we will be writing the messages. Truefoundry supports the following input queues. Each of the queues below links to a page on how you can provision it.

  1. AWS SQS
  2. NATS
  3. Kafka

Once the input queue is provisioned, we need to push the input messages to it. The tfy-async-sidecar component consumes items from the queue and calls your HTTP service with the payload as a POST request. The tfy-async-sidecar code is open source and available here. It has adapters to consume messages from the different types of queues mentioned above.
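For example, if AWS SQS is your input queue, a producer can push messages with boto3. The sketch below is only illustrative: the queue URL and the payload shape (an S3 path) are placeholders, not a format required by Truefoundry.

```python
# Illustrative producer pushing a message to an AWS SQS input queue.
# The queue URL and the payload shape are placeholders, not a Truefoundry requirement.
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

payload = {"s3_path": "s3://my-bucket/inputs/request-123.json"}  # hypothetical payload

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-input-queue",
    MessageBody=json.dumps(payload),
)
```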

Once the sidecar pulls a message from the queue, it needs to deliver it to the HTTP service written by the user. There are no constraints on the HTTP service, and you can write it using your preferred framework like FastAPI, Flask, ExpressJS, etc. The service needs to expose an endpoint that accepts the queue message as the body of a POST request. You provide that endpoint as an input in the async service deployment spec.
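As an illustration, a minimal FastAPI service exposing such an endpoint could look like the sketch below. The /process path and the response shape are assumptions made for the example; the actual path is whatever you configure in the deployment spec.

```python
# Minimal sketch of an HTTP service the sidecar can call.
# The /process path and the response shape are illustrative; the real endpoint path
# is whatever you set in the async service deployment spec.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/process")
async def process(request: Request):
    payload = await request.json()  # body is the message pulled from the input queue
    # ... run your model / business logic on the payload here ...
    return {"status": "processed", "input": payload}
```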

After the HTTP service receives the message, it processes it and returns the response to the sidecar. If you configure an output queue, the sidecar then writes the response back to that queue. Alternatively, you can handle writing the output in your service code and omit the output queue configuration.
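If you skip the output queue configuration, your handler can publish results to any destination itself. Below is a hedged sketch assuming an SQS output queue; the queue URL is a placeholder.

```python
# Sketch of writing results from inside the service instead of configuring an output queue.
# The output queue URL is a placeholder.
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

def publish_result(result: dict) -> None:
    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-output-queue",
        MessageBody=json.dumps(result),
    )
```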

Acking Logic with the Input Queue

The sidecar acks the message to the input queue only once it has received the response from the HTTP service and written the response to the output queue. If there is a failure at any of the intermediate steps, the queue redelivers the message to one of the replicas for processing again. This ensures a high level of reliability for processing the input messages.
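Because a failed step causes the queue to redeliver the message, a replica may receive the same message more than once, so it helps if your handler tolerates duplicates. A minimal sketch, assuming a hypothetical request_id field in the payload and a simple in-memory set (a shared store such as Redis would be needed across replicas):

```python
# Illustrative idempotent handling of redelivered messages.
# The "request_id" field is hypothetical; use a shared store (e.g. Redis) across replicas.
processed_ids: set[str] = set()

def handle(payload: dict) -> dict:
    request_id = payload.get("request_id", "")
    if request_id in processed_ids:
        # This message was already processed; skip re-processing it.
        return {"status": "duplicate", "request_id": request_id}
    # ... do the actual processing ...
    processed_ids.add(request_id)
    return {"status": "processed", "request_id": request_id}
```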

Getting Started

If you're new to the Truefoundry Async Service, follow the comprehensive guide below on how to deploy your service as an asynchronous service. Please make sure you have deployed a simple service on Truefoundry before starting this tutorial. You will need the following components ready before deploying an async service:

  1. HTTP service
  2. Provisioned Queue