Queue-Based Load Leveling pattern in cloud-hosted applications

Shubhangi Vashist
3 min read · May 5, 2024


In this blog post, we’ll explore the queue-based load leveling pattern, a crucial technique employed in cloud-hosted applications to smooth out intermittent heavy loads and keep services available and responsive.

Some angry cloud applications waiting for their turn to rain

Before delving into the topic, let’s first understand how a throttled service works. You may ask, “What is a throttled service?”

Cloud services can run into performance and reliability issues when they face sudden bursts of heavy usage. It’s hard to predict demand when many tasks use the same service simultaneously. At times, a service gets swamped with more requests than it can handle and starts responding slowly or incorrectly; in the worst case, it crashes outright under the load.

A throttled service is one that intentionally limits the rate at which clients can access its resources or perform operations. Throttling is typically implemented to manage resource consumption, prevent overload, ensure fair usage among all users, and protect the stability and performance of the service.
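
To make this concrete, here’s a minimal sketch of one common throttling mechanism, a token bucket. The class name, rates, and usage below are illustrative and not tied to any particular cloud service:

```python
import time

class TokenBucket:
    """Toy token-bucket throttle: allows short bursts of up to `capacity`
    requests, then sustains at most `rate` requests per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the service would answer HTTP 429 (Too Many Requests)

# Example: a service allowing 10 requests/second with bursts of up to 20.
throttle = TokenBucket(rate=10, capacity=20)
for i in range(25):
    print(i, "accepted" if throttle.allow_request() else "rejected")
```

A service guarded like this absorbs short bursts but pushes back on sustained overload, which is exactly the client-side pain we discuss next.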

Now that you understand classic throttling, let’s talk about its drawbacks. When a service is throttled, clients may experience delays, timeouts, or outright rejection of requests once they exceed the allowed rate of access. Well, it’s not a pretty picture for the business.

Example of a throttled service hosted on Azure platform

Rate limiting on the client side is one typical response to a throttled service, and a well-known way to implement it is the Queue-Based Load Leveling pattern. This pattern uses a queue as a buffer between a task and the service it invokes, smoothing intermittent heavy loads that could otherwise cause the service to fail or the task to time out.
This helps minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.
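
Before we get to the Azure architecture, here’s a self-contained sketch of the core idea: a producer generates a burst of work, a queue absorbs it, and a single worker drains it at a steady pace. Every name and number here is made up for illustration:

```python
import queue
import threading
import time

work_queue: "queue.Queue[int]" = queue.Queue()  # the buffer between task and service

def producer() -> None:
    # Simulate a bursty task: 20 jobs arrive nearly at once.
    for job_id in range(20):
        work_queue.put(job_id)
    print("Producer: burst of 20 jobs enqueued")

def consumer() -> None:
    # The "service" drains the queue at its own steady pace
    # (5 jobs/second), regardless of how quickly the jobs arrived.
    while True:
        job_id = work_queue.get()
        time.sleep(0.2)  # simulated processing time
        print(f"Consumer: processed job {job_id}")
        work_queue.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
work_queue.join()  # wait until the backlog is fully drained
```

The burst completes immediately from the producer’s point of view, while the consumer works through the backlog at a rate it can sustain.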

Queue-Based Load Leveling pattern

Let’s understand this concept using a sample architecture. In this architecture, users upload files to blob storage through the application, and a job processor is responsible for storing the URLs of these uploads, along with their metadata, in an Azure table.

Sample implementation of Queue-Based Load Leveling on Azure cloud

This solution introduces a queue between the blob storage and the job processor, and the two run asynchronously. When a file is uploaded, a message containing the data the job processor needs (such as the blob URL and its metadata) is posted to the queue. The queue acts as a buffer, holding each message until the job processor retrieves and processes it.
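
As a sketch of the producing side, the upload handler could post such a message with the azure-storage-queue SDK. The queue name, the connection-string environment variable, and the message shape are my assumptions for illustration, not details from the architecture above:

```python
import json
import os

from azure.storage.queue import QueueClient

# Assumed setup: a storage connection string in an environment variable
# and a queue named "upload-jobs" (both names are hypothetical).
queue_client = QueueClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "upload-jobs"
)

def enqueue_upload_job(blob_url: str, metadata: dict) -> None:
    """Post a message describing an uploaded file for the job processor."""
    message = json.dumps({"blob_url": blob_url, "metadata": metadata})
    queue_client.send_message(message)

enqueue_upload_job(
    "https://myaccount.blob.core.windows.net/uploads/report.pdf",
    {"uploaded_by": "alice", "size_bytes": 1048576},
)
```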

Upload requests, which can arrive at a highly variable rate, reach the job processor only through this message queue. The queue decouples the uploads from the job processor, allowing it to handle messages at its own pace regardless of the volume of requests from concurrent tasks. And if the job processor isn’t available when a message is posted, the posting task experiences no delay.
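
On the other side, here’s a minimal sketch of the job processor, assuming the same queue as above. The store_in_table helper is a hypothetical stand-in for the Azure Table write the architecture describes:

```python
import json
import os
import time

from azure.storage.queue import QueueClient

queue_client = QueueClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "upload-jobs"
)

def store_in_table(blob_url: str, metadata: dict) -> None:
    # Placeholder for the Azure Table write; a real implementation
    # would use the azure-data-tables SDK here.
    print(f"Storing {blob_url} with {metadata}")

def process_jobs() -> None:
    """Drain the queue at the processor's own pace."""
    while True:
        # Received messages become invisible to other consumers for a
        # visibility timeout while this processor works on them.
        for msg in queue_client.receive_messages(messages_per_page=1):
            job = json.loads(msg.content)
            store_in_table(job["blob_url"], job["metadata"])
            queue_client.delete_message(msg)  # remove only after success
        time.sleep(1)  # back off briefly when the queue is empty

process_jobs()
```

Because the processor deletes a message only after handling it successfully, a crash mid-job simply lets the message reappear on the queue for a retry.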

That brings us to the end of the blog. We’ve explored a practical implementation of storage queues in cloud architectures and seen how they can serve as effective buffers between the components of a cloud-based system, enabling asynchronous communication and decoupling between services.

Thanks for reading! Do give feedback and help me make my blogs better.

Until Next Time!!
