Improved SQS batch error handling with AWS Lambda

Nov 30, 2021

AWS announced partial batch response for AWS Lambda and SQS. With this new feature, processing SQS messages in batches becomes much more robust, as this post explains in detail.

We are happy to say that Serverless Framework supports that new feature immediately via the new functionResponseType option:

functions:
  worker:
    handler: worker.handler
    events:
      - sqs:
          arn: <ARN of the SQS queue>
          batchSize: 10
          functionResponseType: ReportBatchItemFailures

Upgrade to v2.67.0 or greater to start using it. In case you are trying out Serverless Framework v3 beta, make sure to update the v3 beta with "npm -g i serverless@pre-3".

The challenge with batch processing

Until now, it was already possible to process SQS messages in batches with AWS Lambda, but the integration had limitations. Here is an example:

# serverless.yml
service: my-app

provider:
  name: aws

functions:
  worker:
    handler: worker.handler
    events:
      - sqs:
          arn: <ARN of the SQS queue>
          batchSize: 10

Our worker.js file would contain a handler like this:

// worker.js
exports.handler = async function(event) {
  // Any exception thrown here fails the whole invocation, and with it the whole batch.
  event.Records.forEach(record => {
    const bodyData = JSON.parse(record.body);
    console.log(`Processing ${record.messageId}`);
    // ...
  });
}

The challenge is dealing with errors: if any SQS record fails to be processed (i.e. the code handling it throws an exception), then the Lambda function execution fails and the whole batch of messages is considered failed.

That also means that if one message fails, the whole batch is retried, including the messages in the batch that were already processed successfully.
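To make the failure mode concrete, here is a minimal local sketch that calls the handler above with a hand-written event. The event and its three records are made up for illustration; a real SQS event carries more fields per record. The second record has an invalid JSON body, so the handler's promise rejects and the third record is never reached.

```javascript
// Same handler as above.
async function handler(event) {
  event.Records.forEach(record => {
    const bodyData = JSON.parse(record.body);
    console.log(`Processing ${record.messageId}`);
    // ...
  });
}

// Hypothetical, simplified SQS event for local experimentation.
const event = {
  Records: [
    { messageId: '1', body: '{"ok": true}' },
    { messageId: '2', body: 'not valid JSON' },
    { messageId: '3', body: '{"ok": true}' },
  ],
};

handler(event).catch(error => {
  // A rejected handler promise is what Lambda treats as a failed invocation.
  console.log(`Invocation failed: ${error.name}`);
});
// prints "Processing 1"
// prints "Invocation failed: SyntaxError"
```

Because the invocation as a whole fails, all three messages are delivered again, including message 1, which had already been handled.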

The SQS + Lambda integration provided no practical way to deal with those scenarios. Indeed, catching errors that occurred when processing would mean marking the whole batch of messages as "successfully processed", which was also wrong.

AWS Lambda partial batch responses

The new "partial batch response" feature lets us signal, from AWS Lambda, which SQS messages have been successfully processed, and which have failed.

Let's fix the previous example:

functions:
  worker:
    handler: worker.handler
    events:
      - sqs:
          arn: <ARN of the SQS queue>
          batchSize: 10
          functionResponseType: ReportBatchItemFailures

In worker.js, we can now catch errors and report failed messages in the return value of our function:

// worker.js
exports.handler = async function(event) {
  const failedMessageIds = [];

  event.Records.forEach(record => {
    try {
      const bodyData = JSON.parse(record.body);
      console.log(`Processing ${record.messageId}`);
      // ...
    } catch (e) {
      // Remember the failed message instead of failing the whole batch.
      failedMessageIds.push(record.messageId);
    }
  });

  // An empty batchItemFailures list means every message was processed successfully.
  return {
    batchItemFailures: failedMessageIds.map(id => {
      return {
        itemIdentifier: id
      }
    })
  }
}
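The handler above processes records synchronously. If the per-record work is asynchronous (database writes, HTTP calls), a `try`/`catch` inside `forEach` will not see rejected promises. Here is a minimal sketch of one way to handle that case with `Promise.allSettled`; the `processMessage` function and its `shouldFail` flag are hypothetical stand-ins for real business logic.

```javascript
// worker.js
// Hypothetical placeholder for real asynchronous business logic.
async function processMessage(bodyData) {
  if (bodyData.shouldFail) {
    throw new Error('Simulated processing failure');
  }
}

async function handler(event) {
  // Start all records concurrently. allSettled never rejects, so one
  // failing record cannot abort the others.
  const results = await Promise.allSettled(
    event.Records.map(async record => {
      const bodyData = JSON.parse(record.body);
      await processMessage(bodyData);
    })
  );

  // results[i] is the outcome for event.Records[i].
  const batchItemFailures = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      batchItemFailures.push({ itemIdentifier: event.Records[index].messageId });
    }
  });

  return { batchItemFailures };
}

exports.handler = handler;
```

This sketch assumes a standard queue, where message order does not matter. Concurrent processing is probably not what you want with a FIFO queue, where messages should be handled in order.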

One downside of this approach is that we can no longer rely on the "error rate" metric of AWS Lambda to detect failures. Indeed, the Lambda function will execute successfully in all cases (because we catch errors).
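One possible way to keep some visibility is to log failures explicitly, so they can still be searched or alerted on from the function's logs. The sketch below is only an illustration: `buildResponse` is a hypothetical helper, and the log field names are arbitrary choices, not a format that AWS or Serverless Framework expects.

```javascript
// Hypothetical helper: builds the partial batch response and writes one
// structured log line whenever the batch contains failed messages.
function buildResponse(failedMessageIds, totalCount, log = console.error) {
  if (failedMessageIds.length > 0) {
    log(JSON.stringify({
      level: 'error',
      message: 'SQS batch had failed records',
      failedCount: failedMessageIds.length,
      totalCount,
      failedMessageIds,
    }));
  }

  return {
    batchItemFailures: failedMessageIds.map(id => ({ itemIdentifier: id })),
  };
}
```

In the handler, the final `return` statement could then become `return buildResponse(failedMessageIds, event.Records.length);`.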

But the upside is that only failed messages will be retried by SQS.

If you want to learn more, check out the official AWS documentation.
