When we update Lambda functions in serverless applications, we take a lot of precautions.
We build tests to be confident that we are not introducing bugs or breaking anything. We publish the update in different stages to check how it behaves in the cloud.
And still, a tingling runs down our spines every time we release to production. We are never 100% sure that we won't bump into an integration error, or that we didn't overlook any edge cases.
If that sounds like you, then fear no more! The Canary Deployments Plugin is your safety net.
I built this plugin for the Serverless Framework to let you manage canary deployments with ease. Below, I'll dive into how the plugin works and show you an example of the Canary Deployments Plugin in action.
Ready? Awesome.
Background: Lambda Weighted Aliases + CodeDeploy = Peace of mind
The AWS team recently introduced traffic shifting for Lambda function aliases, which basically means that we can now split the traffic of our functions between two different versions. We just have to specify the percentage of incoming traffic we want to direct to the new release, and Lambda will automatically load balance requests between versions when we invoke the alias.
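To make the mechanics concrete, here's a rough sketch (plain Node.js with made-up names, not AWS's actual implementation) of how a weighted alias decides which version serves each invocation: with a weight of 0.1 on the new version, roughly 10% of requests land on it.

```javascript
// Sketch of how a weighted alias splits traffic between two versions.
// `weight` is the fraction of traffic sent to the new version (0 to 1);
// `roll` is a random number in [0, 1) drawn per invocation.
function routeRequest(weight, roll) {
  return roll < weight ? 'new-version' : 'stable-version';
}

// Simulate 10,000 invocations with 10% of traffic on the new version.
let hits = 0;
for (let i = 0; i < 10000; i++) {
  if (routeRequest(0.1, Math.random()) === 'new-version') hits++;
}
console.log(`~${(hits / 100).toFixed(1)}% of requests hit the new version`);
```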
So, instead of completely replacing our function with another version, we can take a more conservative approach: let the new code coexist with the old, stable one and check how it performs.
All this sounds great, but let's be honest: changing alias weights and checking how new functions behave by ourselves is not going to make our lives easier. Fortunately, AWS has the service we need: CodeDeploy.

CodeDeploy is capable of automatically updating our functions' alias weights according to our preferences. Even better, it will also roll back automatically if it notices that something goes wrong.
Basically, our Lambda function deployments will be on autopilot.
Canary deployments: before and after
Now let's say we wanted to implement all this Weighted Alias + CodeDeploy stuff into our serverless application. To do this, we'd need to create some AWS resources.
We'd first need to create a CodeDeploy application, a DeploymentGroup, and an Alias for each function, plus some new permissions here and there, and then replace all the event sources to trigger the aliases instead of `$LATEST`. Yeah... not exactly that straightforward.
But that was then, and this is now.
Now, the Canary Deployments Plugin takes care of all those things for you! It makes gradual deployments simply a matter of configuration.
Integrating canary deployments into your serverless app
In the next sections, we'll walk through the different options we can configure with the Canary Deployments Plugin.
Setting up the environment
We'll create a simple Serverless service with one function exposed through API Gateway, making sure that we install the plugin.
Our `serverless.yml` should look like this:
```yaml
service: sls-canary-example

provider:
  name: aws
  runtime: nodejs6.10

plugins:
  - serverless-plugin-canary-deployments

functions:
  hello:
    handler: handler.hello
    events:
      - http: get hello
```
Our function will simply return a message:
```js
module.exports.hello = (event, context, callback) => {
  const response = {
    statusCode: 200,
    body: 'Go Serverless v1.0! Your function executed successfully!'
  };
  callback(null, response);
};
```
We should get that message by calling the endpoint after deploying our service:

```
$ curl https://your-api-gateway-endpoint.com/dev/hello
Go Serverless v1.0! Your function executed successfully!
```
Deploying the function gradually
This is where the fun starts. We'll tell the plugin to split the traffic between the last two versions of our function during the next deploy, and gradually shift more traffic to the new one until it receives all the load.
There are three different flavors of gradual deployments:
- Canary: a specific amount of traffic is shifted to the new version during a certain period of time. When that time elapses, all the traffic goes to the new version.
- Linear: traffic is shifted to the new version incrementally in intervals until it gets all of it.
- All-at-once: the least gradual of all of them, all the traffic is shifted to the new version straight away.
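As an illustration of the three strategies above (not plugin code; the function names are made up, and the exact timing of the first increment is simplified), here's how the traffic share of the new version evolves over time under each one:

```javascript
// Fraction of traffic on the new version after `minutes`, per strategy.

// Canary: a fixed share goes to the new version for the whole duration,
// then everything shifts at once (e.g. Canary10Percent30Minutes).
function canaryWeight(initialPercent, durationMinutes, minutes) {
  return minutes < durationMinutes ? initialPercent / 100 : 1;
}

// Linear: the share grows by a fixed step every interval until it
// reaches 100% (e.g. Linear10PercentEvery1Minute).
function linearWeight(stepPercent, intervalMinutes, minutes) {
  const steps = Math.floor(minutes / intervalMinutes);
  return Math.min(steps * (stepPercent / 100), 1);
}

// All-at-once: the new version gets everything immediately.
function allAtOnceWeight() {
  return 1;
}
```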
All we need now is to specify the type and parameters of the deployment by choosing one of CodeDeploy's deployment preference presets: `Canary10Percent30Minutes`, `Canary10Percent5Minutes`, `Canary10Percent10Minutes`, `Canary10Percent15Minutes`, `Linear10PercentEvery10Minutes`, `Linear10PercentEvery1Minute`, `Linear10PercentEvery2Minutes`, `Linear10PercentEvery3Minutes` or `AllAtOnce`.
We'll pick `Linear10PercentEvery1Minute`, so that the traffic to the new version of our function is incremented by 10% every minute until it reaches 100%. To do that, we only have to set `type` and `alias` (the name of the alias we want to create) under `deploymentSettings` in our function:
```yaml
service: sls-canary-example

provider:
  name: aws
  runtime: nodejs6.10

plugins:
  - serverless-plugin-canary-deployments

functions:
  hello:
    handler: handler.hello
    events:
      - http: get hello
    deploymentSettings:
      type: Linear10PercentEvery1Minute
      alias: Live
```
If we now update and deploy the function:
```js
module.exports.hello = (event, context, callback) => {
  const response = {
    statusCode: 200,
    body: 'Wait, it is Serverless v1.26.11!'
  };
  callback(null, response);
};
```
We'll notice that the requests are being load balanced between the two latest versions:
```
$ curl https://your-api-gateway-endpoint.com/dev/hello
Go Serverless v1.0! Your function executed successfully!

$ curl https://your-api-gateway-endpoint.com/dev/hello
Wait, it is Serverless v1.26.11!
```
Making sure you don't break anything
Now our function is not deployed in a single flip. Woohoo!

But if we still have to check by ourselves that the whole system is behaving correctly, we haven't really gained that much, have we? To avoid this, we can add another AWS service to the mix: CloudWatch Alarms.
Note: read more on this in our CloudWatch alarms post.
We can provide CodeDeploy with a list of alarms to track during the deployment process; it will cancel the deployment and shift all the traffic back to the old version if any of them goes into the `ALARM` state.
In this example, we'll monitor that our function doesn't have any invocation error. We'll be using A Cloud Guru's Alerts Plugin for this:
```yaml
service: sls-canary-example

provider:
  name: aws
  runtime: nodejs6.10

plugins:
  - serverless-plugin-aws-alerts
  - serverless-plugin-canary-deployments

custom:
  alerts:
    dashboards: true

functions:
  hello:
    handler: handler.hello
    events:
      - http: get hello
    alarms:
      - name: foo
        namespace: 'AWS/Lambda'
        metric: Errors
        threshold: 1
        statistic: Minimum
        period: 60
        evaluationPeriods: 1
        comparisonOperator: GreaterThanOrEqualToThreshold
    deploymentSettings:
      type: Linear10PercentEvery1Minute
      alias: Live
      alarms:
        - HelloFooAlarm
```
The Canary Deployments plugin expects the logical ID of the CloudWatch alarms, which the Alerts plugin builds by concatenating the function name, the alarm name and the string "Alarm" in Pascal case.
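If you're unsure what the resulting logical ID looks like, this throwaway helper (not part of either plugin) reproduces the naming scheme described above:

```javascript
// Upper-cases the first character, as Pascal case requires.
function toPascalCase(str) {
  return str.charAt(0).toUpperCase() + str.slice(1);
}

// Builds the alarm's logical ID the way the Alerts plugin does:
// function name + alarm name + "Alarm", all in Pascal case.
function alarmLogicalId(functionName, alarmName) {
  return toPascalCase(functionName) + toPascalCase(alarmName) + 'Alarm';
}

console.log(alarmLogicalId('hello', 'foo')); // "HelloFooAlarm"
```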
End-to-end testing
By gradually deploying our function and being able to track CloudWatch metrics, we have all the tools we need to minimize the impact of a potential bug in our code. However, we could even avoid invoking a function version with errors by running CodeDeploy Hooks first.
Hooks are simply Lambda functions triggered by CodeDeploy before and after traffic shifting takes place. CodeDeploy expects to be notified about the success or failure of each hook, and only continues to the next step if it succeeded. Hooks are perfect for running end-to-end or integration tests and checking that all the pieces fit together in the cloud, since CodeDeploy will automatically roll back upon failure.
This is how we can configure hooks:
```yaml
service: sls-canary-example

provider:
  name: aws
  runtime: nodejs6.10
  iamRoleStatements:
    - Effect: Allow
      Action:
        - codedeploy:*
      Resource:
        - "*"

plugins:
  - serverless-plugin-aws-alerts
  - serverless-plugin-canary-deployments

custom:
  alerts:
    dashboards: true

functions:
  hello:
    handler: handler.hello
    events:
      - http: get hello
    alarms:
      - name: foo
        namespace: 'AWS/Lambda'
        metric: Errors
        threshold: 1
        statistic: Minimum
        period: 60
        evaluationPeriods: 1
        comparisonOperator: GreaterThanOrEqualToThreshold
    deploymentSettings:
      type: Linear10PercentEvery1Minute
      alias: Live
      preTrafficHook: preHook
      postTrafficHook: postHook
      alarms:
        - HelloFooAlarm
  preHook:
    handler: hooks.pre
  postHook:
    handler: hooks.post
```
Notice that we need to grant our functions access to CodeDeploy, so that we can use its SDK in the hooks. Our hooks file should look something like this:
```js
const aws = require('aws-sdk');
const codedeploy = new aws.CodeDeploy({ apiVersion: '2014-10-06' });

module.exports.pre = (event, context, callback) => {
  console.log('We are running some integration tests before we start shifting traffic...');

  const params = {
    deploymentId: event.DeploymentId,
    lifecycleEventHookExecutionId: event.LifecycleEventHookExecutionId,
    status: 'Succeeded' // status can be 'Succeeded' or 'Failed'
  };

  return codedeploy.putLifecycleEventHookExecutionStatus(params).promise()
    .then(data => callback(null, 'Validation test succeeded'))
    .catch(err => callback('Validation test failed'));
};

module.exports.post = (event, context, callback) => {
  console.log('Checking some stuff after traffic has been shifted...');

  const params = {
    deploymentId: event.DeploymentId,
    lifecycleEventHookExecutionId: event.LifecycleEventHookExecutionId,
    status: 'Succeeded' // status can be 'Succeeded' or 'Failed'
  };

  return codedeploy.putLifecycleEventHookExecutionStatus(params).promise()
    .then(data => callback(null, 'Validation test succeeded'))
    .catch(err => callback('Validation test failed'));
};
```
Hooks are well-suited for running tests, but we can actually execute anything else we need to happen before or after the function deployment. We just have to keep in mind that we must notify CodeDeploy about the result of the hook; otherwise, it will assume the hook failed if it doesn't get a response within one hour.

Protip: we don't necessarily need to notify CodeDeploy from the Lambda hook itself. We could trigger a background job that runs anywhere else and report the result from there.
Conclusion
CodeDeploy and Lambda Weighted Aliases take the deployment process of our Serverless functions to the next level.
They significantly reduce the chances of releasing buggy code, and react automatically if anything goes wrong. Instead of publishing a new function version that gets all the invocations straight away, the deployment goes through different stages:
- The pre-traffic hook is executed.
- Traffic is shifted gradually to the new version while the provided CloudWatch alarms are monitored, canceling the deployment and rolling back if any of them is triggered.
- The post-traffic hook is executed.
All those stages are executed in order. If any of them fails, the deployment will be considered as failed and the whole system will roll back to the previous state.
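The ordered, fail-fast behavior above can be pictured with a small sketch (illustrative only, not how CodeDeploy is actually implemented):

```javascript
// Runs deployment stages in order; a single failure aborts the
// deployment and the whole system rolls back to the previous state.
// Each stage is a function returning true (success) or false (failure).
function runDeployment(stages) {
  for (const stage of stages) {
    if (!stage()) return 'rolled-back';
  }
  return 'deployed';
}

const ok = () => true;
const alarmFired = () => false;

console.log(runDeployment([ok, ok, ok]));         // "deployed"
console.log(runDeployment([ok, alarmFired, ok])); // "rolled-back"
```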
You can get the code from the example above here, and if you're interested in the details of the underlying technology, check out this post.
Happy safe deployments!