AWS Step Functions is a fully managed service for orchestrating serverless workflows as visual state machines. It coordinates AWS services like Lambda, DynamoDB, and SQS into reliable, repeatable processes without managing infrastructure.
Step Functions replaces custom orchestration code with a managed service that handles sequencing, retries, parallelism, and error handling out of the box.
Design workflows as state machines with a drag-and-drop visual editor. See the flow of your application at a glance, making complex orchestration logic easy to understand and debug.
Call Lambda functions, query DynamoDB, send SQS messages, run ECS tasks, start Glue jobs, and invoke over 200 AWS services directly without writing glue code.
Define retry policies with exponential backoff and catch blocks for error routing. Handle transient failures automatically without writing retry logic in your application code.
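The delay before each retry grows geometrically: retry n waits IntervalSeconds × BackoffRate^(n−1). The minimal Python sketch below shows what a policy of a 3-second interval, 3 attempts, and a backoff rate of 2 means in practice. The function name `retry_delays` is hypothetical, and the sketch ignores jitter and any cap on the maximum delay.

```python
def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list[float]:
    """Seconds to wait before each retry under an ASL-style Retrier.

    Retry n (counting from 1) waits interval_seconds * backoff_rate ** (n - 1).
    Jitter and any maximum-delay cap are deliberately not modeled.
    """
    return [interval_seconds * backoff_rate ** (n - 1) for n in range(1, max_attempts + 1)]


# IntervalSeconds: 3, MaxAttempts: 3, BackoffRate: 2
delays = retry_delays(3, 3, 2)
print(delays)       # [3, 6, 12]
print(sum(delays))  # 21 seconds of backoff in the worst case
```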
Use Choice states for conditional logic and Parallel states to run multiple branches simultaneously. Map states iterate over arrays, processing items concurrently at scale.
Inspect every execution with a visual timeline showing which states ran, their inputs and outputs, and where failures occurred. CloudWatch metrics are built in, and AWS X-Ray tracing can be enabled per state machine.
Standard workflows run up to one year with exactly-once semantics. Express workflows handle high-volume, short-duration tasks at a fraction of the cost.
Step Functions operates as a state machine. You define states and transitions using the Amazon States Language, and the service executes them in order, handling retries, branching, and parallelism automatically.
Author a state machine using the Amazon States Language (JSON). Specify Task, Choice, Parallel, Map, Wait, Pass, Succeed, and Fail states with transitions between them.
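To show the shape of an ASL definition, the sketch below builds a small one as a Python dict, with a Choice, a Task, a Fail, and a Succeed state. It then checks that every transition points at a state that actually exists, which is an easy mistake to make when editing by hand. The Lambda ARN, the state names, and the helper functions are hypothetical. Step Functions performs its own validation when you create the state machine.

```python
import json

# A small Amazon States Language definition expressed as a Python dict.
# State names and the Lambda ARN are placeholders, not real resources.
definition = {
    "Comment": "Approve small orders automatically, reject the rest",
    "StartAt": "CheckAmount",
    "States": {
        "CheckAmount": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.amount", "NumericLessThan": 100, "Next": "Approve"}
            ],
            "Default": "Reject",
        },
        "Approve": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:approveOrder",
            "Next": "Done",
        },
        "Reject": {"Type": "Fail", "Error": "OrderTooLarge", "Cause": "Amount >= 100"},
        "Done": {"Type": "Succeed"},
    },
}


def transition_targets(state: dict) -> list[str]:
    """Collect every state name this state can transition to."""
    targets = []
    if "Next" in state:
        targets.append(state["Next"])
    for choice in state.get("Choices", []):
        targets.append(choice["Next"])
    if "Default" in state:
        targets.append(state["Default"])
    for catcher in state.get("Catch", []):
        targets.append(catcher["Next"])
    return targets


def dangling_transitions(defn: dict) -> list[str]:
    """Return 'Source -> Target' pairs whose target state is not defined."""
    states = defn["States"]
    missing = []
    if defn["StartAt"] not in states:
        missing.append(f"StartAt -> {defn['StartAt']}")
    for name, state in states.items():
        for target in transition_targets(state):
            if target not in states:
                missing.append(f"{name} -> {target}")
    return missing


print(dangling_transitions(definition))  # []
print(json.dumps(definition, indent=2))  # the JSON document you would deploy
```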
Start an execution with a JSON input. Step Functions walks through each state, invoking Lambda functions or AWS services, applying retry logic, and routing errors to catch blocks.
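To make the execution model concrete, here is a toy local interpreter. It is a teaching sketch, not the Step Functions runtime. It walks from StartAt, calls a Python function for each Task, evaluates a few Choice comparators, and stops at Succeed, Fail, or End. The `handlers` registry stands in for Lambda and is hypothetical. Retries, Catch, Wait, Parallel, and Map are deliberately left out.

```python
from typing import Any, Callable


class ExecutionFailed(Exception):
    """Raised when the toy interpreter reaches a Fail state."""

    def __init__(self, error: str, cause: str = ""):
        super().__init__(f"{error}: {cause}")
        self.error = error
        self.cause = cause


def read_path(data: dict, path: str) -> Any:
    """Resolve a simple reference path like '$.amount' (no arrays or filters)."""
    value: Any = data
    for key in path.removeprefix("$").lstrip(".").split("."):
        if key:
            value = value[key]
    return value


# A few Choice comparison operators; the real language defines many more.
COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "NumericLessThan": lambda a, b: a < b,
    "NumericGreaterThanEquals": lambda a, b: a >= b,
    "StringEquals": lambda a, b: a == b,
    "BooleanEquals": lambda a, b: a == b,
}


def choose(state: dict, data: dict) -> str:
    """Return the Next target of the first matching rule, else Default."""
    for rule in state["Choices"]:
        for op, compare in COMPARATORS.items():
            if op in rule and compare(read_path(data, rule["Variable"]), rule[op]):
                return rule["Next"]
    if "Default" in state:
        return state["Default"]
    raise ExecutionFailed("States.NoChoiceMatched", "no rule matched and no Default")


def run(definition: dict, handlers: dict[str, Callable[[dict], dict]], data: dict) -> dict:
    """Walk the state machine from StartAt until Succeed, Fail, or End."""
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        kind = state["Type"]
        if kind == "Task":
            # Default ResultPath "$": the task result replaces the state data.
            data = handlers[state["Resource"]](data)
        elif kind == "Choice":
            name = choose(state, data)
            continue
        elif kind == "Succeed":
            return data
        elif kind == "Fail":
            raise ExecutionFailed(state.get("Error", ""), state.get("Cause", ""))
        elif kind != "Pass":
            raise NotImplementedError(f"{kind} states are not modeled in this sketch")
        if state.get("End"):
            return data
        name = state["Next"]


definition = {
    "StartAt": "CheckAmount",
    "States": {
        "CheckAmount": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.amount", "NumericLessThan": 100, "Next": "Approve"}],
            "Default": "Reject",
        },
        "Approve": {"Type": "Task", "Resource": "approveOrder", "Next": "Done"},
        "Reject": {"Type": "Fail", "Error": "OrderTooLarge"},
        "Done": {"Type": "Succeed"},
    },
}
handlers = {"approveOrder": lambda order: {**order, "approved": True}}
print(run(definition, handlers, {"amount": 42}))  # {'amount': 42, 'approved': True}
```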
Track every execution in the AWS console with a visual timeline. Inspect state inputs, outputs, and durations. Debug failures by seeing exactly which state failed and why.
Step Functions connects directly with over 200 AWS services. The most common integrations include:
- AWS Lambda: Invoke functions for custom business logic. This is the most common integration for serverless workflow steps.
- Amazon DynamoDB: Read, write, and query items directly from workflow states without a Lambda intermediary.
- Amazon SQS and Amazon SNS: Send messages to queues or publish notifications to topics for event-driven architectures.
- Amazon ECS and AWS Batch: Run containerized tasks and batch processing jobs as part of your workflow.
- AWS Glue and Amazon SageMaker: Orchestrate ETL pipelines and machine learning training and inference jobs.
- AWS Step Functions (nested workflows): Invoke child state machines to break large workflows into manageable, reusable components.
AWS offers two workflow types. Standard workflows are ideal for long-running, exactly-once processes. Express workflows are built for high-volume, short-duration tasks.
| Feature | Standard | Express |
|---|---|---|
| Max duration | Up to 1 year | Up to 5 minutes |
| Execution semantics | Exactly-once | At-least-once (async) / At-most-once (sync) |
| Pricing model | Per state transition | Per request + duration |
| Execution history | Visible in console | CloudWatch Logs only |
| Max execution history | 25,000 events | No limit (duration-bound) |
| Best for | Long-running, auditable workflows | High-volume data processing, IoT ingestion |
| Cost at 1M executions (10 steps) | ~$250 | ~$1 + duration charges |
For most serverless applications with short-running tasks, Express workflows offer dramatic cost savings. Use Standard workflows when you need exactly-once guarantees, a full execution history in the console, or executions that run longer than five minutes (hours or days).
The Serverless Framework supports Step Functions through the serverless-step-functions plugin. Define your state machines directly in serverless.yml alongside your Lambda functions:
```yaml
service: my-workflow

provider:
  name: aws
  runtime: nodejs22.x

plugins:
  - serverless-step-functions

functions:
  processOrder:
    handler: handler.processOrder
  chargePayment:
    handler: handler.chargePayment
  sendConfirmation:
    handler: handler.sendConfirmation

stepFunctions:
  stateMachines:
    orderWorkflow:
      name: OrderProcessingWorkflow
      definition:
        StartAt: ProcessOrder
        States:
          ProcessOrder:
            Type: Task
            # The plugin resolves the function name to its deployed ARN
            Resource:
              Fn::GetAtt: [processOrder, Arn]
            Next: ChargePayment
          ChargePayment:
            Type: Task
            Resource:
              Fn::GetAtt: [chargePayment, Arn]
            # Retry failed charges, waiting 3 s, 6 s, then 12 s between attempts
            Retry:
              - ErrorEquals: [States.TaskFailed]
                IntervalSeconds: 3
                MaxAttempts: 3
                BackoffRate: 2
            Next: SendConfirmation
          SendConfirmation:
            Type: Task
            Resource:
              Fn::GetAtt: [sendConfirmation, Arn]
            End: true
```

The plugin generates the CloudFormation resources for you: the state machine itself and an IAM execution role that grants it permission to invoke the referenced Lambda functions. It also supports Express workflows, logging to CloudWatch Logs, API Gateway triggers, EventBridge schedules, and nested state machines.
Coordinating ten interconnected serverless functions by hand quickly gets complicated: each function has to know what runs next, how to retry, and where to send failures. Step Functions handles sequencing, retries, and error routing declaratively. You define what should happen, not how to manage it. The visual editor makes it easy to design, understand, and modify workflows that would otherwise require hundreds of lines of orchestration code.
Lambda functions are stateless by design. Passing data between them typically requires setting up queues, databases, or custom middleware. Step Functions provides built-in state management: each step's output automatically becomes the next step's input. You can filter, transform, and merge data between states using JSONPath expressions without any infrastructure setup.
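The following simplified Python model shows how a Task state's InputPath, ResultPath, and OutputPath fields shape the data that flows to the next state. The order follows ASL: InputPath selects the task's input, ResultPath merges the result back into the original state input, and OutputPath selects what is passed on. The sketch only understands dot-separated paths such as `$.order.id`, whereas real JSONPath supports much more. It also omits Parameters and ResultSelector. The `charge` task and the field names are hypothetical.

```python
import copy
from typing import Any, Callable


def get_path(data: Any, path: str) -> Any:
    """Read a simple reference path such as '$' or '$.order.id'."""
    value = data
    for key in path.removeprefix("$").lstrip(".").split("."):
        if key:
            value = value[key]
    return value


def set_path(data: dict, path: str, result: Any) -> Any:
    """Write result into a copy of data at a simple path; '$' replaces everything."""
    keys = [k for k in path.removeprefix("$").lstrip(".").split(".") if k]
    if not keys:
        return result
    out = copy.deepcopy(data)
    target = out
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return out


def run_task_state(state_input: dict, task: Callable[[Any], Any],
                   input_path: str = "$", result_path: str = "$",
                   output_path: str = "$") -> Any:
    """Apply InputPath -> task -> ResultPath -> OutputPath in ASL order."""
    effective_input = get_path(state_input, input_path)
    result = task(effective_input)
    # ResultPath combines the result with the *original* state input.
    combined = set_path(state_input, result_path, result)
    return get_path(combined, output_path)


def charge(order: dict) -> dict:
    """Hypothetical payment task."""
    return {"chargeId": "ch-123", "amount": order["total"]}


order = {"order": {"id": "o-1", "total": 40}, "customer": "c-9"}
# Pass only $.order to the task, and attach its result under $.payment.
print(run_task_state(order, charge, input_path="$.order", result_path="$.payment"))
# {'order': {'id': 'o-1', 'total': 40}, 'customer': 'c-9', 'payment': {'chargeId': 'ch-123', 'amount': 40}}
```

With the default paths (`$` everywhere), the task result simply replaces the state data, which is the "output becomes the next input" behavior described above.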
Embedding orchestration logic inside application code couples your business logic to execution flow. Step Functions moves workflow concerns (ordering, branching, retries, timeouts) into a separate declaration. Each Lambda function focuses on one task, stays small, and remains independently testable.
Parallel states run multiple branches simultaneously, and Map states process arrays of items concurrently. Distributed Map mode can process millions of items from S3 in parallel with up to 10,000 concurrent child executions. Performance scales alongside your workload without any custom threading or queue management.
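For reference, a Map state in ASL wraps a nested sub-workflow under ItemProcessor. The hypothetical helper below emits that structure with a placeholder Lambda ARN. Setting the ProcessorConfig Mode to DISTRIBUTED switches to Distributed Map, where each iteration runs as a child workflow execution. Reading items directly from S3 would additionally need an ItemReader, which this sketch does not show.

```python
import json


def map_state(items_path: str, max_concurrency: int, worker_arn: str,
              distributed: bool = False) -> dict:
    """Build an ASL Map state that runs a one-Task sub-workflow per array element."""
    if distributed:
        # Each iteration becomes a child workflow execution (Express here).
        processor_config = {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"}
    else:
        # Iterations run inside the parent execution and share its history.
        processor_config = {"Mode": "INLINE"}
    return {
        "Type": "Map",
        "ItemsPath": items_path,            # array in the state input to iterate over
        "MaxConcurrency": max_concurrency,  # upper bound on parallel iterations
        "ItemProcessor": {
            "ProcessorConfig": processor_config,
            "StartAt": "ProcessItem",
            "States": {
                "ProcessItem": {"Type": "Task", "Resource": worker_arn, "End": True}
            },
        },
        "End": True,
    }


# Hypothetical worker function ARN (placeholder account ID).
state = map_state("$.images", 10, "arn:aws:lambda:us-east-1:123456789012:function:resizeImage")
print(json.dumps(state, indent=2))
```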
Step Functions is the right choice for most serverless orchestration, but these constraints are worth understanding upfront.
The Amazon States Language (ASL) is a JSON-based, proprietary language optimized for machines rather than humans. The syntax is verbose, and writing complex branching or error-handling logic requires significant effort. The learning curve is steep, and the skills do not transfer outside AWS.
State machine definitions are written in a proprietary AWS format. Migrating to another cloud provider means rewriting your orchestration layer entirely. If multi-cloud portability matters, consider open-source orchestrators such as Temporal or Apache Airflow.
Standard workflows cap at 25,000 events per execution. Long-running workflows with many iterations can hit this ceiling. The workaround is splitting into child workflows, which adds architectural complexity.
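A rough way to see when a looping workflow approaches the ceiling is to count history events per iteration. The sketch below is an estimate under stated assumptions. It assumes each Lambda Task state adds about five events (state entered, scheduled, started, succeeded, state exited), and it counts two events for the execution's start and end. Choice states, retries, and failures add more, so treat the result as optimistic. The function name is hypothetical.

```python
HISTORY_LIMIT = 25_000  # hard quota on events per Standard execution


def max_loop_iterations(tasks_per_iteration: int, events_per_task: int = 5,
                        fixed_events: int = 2) -> int:
    """Optimistic upper bound on loop iterations before hitting the history limit."""
    per_iteration = tasks_per_iteration * events_per_task
    return (HISTORY_LIMIT - fixed_events) // per_iteration


print(max_loop_iterations(3))  # 1666 iterations of a three-task loop
```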
Data passed between states cannot exceed 256 KB. For larger payloads, store data in S3 or DynamoDB and pass references. This adds latency and complexity to data-heavy workflows.
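The reference-passing workaround is often called the claim-check pattern. The minimal sketch below measures the JSON payload in UTF-8 bytes. If the payload is over the limit, it parks the payload in a store and returns a small pointer instead. The `store` dict is a stand-in for S3; in a real workflow you would write the object with something like boto3's `put_object` and read it back in the next state. The bucket name, key layout, and `payloadRef` field are hypothetical.

```python
import json
import uuid
from typing import Any

MAX_PAYLOAD_BYTES = 256 * 1024  # per-state input/output limit, as UTF-8 bytes


def payload_size(data: Any) -> int:
    """Approximate size of the payload as compact JSON encoded in UTF-8."""
    return len(json.dumps(data, separators=(",", ":")).encode("utf-8"))


def claim_check(data: Any, store: dict[str, bytes],
                bucket: str = "my-workflow-payloads") -> Any:
    """Return data unchanged if it fits; otherwise store it and return a reference."""
    if payload_size(data) <= MAX_PAYLOAD_BYTES:
        return data
    key = f"payloads/{uuid.uuid4()}.json"
    store[key] = json.dumps(data).encode("utf-8")
    return {"payloadRef": {"bucket": bucket, "key": key}}


payloads: dict[str, bytes] = {}  # stands in for an S3 bucket
print(claim_check({"orderId": "o-1"}, payloads))  # {'orderId': 'o-1'}
ref = claim_check({"blob": "x" * 300_000}, payloads)
print(list(ref), len(payloads))                   # ['payloadRef'] 1
```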
Standard workflows charge per state transition. A workflow with 20 states running 1M times per month costs $500 in Step Functions alone. For high-volume use cases, Express workflows or direct Lambda-to-Lambda patterns may be more cost-effective.
Pricing differs significantly between Standard and Express workflows. Standard charges per state transition; Express charges per request and duration.
The free tier includes 4,000 Standard state transitions per month, and it is permanent rather than limited to the first 12 months.
| Charge | Price |
|---|---|
| Standard state transitions | $0.025 per 1,000 transitions |
| Express requests | $1.00 per 1M requests |
| Express duration (first 1,000 GB-hours) | $0.0600 per GB-hour |
| Express duration (next 4,000 GB-hours) | $0.0300 per GB-hour |
| Express duration (over 5,000 GB-hours) | $0.0164 per GB-hour |
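Example: an image-processing workflow with 10 state transitions per image, running 100,000 times per month:

- 10 transitions per image × 100,000 executions = 1,000,000 transitions
- Plus 10% for retries: 1,100,000 transitions total
- 1,100 × $0.025 (per 1,000 transitions) = $27.50/month (Standard)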
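Combined with Lambda compute (~$600) and data transfer (~$100), the total monthly cost is approximately $727.50. With Express workflows, the Step Functions request charge would drop to $0.10 (100,000 requests at $1.00 per million), plus duration charges that depend on how long each execution runs and how much memory it uses.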
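The arithmetic above, and the 20-state, 1M-execution figure from the limitations section, can be reproduced with a few lines of Python. This is a sketch, not a pricing tool. It uses the US East list prices from the table and ignores the free tier. The Express duration function assumes usage stays inside the first 1,000 GB-hour tier and that the workflow fits in a single 64 MB (0.0625 GB) memory increment; both are assumptions to adjust for a real workload.

```python
STANDARD_PER_1K_TRANSITIONS = 0.025     # USD per 1,000 state transitions
EXPRESS_PER_1M_REQUESTS = 1.00          # USD per 1M requests
EXPRESS_FIRST_TIER_PER_GB_HOUR = 0.06   # USD, first 1,000 GB-hours only


def standard_cost(transitions_per_execution: int, executions: int,
                  retry_overhead: float = 0.0) -> float:
    """Monthly Standard charge, ignoring the 4,000-transition free tier."""
    transitions = transitions_per_execution * executions * (1 + retry_overhead)
    return round(transitions / 1_000 * STANDARD_PER_1K_TRANSITIONS, 2)


def express_request_cost(executions: int) -> float:
    """Monthly Express request charge."""
    return round(executions / 1_000_000 * EXPRESS_PER_1M_REQUESTS, 2)


def express_duration_cost(executions: int, seconds_per_execution: float,
                          memory_gb: float = 0.0625) -> float:
    """Express duration charge under the first-tier and 64 MB assumptions."""
    gb_hours = executions * seconds_per_execution * memory_gb / 3600
    return round(gb_hours * EXPRESS_FIRST_TIER_PER_GB_HOUR, 2)


print(standard_cost(10, 100_000, retry_overhead=0.10))  # 27.5 (the worked example)
print(standard_cost(20, 1_000_000))                      # 500.0
print(express_request_cost(100_000))                     # 0.1
print(express_duration_cost(100_000, 10))                # 1.04 for 10-second runs
```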
See the official Step Functions pricing page for current regional rates.
Use Step Functions when you need to coordinate multiple AWS services into a reliable workflow, want built-in retry and error handling, need visibility into execution progress, or are building ETL pipelines, order processing, user onboarding flows, or any multi-step process where steps depend on each other.
Consider alternatives when your workflow is a simple sequence of two or three Lambda functions (direct invocation or SQS may be simpler), you need sub-millisecond latency between steps (the state machine adds overhead), or you are processing extremely high volumes where per-transition costs become prohibitive. For simple scheduled tasks, EventBridge Scheduler with a single Lambda function is more appropriate. For complex data pipelines outside AWS, consider Apache Airflow or Temporal.
Key quotas to plan around. Most soft limits can be raised through AWS Support.
| Limit | Value |
|---|---|
| Execution history | 25,000 events max (hard limit) |
| Input/output payload | 256 KB per state |
| Maximum request size | 1 MB |
| State machines per account | 10,000 (adjustable) |
| Concurrent executions (Standard) | 1,000,000 (adjustable) |
| API rate (StartExecution, Standard) | 2,000 requests/second |
| Tags per resource | 50 |
| Execution timeout (Standard) | 1 year |
| Execution timeout (Express) | 5 minutes |
| Concurrent executions (Express) | 100,000 (adjustable) |
Step Functions is not the only way to orchestrate serverless workflows. These alternatives may be a better fit depending on your requirements.
- Direct Lambda invocation: For simple sequential tasks, invoke the next Lambda function directly from code. No orchestration layer is needed. This works well for two or three steps with minimal branching.
- Amazon SQS: Decouple services with message queues for high-throughput async processing. Best when you need reliable delivery without coordinating execution order across many steps.
- Amazon EventBridge: Rule-based event routing for loosely coupled, event-driven architectures. Best when services react to events independently rather than following a prescribed sequence.
- Apache Airflow: DAG-based workflow orchestration. Best for data engineering pipelines with complex dependency graphs, scheduling, and integration with non-AWS systems.
- Temporal: Open-source workflow engine with durable execution semantics. Best for complex, long-running business processes where you want to avoid vendor lock-in and write workflow logic in application code.