One of the most applauded announcements at re:Invent 2016 was AWS Step Functions. Step Functions is basically an orchestration service for AWS Lambda and activity-based tasks. Thanks to SFN, you can control multiple executions of your processes using Lambda Functions and activity workers.
What is AWS Step Functions?
AWS Step Functions is the latest application service released by AWS. It solves a problem that many people reading this have probably experienced: orchestrating complex flows using Lambda Functions.
In many use cases, a process is composed of several different tasks. If you want to run the entire process in a serverless way, you can create a Lambda Function for each task and run those functions using your own orchestrator. Writing code that orchestrates those functions can be painful, and the result is really hard to debug and optimize. AWS Step Functions removes this need: you describe the flow of your functions or tasks with a simple design, and the service runs it for you. According to the AWS documentation page, “AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows.”
Let’s take a look at some examples.
Example 1: Game hosted on AWS
A simple use case could be handling one of your users who completes a level in your game hosted on AWS. In this case, you would need to perform many different tasks, which could include:
Updating different DynamoDB tables
Storing reports on S3
Putting a metric on CloudWatch for further analysis
Coordinating even these three tasks can be really difficult if each one is a separate Lambda Function that has to run sequentially or, harder still, in parallel. With AWS Step Functions, you can run those tasks in parallel, handle different kinds of exceptions for each function, and collect the final results without any further complications.
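A minimal sketch of what such a State Machine could look like, written in Python so that the Amazon States Language JSON can be generated and inspected. The Lambda ARNs, the account ID, and the state names are hypothetical placeholders and are not taken from the original example.

```python
import json

# Hypothetical ARN prefix: replace region, account ID, and names with your own.
LAMBDA_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def task_branch(state_name, function_name):
    """Build a one-state branch that runs a single Lambda Function."""
    return {
        "StartAt": state_name,
        "States": {
            state_name: {
                "Type": "Task",
                "Resource": LAMBDA_PREFIX + function_name,
                "End": True,
            }
        },
    }


level_completed = {
    "Comment": "Runs the three level-completed tasks in parallel",
    "StartAt": "LevelCompleted",
    "States": {
        "LevelCompleted": {
            "Type": "Parallel",
            "Branches": [
                task_branch("UpdateTables", "update-dynamodb-tables"),
                task_branch("StoreReport", "store-report-on-s3"),
                task_branch("PutMetric", "put-cloudwatch-metric"),
            ],
            # Any error raised inside a branch is routed to HandleFailure.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
            "Next": "Done",
        },
        "HandleFailure": {
            "Type": "Fail",
            "Error": "LevelCompletedFailed",
            "Cause": "One of the parallel tasks failed",
        },
        "Done": {"Type": "Succeed"},
    },
}

# This JSON string is what you would paste into the console or send to the API.
definition = json.dumps(level_completed, indent=2)
print(definition)
```

Catching `States.ALL` is the simplest choice for a sketch; a real State Machine would probably catch specific errors per task.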
Example 2: A serverless handler for libraries
Other examples are flows that require human interaction. For instance, a library wants to keep track of each item loaned out to customers and to help customers return their items by the deadline. In this process, a customer checks out a book, a library employee records that action in the system, and from then on a State Machine can orchestrate all of the actions necessary to bring the book from the customer back to the library.
Thanks to AWS Step Functions, you can run a Lambda Function that sends the customer an email confirming the checkout, with a link to renew the loan. Another Lambda, in conjunction with Amazon API Gateway, can generate a link to mark the loan as complete. After a few days, the State Machine can send an automated message to remind the customer to renew or return the book.
This example is more elaborate than the first one, and it is also more difficult to design and implement with Step Functions. The advantage of using this service is that you can implement really long-running tasks and also handle human interaction that can modify the flow of execution.
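One possible shape for this flow, sketched in Python under several assumptions: the customer's click is modeled as an activity task that times out after a few days, the timeout is caught and routed to a reminder, and the renew path is left out for brevity. All ARNs, state names, and the three-day delay are hypothetical.

```python
import json

# Hypothetical ARNs: replace region, account ID, and names with your own.
LAMBDA_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
CUSTOMER_ACTIVITY = "arn:aws:states:us-east-1:123456789012:activity:customer-action"
REMINDER_AFTER_SECONDS = 3 * 24 * 60 * 60  # assumed: remind after three days

library_loan = {
    "Comment": "Tracks a book loan until the customer returns the book",
    "StartAt": "SendConfirmationEmail",
    "States": {
        "SendConfirmationEmail": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "send-confirmation-email",
            "ResultPath": "$.confirmation",  # keep the loan data in the state
            "Next": "AwaitCustomerAction",
        },
        "AwaitCustomerAction": {
            # Completed when a worker behind API Gateway calls SendTaskSuccess.
            "Type": "Task",
            "Resource": CUSTOMER_ACTIVITY,
            "TimeoutSeconds": REMINDER_AFTER_SECONDS,
            "Catch": [
                {
                    "ErrorEquals": ["States.Timeout"],
                    "ResultPath": "$.timeout",
                    "Next": "SendReminder",
                }
            ],
            "Next": "LoanCompleted",
        },
        "SendReminder": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "send-reminder-email",
            "ResultPath": "$.reminder",
            "Next": "AwaitCustomerAction",  # go back to waiting for the click
        },
        "LoanCompleted": {"Type": "Succeed"},
    },
}

definition = json.dumps(library_loan, indent=2)
print(definition)
```

The loop between AwaitCustomerAction and SendReminder keeps reminding the customer until the loan is closed; a production design would likely cap the number of reminders.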
A deeper look at AWS Step Functions
Under the hood
So, what powers AWS Step Functions? As you can imagine, there are several affinities between this service and Amazon Simple Workflow (SWF). In fact, as Tim Bray pointed out in the video presentation at re:Invent 2016, SWF shares part of its backend with SFN, but at first glance, Step Functions is less complicated. Let’s try to understand the main components of this service so that you can start using it in your next project.
The biggest component is the State Machine. A State Machine represents the flow that you need to put in place to achieve your goals. For example, to manage lending resources for a library, you need to create a State Machine that coordinates each task to provide a better experience for customers. The two examples above are both State Machines.
It is very easy to create a State Machine. You basically need a JSON document and that’s it. Using the API or the AWS Console, you can create it and start as many executions as you need. The JSON template must follow the Amazon States Language. While it is not so easy to compose by hand, the console gives you a real-time graph of the State Machine you are writing.
Examples 1 and 2 above can each be implemented with such a JSON template.
A State Machine is made of boxes, and each one represents a State. States are referred to by their name inside the State Machine template. Each name must be unique and there are many different State types. Currently, the available states (based on the publish date for this post) are:
Choice state: Branch the execution
Fail or Succeed state: Stop an execution with a failure or a success
Pass state: Pass its input to the output, optionally injecting some fixed data
Wait state: Provide a delay for a certain amount of time or until a specified time/date
Parallel state: Begin parallel branches of execution
Task state: Execute some code in your state machine
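To make the state types more concrete, here is a small sketch in Python that builds a State Machine from Choice, Pass, Wait, Fail, and Succeed states, and checks that every state has a known type and that every transition points to a defined state. The `find_problems` helper and the `score_flow` example are hypothetical and only cover a small part of what the Amazon States Language requires.

```python
# State types listed above (as available when this post was written).
STATE_TYPES = {"Choice", "Fail", "Succeed", "Pass", "Wait", "Parallel", "Task"}


def transition_targets(state):
    """Collect every state name this state can transition to."""
    targets = []
    if "Next" in state:
        targets.append(state["Next"])
    if "Default" in state:
        targets.append(state["Default"])
    for rule in state.get("Choices", []):
        targets.append(rule["Next"])
    for catcher in state.get("Catch", []):
        targets.append(catcher["Next"])
    return targets


def find_problems(machine):
    """Return a list of human-readable problems; an empty list means none found."""
    states = machine["States"]
    problems = []
    if machine["StartAt"] not in states:
        problems.append("StartAt points to an undefined state")
    for name, state in states.items():
        if state.get("Type") not in STATE_TYPES:
            problems.append(f"{name}: unknown Type {state.get('Type')!r}")
        for target in transition_targets(state):
            if target not in states:
                problems.append(f"{name}: transition to undefined state {target!r}")
    return problems


score_flow = {
    "StartAt": "CheckScore",
    "States": {
        "CheckScore": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.score", "NumericGreaterThan": 100, "Next": "AddBonus"}
            ],
            "Default": "ScoreTooLow",
        },
        "AddBonus": {
            "Type": "Pass",
            "Result": {"bonus": 10},      # fixed data injected by the Pass state
            "ResultPath": "$.reward",
            "Next": "CoolDown",
        },
        "CoolDown": {"Type": "Wait", "Seconds": 5, "Next": "Done"},
        "ScoreTooLow": {"Type": "Fail", "Error": "ScoreTooLow", "Cause": "Score is 100 or less"},
        "Done": {"Type": "Succeed"},
    },
}

print(find_problems(score_flow))
```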
All of the work in your State Machine is accomplished by tasks. A task can be:
A Lambda Function: You have to specify its ARN
An Activity: A piece of code that can be hosted wherever you want. It needs to call the GetActivityTask API to pick up a job and the SendTaskSuccess or SendTaskFailure API to report its result. In this way, you can also include human tasks in your State Machine, or tasks that run too long to be hosted in a Lambda Function. In our library example, you need to provide the user with a link that, if clicked, either renews the resource that has been checked out or marks it as completed. Thanks to API Gateway and the SendTaskSuccess or SendTaskFailure API, you can do it.
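The activity worker loop described above can be sketched as follows. The function takes the Step Functions client as a parameter (with boto3 this would be `boto3.client("stepfunctions")`), so the sketch itself has no dependency on AWS; `run_activity_once`, the worker name, and the handler are hypothetical names used for illustration.

```python
import json


def run_activity_once(client, activity_arn, handler, worker_name="example-worker"):
    """Poll for one task, run `handler` on its input, and report the result.

    Returns True if a task was processed, False if the poll came back empty.
    """
    # GetActivityTask long-polls; with no work available the token is empty.
    task = client.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False

    try:
        result = handler(json.loads(task["input"]))
    except Exception as exc:
        # Report the failure so the State Machine can retry or catch it.
        client.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
    else:
        client.send_task_success(taskToken=token, output=json.dumps(result))
    return True
```

A real worker would call `run_activity_once` in a loop. For a human task, the same task token can be embedded in a link so that the API Gateway backend calls SendTaskSuccess when the user clicks it.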
Good to know: Pricing
A big difference between Step Functions and Simple Workflow is their pricing. AWS Step Functions is billed for each state transition of your execution. For example, if your State Machine has three steps in series, each execution consists of four state transitions. For each account, the first 4,000 transitions per month are included in the free tier, which does not expire. The free tier is a nice thing to have, but in addition to state transitions you will be charged for Lambda Function executions, data transfer, and EC2 instances if your activity is hosted there. In my opinion, this service is not as inexpensive as you might expect. Using it in production with a lot of executions can incur high expenses, but in many cases it is necessary and it removes the pain of orchestrating different tasks. It also provides us with a lot of nice features.
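As a rough worked example of this billing model, the sketch below estimates the monthly charge for state transitions only. The free tier and the "steps plus one" transition count come from the text above; the price per transition is an assumption (USD 0.025 per 1,000 state transitions), so check the pricing page for the current figure.

```python
FREE_TRANSITIONS_PER_MONTH = 4_000  # free tier mentioned above
PRICE_PER_TRANSITION = 0.000025     # ASSUMPTION: USD 0.025 per 1,000 transitions


def transitions_per_execution(steps_in_series):
    """Three steps in series mean four state transitions, and so on."""
    return steps_in_series + 1


def monthly_transition_cost(executions, steps_in_series):
    """Estimated USD cost of state transitions only (no Lambda, EC2, or transfer)."""
    total = executions * transitions_per_execution(steps_in_series)
    billable = max(0, total - FREE_TRANSITIONS_PER_MONTH)
    return billable * PRICE_PER_TRANSITION


# 1,000 executions of a three-step machine stay inside the free tier.
print(monthly_transition_cost(1_000, 3))
# One million executions of the same machine leave 3,996,000 billable transitions.
print(round(monthly_transition_cost(1_000_000, 3), 2))
```

Under these assumptions, one million executions of a three-step State Machine cost about USD 99.90 per month in transitions alone, before any Lambda or EC2 charges.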
How can we actually use AWS Step Functions?
After this really long but necessary introduction to AWS Step Functions, let’s move on to how to use this service. You can create a State Machine from the AWS Console, with CloudFormation, or through the API.
In the AWS Console, simply insert the JSON template in the Code box and your State Machine will appear in the Preview box. AWS also provides a really small but very accurate set of blueprints to start out. For example, if you need a simple State Machine made of a parallel step, you only need to click on the related blueprint, change the name and the ARN, and that’s it: your State Machine is ready for production. I think that AWS did a great job here, and the console is really helpful for composing State Machines.
CloudFormation is the Infrastructure as Code service of AWS. It supports AWS Step Functions, and it is pretty simple to use: you only need to specify the State Machine template and the service role ARN.
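A minimal sketch of such a CloudFormation template, generated from Python. The `AWS::StepFunctions::StateMachine` resource takes the State Machine template as a JSON string in `DefinitionString` and the service role in `RoleArn`; the logical resource name, the Lambda ARN, and the role ARN below are hypothetical placeholders.

```python
import json

# Hypothetical ARNs: replace with your own Lambda Function and IAM role.
HELLO_WORLD_ARN = "arn:aws:lambda:us-east-1:123456789012:function:HelloWorld"
ROLE_ARN = "arn:aws:iam::123456789012:role/step-functions-execution-role"

# The State Machine template, written in the Amazon States Language.
definition = {
    "StartAt": "HelloWorld",
    "States": {
        "HelloWorld": {"Type": "Task", "Resource": HELLO_WORLD_ARN, "End": True}
    },
}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "HelloWorldStateMachine": {
            "Type": "AWS::StepFunctions::StateMachine",
            "Properties": {
                # CloudFormation expects the definition as a JSON string.
                "DefinitionString": json.dumps(definition),
                "RoleArn": ROLE_ARN,
            },
        }
    },
}

print(json.dumps(template, indent=2))
```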
After you have created your State Machine, you will probably want to run many executions, possibly millions, and this is the easy part! AWS Step Functions is a fully managed service, so you don’t need to take care of scaling or server maintenance. We are in a serverless world right now!
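Starting executions through the API can be sketched like this. The client is passed in as a parameter (with boto3 it would be `boto3.client("stepfunctions")`); the helper name and the input payloads are hypothetical.

```python
import json


def start_executions(client, state_machine_arn, payloads):
    """Start one execution per payload and return the execution ARNs."""
    execution_arns = []
    for payload in payloads:
        response = client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(payload),  # the execution input must be a JSON string
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

Each returned ARN identifies one execution and can be used later to inspect its status or history.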
To keep control of your system, you need a really good monitoring system in place. With Step Functions, AWS did a pretty good job. In the console, each execution has its own well-detailed logs for each state. SFN is also integrated with CloudWatch Metrics and CloudTrail. Of course, if your tasks are performed by Lambda Functions, each of them will deliver its logs to CloudWatch as usual. You can learn more about these services in “Introduction to CloudWatch” and “Learn the tools for governing accounts.”
Using the console is simple and the user interface is pretty good, but what about getting the logs of an execution via the API? The API that we need here is GetExecutionHistory, which returns the complete history of an execution. Although I have never used it, after reading the docs I can see that the response could be pretty hard to handle. There are a lot of different possible fields, one for each type of event and its result. For example, in the case of failure, the details live in a different field depending on whether the event type is ActivityFailed or LambdaFunctionFailed (even though both carry the same information).
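One way to tame that response is to normalize the failure events into a single shape. The sketch below works on the list of events returned by GetExecutionHistory and only handles the two event types mentioned above; the helper name and the sample events are hypothetical, and fetching or paginating the history is left out.

```python
# Event type -> name of the field that carries its details.
FAILURE_DETAIL_FIELDS = {
    "ActivityFailed": "activityFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
}


def extract_failures(events):
    """Return one flat dict per failure event, whatever its type."""
    failures = []
    for event in events:
        field = FAILURE_DETAIL_FIELDS.get(event["type"])
        if field is None:
            continue  # not one of the failure events we handle
        details = event.get(field, {})
        failures.append(
            {
                "id": event["id"],
                "type": event["type"],
                "error": details.get("error"),
                "cause": details.get("cause"),
            }
        )
    return failures


# Hypothetical, trimmed-down events for illustration.
sample_events = [
    {"id": 1, "type": "ExecutionStarted"},
    {
        "id": 4,
        "type": "LambdaFunctionFailed",
        "lambdaFunctionFailedEventDetails": {"error": "CriticalError", "cause": "boom"},
    },
    {
        "id": 9,
        "type": "ActivityFailed",
        "activityFailedEventDetails": {"error": "States.Timeout", "cause": "no heartbeat"},
    },
]

for failure in extract_failures(sample_events):
    print(failure)
```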
Why Step Functions is your friend
There are several great things about this new service:
State as a service
AWS Step Functions provides something that could be called state as a service. Usually, a serverless infrastructure is also stateless. In fact, if you are using multiple Lambda Functions to complete a task, it is really hard to store the state of an execution and keep it up to date. If you need it, you are probably going to use S3 or a database, but this is a repetitive and complex task to accomplish. SFN keeps your state across tasks and orchestrates them so that each runs only if needed and in the right order.
Keep your tasks alive with a Heartbeat
Another cool feature is that you are able to build really long tasks. The maximum duration for a single execution is one year!
For long tasks, you can also specify the TimeoutSeconds and HeartbeatSeconds parameters. If a state runs longer than its TimeoutSeconds, it fails with a States.Timeout error. The latter parameter is even more powerful. When you specify HeartbeatSeconds, you have to design your activity worker to call the SendTaskHeartbeat API at least once within every interval of that many seconds. If you don’t call that API in time, the state fails with a States.Timeout error. Both of these parameters are useful, for instance, when your activity has to process a batch of records. You can specify the timeout for the entire duration of the activity or, using the HeartbeatSeconds parameter, you can say: the activity must process M records overall, and at least N records every X seconds. You do this by setting the parameter to X seconds and calling the API after every N records.
A really difficult problem to solve using Lambda is implementing a retry strategy. This is quite important but hard to achieve in a simple way. AWS Step Functions allows you to define a retry strategy for each of the different kinds of errors that your Lambda Functions can raise. I think this is easier to understand using an example.
Take a Hello World State Machine with a retry strategy. The HelloWorld Lambda Function could fail for different reasons, and each kind of error can be handled with a different strategy. For each strategy you can define three parameters:
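The "N records every X seconds" pattern can be sketched as follows. The client is passed in (with boto3 it would be `boto3.client("stepfunctions")`) and the task token is the one returned by GetActivityTask; the function name and its parameters are hypothetical.

```python
def process_with_heartbeat(client, task_token, records, handle_record, heartbeat_every):
    """Process all records, sending a heartbeat after every `heartbeat_every` records.

    The state's HeartbeatSeconds should be longer than the time needed to
    process `heartbeat_every` records, otherwise the state times out.
    Returns the number of records processed.
    """
    processed = 0
    for record in records:
        handle_record(record)
        processed += 1
        if processed % heartbeat_every == 0:
            # Tell Step Functions that the worker is still alive.
            client.send_task_heartbeat(taskToken=task_token)
    return processed
```

After the last record, the worker would report the overall result with SendTaskSuccess as usual.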
IntervalSeconds: Represents the number of seconds before the first retry attempt
MaxAttempts: Represents the maximum number of retry attempts
BackoffRate: The multiplier that increases the retry interval on each attempt
You can use the same strategy for multiple kinds of exceptions by defining multiple values in the ErrorEquals array. If you do not need to retry a function in the event of a specific error, you can set the Next field with the name of another State. In the Hello World example, if a CriticalError happens, the Lambda AlertDevOps is invoked and then the execution terminates.
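A minimal sketch of a template matching this description, built in Python. The HelloWorld and AlertDevOps functions and the CriticalError name come from the text above; the ARNs, the `TransientError` name, and the chosen parameter values are assumptions. The `retry_delays` helper is hypothetical and simply applies the three parameters to show how long each retry waits.

```python
import json

# Hypothetical ARN prefix: replace region and account ID with your own.
LAMBDA_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

hello_world = {
    "Comment": "Hello World with a retry strategy",
    "StartAt": "HelloWorld",
    "States": {
        "HelloWorld": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "HelloWorld",
            "Retry": [
                {
                    # One strategy shared by two kinds of errors.
                    "ErrorEquals": ["TransientError", "States.Timeout"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # No retry for CriticalError: go straight to another state.
            "Catch": [{"ErrorEquals": ["CriticalError"], "Next": "AlertDevOps"}],
            "End": True,
        },
        "AlertDevOps": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "AlertDevOps",
            "End": True,
        },
    },
}


def retry_delays(retrier):
    """Seconds waited before each retry attempt: interval * rate ** attempt."""
    interval = retrier.get("IntervalSeconds", 1)
    attempts = retrier.get("MaxAttempts", 3)
    rate = retrier.get("BackoffRate", 2.0)
    return [interval * rate ** attempt for attempt in range(attempts)]


print(json.dumps(hello_world, indent=2))
print(retry_delays(hello_world["States"]["HelloWorld"]["Retry"][0]))
```

With these values the three retries wait 2, 4, and 8 seconds; if the function still fails after the last attempt, the state fails.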
Cool Service Console
One thing that I would like to highlight here is the AWS Console. Usually, AWS doesn’t provide a particularly nice console, and that’s OK because, after a bit of experience with a service, a developer usually switches to using it via the API, CLI, SDK, or even CloudFormation.
With Step Functions, AWS creates a user-friendly experience by providing a lot of blueprints and also a nice UI with all the information needed. When you create a State Machine, you can start from different kinds of blueprints and you work with two boxes. The Preview box shows the State Machine that you are building and is based on the Code box, positioned below it. Once you are satisfied with your State Machine, you can create it and start all the executions that you need. Here the AWS Console is also helpful. For each execution, you have all of the logs that you need, both at the execution level and at the single state level.
What I’d like to see in the future
I have used this service a couple of times and there are several things that I would like to see improved:
The first point is the price. For complex tasks, AWS Step Functions is practically mandatory, but it comes at a high cost. I hope that a price reduction or, even better, a pricing model similar to CloudFormation, Elastic Beanstalk, or ECS is forthcoming. Those services offer a “pay only for what you use” model. Instead, with Step Functions, you pay for the resources that you use (Lambda, EC2, on-premises servers), but also for states that only wait or just pass to the next one.
Where are triggers?
With Lambda, AWS taught us to love events and triggers. Where are they? It would be nice to start an execution in response to an event such as an Amazon DynamoDB or Amazon Kinesis stream record or an AWS CodeCommit push, or to automatically pull messages from an Amazon SQS queue. For now, you are able to integrate Step Functions with Amazon API Gateway, which makes human tasks possible in our executions. Yesterday AWS announced the integration between CloudWatch Events and Step Functions. That is good news because it means that AWS is working on integrating more triggers with this service.
What about push events to other services?
The opposite of triggers is missing as well. Here, I’m talking about the ability to automatically send the event received from the previous state in my State Machine to other AWS services. For example, I’d like to receive a notification for each execution that ends without error. The last state of my State Machine could be an integration with Amazon SNS that triggers alerts without any further code.
There are several key points to keep in mind with this service:
State as a service in a serverless infrastructure
Easy integration of human tasks
Really long execution with timeout and heartbeat functionality
Deep integrations with CloudWatch Logs, Metrics, and CloudTrail
Nice AWS Console with blueprints and everything you need to get started
Giacomo is a Computer Engineer with a passion for all things AWS and Cloud Technology. Treating every day like a school day, Giacomo is curious about everything. Constantly traveling, discovering, and experiencing new things from sports to photography, he revels in uncovering the unknown.