One of the most applauded announcements at re:Invent 2016 was AWS Step Functions (SFN). Step Functions is basically an orchestration service for AWS Lambda and activity-based tasks. Thanks to SFN, you can control multiple executions of your processes using Lambda Functions and activity workers.
What is AWS Step Functions?
AWS Step Functions is the latest application service released by AWS to solve a problem that many people reading this have probably experienced: orchestrating complex flows using Lambda Functions.
In many use cases, there are several processes composed of different tasks. If you want to run the entire process in a serverless way, you can create a Lambda Function for each task and run those functions using your own orchestrator. Writing code that orchestrates those functions can be painful, and it is really hard to debug and optimize. AWS Step Functions removes this need by letting you design and implement a complex flow for your functions or tasks with little effort. According to the AWS documentation page, “AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows.”
Let’s take a look at some examples.
Example 1: Game hosted on AWS
A simple use case could be handling one of your users who completes a level in your game hosted on AWS. In this case, you would need to perform many different tasks, which could include:
Updating different DynamoDB tables
Storing reports on S3
Putting a metric on CloudWatch for further analysis
Completing even these three tasks could be really difficult if each one is a separate Lambda Function that has to run sequentially or, harder still, in parallel. With AWS Step Functions, you can run those tasks in parallel, handling different kinds of exceptions for each function and handling the final results without any further complications. This is how you could implement it:
Example 2: A serverless handler for libraries
Other examples might be flows that require human interaction. For instance, a library wants to keep track of each item loaned out to customers, and it wants to help customers return their items within the deadline. In this process, a customer checks out a book, the library employee records that action in the system, and afterwards a State Machine can orchestrate all of the actions necessary for bringing the book from the customer back to the library.
Thanks to AWS Step Functions, you can run a Lambda Function that sends the customer an email confirming the checkout, with a link to renew it. Another Lambda, in conjunction with Amazon API Gateway, can generate a link to mark the loan as complete. After a few days, the State Machine can send an automated message to remind the customer to renew or return the book.
As you can see, this example is more elaborate than the first one, and it is also more difficult to design and implement with Step Functions. The advantage of using this service is that you can implement really long tasks and also handle human interaction that can modify the flow of execution.
A deeper look at AWS Step Functions
Under the hood
So, what powers AWS Step Functions? As you can imagine, there are several affinities between this service and Amazon Simple Workflow (SWF). In fact, as Tim Bray pointed out in the video presentation at re:Invent 2016, SWF shares part of its backend with SFN, but at first glance, Step Functions is less complicated. Let’s try to understand the main components of this service so that you can start using it in your next project.
The biggest component is the State Machine. A State Machine represents the flow that you need to put in place to achieve your goals. For example, to manage lending resources for a library you need to create a State Machine that coordinates each task to provide a better experience for customers. The two screenshots above are examples of State Machines.
It is very easy to create a State Machine. You basically need a JSON document and that’s it. Using the API or the AWS Console, you can create it and start as many executions as you need. The JSON template must follow the Amazon States Language. While it is not so easy to compose by hand, the console gives you a real-time graph of the State Machine you are describing.
Here are the JSON templates for implementing the State Machines for Examples 1 and 2 above.
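As a rough idea of what such a template looks like, here is a minimal sketch of Example 1 written as a Python dictionary and serialized to Amazon States Language JSON. The state names, the account ID, and the Lambda ARNs are hypothetical placeholders, not the ones from the original templates.

```python
import json

# Hypothetical ARN prefix: replace region, account ID and function names with your own.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def single_task_branch(name):
    """Build a Parallel branch that contains one Task state backed by a Lambda."""
    return {
        "StartAt": name,
        "States": {
            name: {
                "Type": "Task",
                "Resource": LAMBDA_ARN_PREFIX + name,
                "End": True,
            }
        },
    }


# Example 1: when a user completes a level, run the three tasks in parallel.
level_completed = {
    "Comment": "Sketch of Example 1 (names are placeholders)",
    "StartAt": "HandleLevelCompleted",
    "States": {
        "HandleLevelCompleted": {
            "Type": "Parallel",
            "Branches": [
                single_task_branch("UpdateDynamoDBTables"),
                single_task_branch("StoreReportOnS3"),
                single_task_branch("PutCloudWatchMetric"),
            ],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# This string is what you would paste into the console or pass to the API.
definition_json = json.dumps(level_completed, indent=2)
```

Building the definition in code is only a convenience for this sketch; the resulting JSON is what Step Functions actually consumes.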
A State Machine is made of boxes, and each one represents a State. States are referred to by their name inside the State Machine template. Each name must be unique and there are many different State types. Currently, the available states (based on the publish date for this post) are:
Choice state: Branch the execution
Fail or Succeed state: Stop an execution with a failure or a success
Pass state: Pass its input to the output, optionally injecting some fixed data
Wait state: Provide a delay for a certain amount of time or until a specified time/date
Parallel state: Begin parallel branches of execution
Task state: Execute some code in your state machine
All of the work in your State Machine is accomplished by tasks. A task can be:
A Lambda Function: You have to specify its ARN
An Activity: A piece of code that can be hosted wherever you want. It needs to call the GetActivityTask API to pick up a job, and the SendTaskSuccess or SendTaskFailure API to report the result. In this way, you can also include human tasks in your State Machine, or tasks that run too long to be hosted in a Lambda Function. In our library example, you need to give the user a link that, when clicked, either renews the resource that has been checked out or marks it as completed. Thanks to API Gateway and the SendTaskSuccess or SendTaskFailure API, you can do it.
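To make the activity flow a bit more concrete, here is a minimal sketch of a worker that polls once for a task and reports the outcome. It assumes `sfn` is a boto3 Step Functions client (or anything exposing the same three methods); the `poll_once` function, the `handler` callback, and the worker name are hypothetical names introduced for this illustration.

```python
import json


def poll_once(sfn, activity_arn, handler, worker_name="example-worker"):
    """Fetch at most one task for the activity and report its result.

    Returns True if a task was handled, False if the poll returned no task.
    """
    # GetActivityTask is a long poll; it comes back without a token when idle.
    task = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False

    try:
        result = handler(json.loads(task["input"]))
    except Exception as exc:
        # Report the failure so the State Machine can retry or branch on it.
        sfn.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
    else:
        sfn.send_task_success(taskToken=token, output=json.dumps(result))
    return True
```

A real worker would typically create the client with `boto3.client("stepfunctions")` and call `poll_once` in a loop. For a human task, the task token can be stored and the success or failure call made later, for example from the Lambda behind the API Gateway link.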
Good to know: Pricing
A big difference between Step Functions and Simple Workflow is their pricing. AWS Step Functions is billed for each state transition of your execution. For example, if your State Machine has three steps in series, each execution consists of four state transitions. For each account, the first 4,000 transitions per month are included in the free tier, which does not expire. The free tier is nice to have, but in addition to state transitions you will be charged for Lambda Function executions, data transfer, and EC2 instances if your activity is hosted there. In my opinion, this service is not as inexpensive as you might expect. Using it in production with a lot of executions can incur high expenses, but in many cases it is necessary and removes the pain of orchestrating different tasks. It also provides us with a lot of nice features.
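To get a feeling for the numbers, here is a small back-of-the-envelope sketch. The price of $0.025 per 1,000 state transitions is an assumption used for illustration, so check the current pricing page before relying on it. The transition count follows the rule described above (steps in series plus one), and the function names are hypothetical.

```python
FREE_TRANSITIONS_PER_MONTH = 4_000

# Assumed price, for illustration only: verify against the AWS pricing page.
PRICE_PER_1000_TRANSITIONS_USD = 0.025


def transitions_per_execution(steps_in_series):
    """Three steps in series count as four state transitions, and so on."""
    return steps_in_series + 1


def monthly_transition_cost(executions, steps_in_series):
    """Estimate the monthly charge for state transitions only, in USD."""
    total = executions * transitions_per_execution(steps_in_series)
    billable = max(0, total - FREE_TRANSITIONS_PER_MONTH)
    return round(billable / 1000 * PRICE_PER_1000_TRANSITIONS_USD, 2)


# One million executions of a three-step State Machine in a month.
estimate = monthly_transition_cost(1_000_000, 3)
```

Under these assumptions, a thousand executions of a three-step State Machine fit exactly in the free tier. Lambda, data transfer, and EC2 charges come on top of this figure.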
How can we actually use AWS Step Functions?
After this really long but necessary introduction to AWS Step Functions, let’s move on to how to use this service. There are three ways:
Using the AWS Console, simply insert the JSON template in the Code box and your State Machine will appear in the Preview box. AWS also provides a small but very accurate set of blueprints to start from. For example, if you need a simple State Machine made of a parallel step, you only need to click on the related blueprint, change the name and the ARN and that’s it, your State Machine is ready for production. I think that AWS did a great job here, and the console is really helpful for composing State Machines.
CloudFormation is the Infrastructure as Code service of AWS. It supports AWS Step Functions, and it is pretty simple to implement: you only need to specify the State Machine template and the service role ARN.
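Here is a minimal sketch of such a CloudFormation template, built as a Python dictionary and dumped to JSON. It uses the `AWS::StepFunctions::StateMachine` resource type with its `DefinitionString` and `RoleArn` properties; the logical resource name, the role ARN, and the one-state definition are hypothetical placeholders.

```python
import json

# A trivial State Machine: a single Pass state that returns a fixed result.
definition = {
    "StartAt": "HelloWorld",
    "States": {
        "HelloWorld": {"Type": "Pass", "Result": "Hello World!", "End": True}
    },
}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # Hypothetical logical name for the resource.
        "HelloWorldStateMachine": {
            "Type": "AWS::StepFunctions::StateMachine",
            "Properties": {
                # The Amazon States Language document, embedded as a string.
                "DefinitionString": json.dumps(definition),
                # Hypothetical service role that Step Functions assumes.
                "RoleArn": "arn:aws:iam::123456789012:role/StatesExecutionRole",
            },
        }
    },
}

template_json = json.dumps(template, indent=2)
```

Note that the State Machine definition is passed as a JSON string inside the template, which is why it is serialized separately.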
After you have created your State Machine, you would like to run millions of executions and this is the easy part! In fact, AWS Step Functions is a fully managed service and you don’t need to take care of scaling or server maintenance. We are in a serverless world right now!
In order to take control of your system, you need to have a really good monitoring system in place. With Step Functions, AWS did a pretty good job. In the console, each execution has its own logs for each state, and they are well detailed. SFN is also integrated with CloudWatch Metrics and CloudTrail. Of course, if your task is performed by a Lambda Function, that function will deliver its logs to CloudWatch as usual. You can learn more about these services following these links: Introduction to CloudWatch and Learn the tools for governing accounts. Here is a screenshot where you can see the logs that the AWS Console provides.
Using the console is simple and the user interface is pretty good, but what about getting the logs of an execution via API? The API that we need here is GetExecutionHistory, which provides the complete history of an execution. Although I have never used it, after reading the documentation I can see that the response could be pretty hard to handle: there are many different possible fields, one for each type of event and its result. For example, in the case of failure, the details are in a different field depending on whether the event type is ActivityFailed or LambdaFunctionFailed (even though both carry the same information).
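One way to tame that response is to map each failure event type to its details field and normalize the result. The sketch below does this for the two event types mentioned above. The `extract_failures` helper and the shape of its output are hypothetical; the event and field names are the ones I read in the API documentation.

```python
# Each failure event type keeps its details under a differently named field.
FAILURE_DETAIL_FIELDS = {
    "ActivityFailed": "activityFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
}


def extract_failures(events):
    """Return a uniform list of failures from a list of history events."""
    failures = []
    for event in events:
        field = FAILURE_DETAIL_FIELDS.get(event.get("type"))
        if field is None:
            continue  # not one of the failure events we care about
        details = event.get(field, {})
        failures.append(
            {
                "id": event.get("id"),
                "type": event["type"],
                "error": details.get("error"),
                "cause": details.get("cause"),
            }
        )
    return failures
```

With boto3, the input would come from something like `sfn.get_execution_history(executionArn=execution_arn)["events"]`, where `execution_arn` is the ARN of your own execution.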
Why Step Functions is your friend
There are several great things about this new service:
State as a service
AWS Step Functions provides something that could be called state as a service. Usually, a serverless infrastructure is also stateless. In fact, if you are using multiple Lambda Functions to complete a task, it is really hard to store the state of an execution and keep it up to date. If you need it, you are probably going to use S3 or a database, but this is a repetitive and complex task to accomplish. SFN will carry your state from one task to the next and orchestrate the tasks so that each runs only if needed and in the right order.
Keep your tasks alive with a Heartbeat
Another cool feature is that you are able to build really long tasks. The maximum duration for a single execution is one year!
For long tasks, you can also specify the TimeoutSeconds and HeartbeatSeconds parameters. If a state runs longer than its TimeoutSeconds, it fails with a States.Timeout error. The latter parameter is even more powerful: when you specify HeartbeatSeconds, you have to design your activity worker to call the SendTaskHeartbeat API at least once within every interval of that many seconds. If more time than that passes between heartbeats, the state fails with a States.Timeout error. Both parameters are useful when, for instance, your activity has to process a batch of records. You can set a timeout for the entire duration of the activity or, using HeartbeatSeconds, require that the activity processes M records overall and at least N records every X seconds. To do that, set HeartbeatSeconds to X and call the API after every N records.
A really difficult problem to solve using Lambda is implementing a retry strategy. This is quite important but hard to achieve in a simple way. AWS Step Functions allows you to define a retry strategy for all the different kinds of errors that your Lambda Functions can incur. I think this is easier to understand using an example.
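The “N records every X seconds” idea can be sketched as follows. The `process_records` function and its parameters are hypothetical; `send_heartbeat` is any callable that reports progress, which in a real worker would wrap the SendTaskHeartbeat API.

```python
def process_records(records, process, send_heartbeat, heartbeat_every=100):
    """Process every record, sending a heartbeat after each batch of N records.

    Returns the number of records processed.
    """
    processed = 0
    for record in records:
        process(record)
        processed += 1
        if processed % heartbeat_every == 0:
            # Tell Step Functions the worker is still alive and making progress.
            send_heartbeat()
    return processed
```

With boto3, `send_heartbeat` could be `lambda: sfn.send_task_heartbeat(taskToken=token)`, using the token received from GetActivityTask. The state survives only if each batch of N records finishes within HeartbeatSeconds, so N should be chosen with some margin.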
This is the template for creating the Hello World State Machine with a retry strategy. The HelloWorld Lambda Function could fail for different reasons, but we can handle errors with a different strategy. For each kind of strategy you can define three parameters:
IntervalSeconds: Represents the number of seconds before the first retry attempt
MaxAttempts: Represents the maximum number of retry attempts
BackoffRate: The multiplier that increases the retry interval on each attempt
You can use the same strategy for multiple kinds of exceptions by listing multiple values in the ErrorEquals array. If you do not want to retry a function when a specific error occurs, you can add a Catch entry for that error and set its Next field to the name of another State. In the example above, if a CriticalError happens, the AlertDevOps Lambda is invoked and then the execution terminates.
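As a sketch of what such a template could look like, here is a Hello World State Machine with one retry strategy and one caught error, again as a Python dictionary. The ARNs and the `TransientError` name are hypothetical; `CriticalError` and `AlertDevOps` follow the description above. The small `retry_delays` helper is also hypothetical and only shows how the three parameters interact.

```python
import json

# Hypothetical ARN prefix: replace region and account ID with your own.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

hello_world = {
    "StartAt": "HelloWorld",
    "States": {
        "HelloWorld": {
            "Type": "Task",
            "Resource": LAMBDA_ARN_PREFIX + "HelloWorld",
            # One strategy shared by two kinds of errors.
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout", "TransientError"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # No retry for CriticalError: jump straight to AlertDevOps.
            "Catch": [{"ErrorEquals": ["CriticalError"], "Next": "AlertDevOps"}],
            "End": True,
        },
        "AlertDevOps": {
            "Type": "Task",
            "Resource": LAMBDA_ARN_PREFIX + "AlertDevOps",
            "End": True,
        },
    },
}

hello_world_json = json.dumps(hello_world, indent=2)


def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry: the interval grows by the backoff rate."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


delays = retry_delays(2, 3, 2.0)  # waits of 2, 4 and 8 seconds
```

With these values, the function is retried after 2, 4, and 8 seconds before the state finally fails.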
Cool Service Console
One thing that I would like to highlight here is the AWS Console. Usually, AWS doesn’t provide a nice console to interact with and that’s ok because after a bit of experience with the service a developer usually switches to use the service via API, CLI, SDK, or even CloudFormation.
With Step Functions, AWS creates a user-friendly experience by providing a lot of blueprints but also a nice UI with all the information needed. In fact, in the creation phase of a State Machine, you can start from different kinds of blueprints and you have two boxes. The Preview box shows the State Machine that you are building and is based on the Code box, positioned below it. Once you are satisfied with your State Machine, you can create it and start all the executions that you need. Here the AWS Console is also helpful. For each execution, you have all of the logs that you need, both at the execution and single state level.
What I’d like to see in the future
I used this service a couple of times and there are several features that I feel are missing:
The first point is the price. With complex tasks, AWS Step Functions is practically mandatory, but it comes at a high cost. I hope that a price reduction or, even better, a pricing model similar to CloudFormation, Elastic Beanstalk, or ECS is forthcoming. Those services offer a “pay only for what you use” model. With Step Functions, instead, you pay for the resources that you use (Lambda, EC2, on-premises servers), but also for states that only wait or just pass to the next one.
Where are triggers?
With Lambda, AWS taught us to love events and triggers. Where are they? It would be nice to start an execution in response to an event such as an Amazon DynamoDB or Amazon Kinesis stream record or an AWS CodeCommit push, or to automatically pull messages from an Amazon SQS queue. For now, you are able to integrate Step Functions with Amazon API Gateway, which makes human tasks possible in our executions. Yesterday AWS announced the integration between CloudWatch Events and Step Functions. That is good news because it means that AWS is working on integrating more triggers with this service.
What about push events to other services?
Even the opposite feature of triggers is missing. Here, I’m talking about the ability to automatically send the event received from the previous state in my State Machine to other AWS Services. For example, I’d like to receive a notification for each execution that ends without error. The last state of my State Machine could be an integration with Amazon SNS that without any further code will trigger alerts.
There are several key points to keep in mind with this service:
State as a service in a serverless infrastructure
Easy integration of human tasks
Really long execution with timeout and heartbeat functionality
Deep integrations with CloudWatch Logs, Metrics, and CloudTrail
Nice AWS Console with blueprints and everything you need to get started
Giacomo is a Computer Engineer with a passion for all things AWS and Cloud Technology. Treating every day like a school day, Giacomo is curious about everything. Constantly traveling, discovering and experiencing new things from sports to photography; he revels in uncovering the unknown.