Amazon Kinesis Analytics - Demonstration

In this demo we'll walk through the steps required to get a fully functioning Kinesis Analytics demo up and running. The demo involves three main steps: configuring an input stream, configuring a real-time analytics query, and finally setting up a delivery stream to store our results in an S3 bucket.


In this demo we'll walk through the steps required to get a fully functional Kinesis Analytics demo up and running. The demo will involve three main steps: configuring an input stream, configuring a real-time analytics query, and finally setting up a delivery stream to store our results into an S3 bucket. Let's get started. Within the AWS console, select the Kinesis service. Within the Kinesis console we are presented with three options: Streams, Firehose, and Analytics. As we're going to build a real-time stream analytics demo, we choose the Analytics option. From here we'll create a new Kinesis Analytics application. Click the create application button. We need to specify an application name. Here we'll call ours test, and we'll provide a description of what our application will provide. In this case we're going to create a stock ticker counter, whereby we'll group and count how many stock trades are made for any given type of stock across a given 30-second period. Our analytics results will be streamed out into an S3 bucket. We complete the application creation by clicking the create application button.

Now that our test application has been provisioned, we'll be taken through a three-step process of configuring an input stream, followed by coding a real-time analytics query, and finally by specifying a destination output stream. Let's start by configuring our source stream. We do so by clicking the connect to a source button. We're now presented with a couple of options as to how we specify our source stream. We can either select an existing stream or create a new stream. In our case, we'll configure a new stream by selecting the configure a new stream button at the top of the current screen. Again we are presented with several options as to how to create a new source stream. We can create either a Kinesis Firehose or Kinesis stream as our source stream by choosing either of the bottom options. However, for the sake of simplicity, we'll simply create a demo stream.
By doing so, AWS will create and populate a demo stream with stock ticker data. The demo stream will continuously receive test-generated data. This is perfect in that we can quickly get up and running without having to put the effort into generating test data. Let's now click the create a demo stream button. AWS now begins the process of provisioning a demo stream for us. In doing so, AWS takes care of creating a Kinesis-specific IAM role, a new Kinesis stream, and a method to auto-generate and populate this new Kinesis stream. This new Kinesis stream will then be set within our Kinesis Analytics application as its input source. After the initial provisioning is completed, the input data stream schema will be auto-discovered and presented to us. In all, this process takes approximately 30 to 60 seconds. As you can see now, our Kinesis Analytics stream has been successfully created for us. Additionally, it's been automatically selected as our chosen source stream. Scrolling down this screen we can see that the schema for this data stream has been auto-discovered. As mentioned earlier, the demo Kinesis Analytics stream is populated with stock ticker data. A sample capture of the incoming stream is displayed within a table. The table columns represent each of the detected attributes within a single record. In this case we have four columns representing the ticker symbol, sector, change, and price data attributes. We can additionally look at the raw stream sample captured by clicking the raw stream sample tab. Let's do that now. Here we can see the raw JSON data, one record per row. Next, we have the option of refining the auto-discovered schema. We can do so by clicking on the edit schema button. In the edit schema view, we can see the current schema. We can change the column ordering by clicking on any of the up and down arrows. For example, let's move the ticker symbol down one row, and then revert the change by moving it back up to its original position.
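To make the demo stream's data concrete, here is a small sketch that generates records of the same general shape. The exact attribute names, value ranges, and ticker list are assumptions based on the four columns discovered in the console (ticker symbol, sector, change, and price), not the official AWS generator.

```python
import json
import random

# Assumed ticker-to-sector mapping for illustration only.
TICKERS = {"AMZN": "TECHNOLOGY", "WMT": "RETAIL", "JPM": "FINANCIAL"}

def make_record():
    """Build one stock ticker record resembling the demo stream's data."""
    symbol = random.choice(list(TICKERS))
    return {
        "TICKER_SYMBOL": symbol,
        "SECTOR": TICKERS[symbol],
        "CHANGE": round(random.uniform(-5, 5), 2),
        "PRICE": round(random.uniform(10, 500), 2),
    }

# One JSON record per row, matching the raw stream sample view.
record = json.dumps(make_record())
print(record)
```

In the real pipeline, records like this would be written to the Kinesis stream by the AWS-provisioned generator; here we only model the shape of the data.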
Additionally, we can alter the length value and/or change the column type from its current type of VARCHAR to any of the other supported data types, such as time, date, binary, or decimal, as seen within the current drop-down list. Let's now move on, leaving the current schema as is without any changes, by clicking the exit link. We'll now scroll to the bottom of the screen and finalize the configuration of the input stream source by clicking the save and continue button. This takes us back to the Kinesis Analytics test application setup view. We note that the first step in the three-stage process has been completed, that being the setting and enablement of our input stream. In our case, the input stream has been set to a Kinesis stream named Kinesis analytics demo stream. We now proceed to the next stage, which is to set up our real-time analytics query. To do this we click on the go to SQL editor button. Before we transition into the SQL editor, we're presented with an option to start our test analytics application. There are several associated benefits in doing so. These benefits include the ability to see live samples of data from the input stream, being able to see representative errors that may occur at runtime, and being able to see how the data is processed in real time by the SQL we craft. So with this in mind, we'll go with the default selection of yes. We're now taken into the Kinesis Analytics SQL editor. Within this editor we can create and craft our own custom SQL queries, remembering here that any SQL code we create, once saved and enabled, will continually be applied to our incoming source data stream. Keep in mind that as we enter this screen, our application is still going through a warm-up process. This usually takes approximately 30 to 90 seconds. In the meantime, let's proceed by selecting a pre-built SQL query from the template catalog. We do so by clicking the add SQL from templates button.
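The auto-discovered schema effectively maps each JSON attribute onto an in-application column with a SQL type (and, for VARCHAR, a length). The following sketch validates a record against an assumed version of that schema; the specific types and lengths shown are illustrative guesses, not the values the console actually discovers.

```python
# Assumed schema for illustration: column name -> (SQL type, max length).
SCHEMA = {
    "TICKER_SYMBOL": ("VARCHAR", 4),
    "SECTOR": ("VARCHAR", 16),
    "CHANGE": ("REAL", None),
    "PRICE": ("REAL", None),
}

def conforms(record: dict) -> bool:
    """Loosely check a record against the assumed schema."""
    for column, (sql_type, length) in SCHEMA.items():
        if column not in record:
            return False
        value = record[column]
        if sql_type == "VARCHAR":
            if not isinstance(value, str) or len(value) > length:
                return False
        elif sql_type == "REAL":
            if not isinstance(value, (int, float)):
                return False
    return True

print(conforms({"TICKER_SYMBOL": "AMZN", "SECTOR": "TECHNOLOGY",
                "CHANGE": -0.5, "PRICE": 98.25}))  # True
```

Editing the schema in the console (reordering columns, changing a type or length) amounts to editing this mapping before the query runs against it.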
This takes us into the SQL template catalog. Here we can choose from any of the listed pre-built SQL templates. In this demo we'll select the second template in the list, that is, the aggregate function in a tumbling window template. By selecting this template, the SQL code of the template is displayed in the right-side pane. Let's now add this template to our SQL editor by clicking the add this SQL to editor button. We're taken back to the SQL editor with the selected template in tow. I'll briefly provide an explanation of the two SQL statements within this template. The first CREATE statement creates a destination stream, which will be populated by the pump created in the second SQL statement. The second CREATE statement creates a pump, which pumps data into the destination stream. The pump is configured to continually select and perform a count aggregation on the source data stream, grouping by ticker symbol and using a 10-second tumbling time window. For the sake of this demo, let's customize this template by increasing the default 10-second tumbling time window to a 30-second tumbling time window. A quick definition of a tumbling window: when a windowed query processes each window in a non-overlapping manner, the window is referred to as a tumbling window. In this case, each record on an in-application stream belongs to a specific window and is processed only once, when the query processes the window to which the record belongs. Right, we're all set to go. Let's now save and activate our real-time streaming analytics SQL query by clicking the save and run SQL button. This action both saves the SQL query and executes it, as per the displayed status updates. This takes a few seconds to complete. In the meantime, let's quickly remind ourselves what's happening within our query. The key ingredients within our SQL query are a COUNT(*) aggregation, over a GROUP BY clause on the stock ticker symbol, over a 30-second tumbling time window. Okay, great.
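The tumbling-window semantics described above can be modeled in a few lines of Python. This is only a sketch of the behavior (each record lands in exactly one non-overlapping 30-second window and is counted once), not the actual Kinesis Analytics SQL engine; event timestamps and ticker names are made up for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30

def tumbling_window_counts(events):
    """Group (timestamp, ticker) events into non-overlapping 30-second
    windows and count occurrences per ticker, modeling the template's
    COUNT(*) ... GROUP BY ticker ... tumbling-window behavior."""
    windows = defaultdict(Counter)
    for ts, ticker in events:
        # Each event belongs to exactly one window: floor to the window start.
        window_start = ts - (ts % WINDOW_SECONDS)
        windows[window_start][ticker] += 1
    return dict(windows)

# Hypothetical events: (seconds, ticker symbol).
events = [(0, "AMZN"), (5, "AMZN"), (12, "WMT"), (29, "AMZN"),
          (31, "AMZN"), (45, "WMT")]
print(tumbling_window_counts(events))
# Window starting at 0s: AMZN=3, WMT=1; window starting at 30s: AMZN=1, WMT=1
```

Note how the record at 29 seconds falls in the first window and the record at 31 seconds in the second; unlike a sliding window, no record is ever counted in two windows.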
It looks like we're now about to receive our first lot of results. Note that new results are added every 2 to 10 seconds. Let's wait a little bit longer and we should see the next set of results come through. Also note that we can enable the row view to automatically scroll to the bottom to show the most recently received results. Excellent, here's the next set of results. As you can see, we're getting a good spread of stock ticker symbols, each with a count of how many times it appeared within the 30-second tumbling time window. Let's scroll and find the Amazon stock ticker symbol. Here we can see that the count for the number of times the Amazon stock ticker symbol showed up in this 30-second tumbling window was three. Next, let's take a look at the error stream to confirm whether we've encountered any errors. We're expecting none, and that is indeed the case. Okay, now that we've completed the setup of our real-time stream querying configuration, let's exit. We're now back in the Kinesis application window, where we'll proceed and finalize our demo application by configuring a destination for our analytics. To do so, we click on the connect to a destination button. We're now presented with the destination screen. Within this screen we can choose a preexisting stream or configure and create a new stream. We'll take the second option and build a new destination stream by clicking the configure a new stream option. By doing so we are presented with options to create either a new Kinesis Firehose stream or a standard Kinesis stream. As we've decided to store our analytics outputs in an S3 bucket, we'll go with the easiest option for our demo, which is to configure an S3 Firehose stream. Therefore we click on the go to Kinesis Firehose button. Note, clicking on this button opens the Kinesis Firehose console in a separate browser tab. In this new browser tab, we'll proceed by clicking the create delivery stream button.
Within the new delivery stream window, we'll start by naming our new Firehose stream. Here we're naming ours ticker counter results. Next we scroll down and ensure that the source for this new stream is set to direct PUT or other sources. We continue to scroll down and then click the next button to proceed to the transform records view. Here we'll leave the default settings as is. Note, if we wanted to perform any data transformations on our analytics results before they're saved into our S3 bucket, then we could do so by selecting the enabled option and then implementing a Lambda function to perform the required transformation. For now, we leave it as disabled and click the next button. This takes us to the select destination view. Here we can choose from three options: S3, Redshift, or Elasticsearch. For our demo we've settled on saving our analytics results to an S3 bucket, so we'll go with S3. Scrolling down, we now need to either specify an existing S3 bucket or create a new S3 bucket. In our case, we'll create a new S3 bucket by clicking the create new button. In the S3 popup window, we specify the name of our new bucket. Here we'll name ours ticker counter results. Next, we click on the create S3 bucket button. We conclude the destination bucket setup by clicking the next button at the bottom of the screen. This takes us on to the configure settings view. Here we'll adjust the S3 buffer conditions. In our case, we'll lower both the buffer size and the buffer interval to their lowest permissible values: one megabyte and 60 seconds, respectively. We do so to expedite the delivery of results into our S3 bucket, given that this is a demo. In a production system we would probably not do this. Moving through the compression, encryption, and error logging settings, for the sake of simplicity we'll leave these set to their defaults. We next need to set up an IAM role, which will be attached to our new Firehose delivery stream.
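The buffer conditions we just set are worth pausing on: Firehose accumulates records and flushes them to S3 when either the size threshold or the time threshold is breached, whichever comes first. The sketch below models that flush logic with small made-up thresholds; it is an illustration of the buffering behavior, not Firehose's actual implementation.

```python
class BufferedDelivery:
    """Model of Firehose buffering: flush when EITHER the size
    threshold or the time threshold is breached (whichever first)."""

    def __init__(self, max_bytes=1 * 1024 * 1024, max_seconds=60):
        self.max_bytes = max_bytes      # demo setting: 1 MB
        self.max_seconds = max_seconds  # demo setting: 60 s
        self.buffer = []
        self.buffered_bytes = 0
        self.window_open = 0.0          # time the current buffer was opened
        self.flushes = []               # stand-in for objects written to S3

    def put(self, record: bytes, now: float):
        if not self.buffer:
            self.window_open = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if (self.buffered_bytes >= self.max_bytes
                or now - self.window_open >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushes.append(b"".join(self.buffer))
            self.buffer, self.buffered_bytes = [], 0

# Tiny thresholds so the size-based flush is easy to see.
fh = BufferedDelivery(max_bytes=10, max_seconds=60)
fh.put(b"12345", now=0.0)   # 5 bytes buffered, no flush yet
fh.put(b"67890", now=1.0)   # hits the 10-byte threshold -> flush
print(len(fh.flushes))      # 1
```

With the demo's real settings (1 MB / 60 s) and a trickle of small JSON results, the 60-second time threshold is almost always the one that fires, which is why we expect roughly one S3 object per minute.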
In the context of this demo, the IAM role will be required to give write access to our newly created S3 bucket. Let's go ahead and create a new IAM role by clicking the create new button. Again, take note here that this action has opened up another new browser tab. Within our demo setup, we should now have three browser tabs open. Working within the latest browser tab, we'll select the create a new IAM role dropdown option for the IAM role setting. We need to specify a role name for our new IAM role. Here we set ours to ticker counter results Firehose delivery role. Next, click on the view policy document link to view the auto-generated IAM policy. Scrolling down within the policy document, we can view the permissible actions and the resources on which they're allowed. Note that we can remove the last two items in the resource section as they are unnecessary. Let's do this now. We complete the IAM role setup by clicking the allow button at the bottom of the screen. By doing so, it will close the current browser tab and create the IAM role in the background. Back in our configure settings view, we complete the configuration by clicking the next button. We're now taken to the final review step. Here we can review all Firehose delivery stream settings before finally creating the stream. We do so by scrolling down through the sections to the bottom, where we finally click on the create delivery stream button. This takes us back to the Firehose delivery streams list view. Here we can see our newly created ticker counter results S3 delivery stream. We can now close this particular browser tab. We should now be back in our original browser tab. Clicking on the select a stream option, we are presented with a list of possible streams to pick from. We need to click the refresh icon to ensure that the S3 delivery stream we just created shows up. We select the ticker counter results S3 delivery stream. Next, note the selected output format.
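For reference, the kind of policy the console auto-generates for the Firehose delivery role grants the S3 write actions against the destination bucket. The sketch below is a minimal assumed version expressed as a Python dict; the exact statement layout, any additional actions, and the hyphenated bucket name (S3 bucket names cannot contain spaces) are assumptions, not the literal auto-generated document.

```python
import json

# Assumed minimal IAM policy for the Firehose delivery role.
# Bucket ARN is hypothetical (hyphenated form of "ticker counter results").
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject",
            ],
            "Resource": [
                "arn:aws:s3:::ticker-counter-results",
                "arn:aws:s3:::ticker-counter-results/*",
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```

The key point mirrored here is that the role needs both bucket-level access (for listing and location lookups) and object-level access (for the actual writes), which is why the resource list names the bucket ARN twice, once bare and once with `/*`.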
It is currently defaulted to the JavaScript Object Notation (JSON) format. Let's leave this as is, and finally, we'll go with the default setting for providing the necessary IAM role and policy permissions. We complete the setup by clicking the save and continue button. As shown, this step takes approximately 30 to 90 seconds to complete. Great, we've completed all three main stages to create our test Kinesis Analytics application. Before we move on, we can see that our source has been set to Kinesis analytics demo stream and that our destination has been set to ticker counter results. We should now have an active Kinesis Analytics pipeline in which AWS auto-generates stock ticker data, putting it into our source stream. Our configured aggregation count query runs continually over the incoming data using a 30-second tumbling time window. The analytics results are serialized into JSON format and delivered into our configured S3 bucket via our Firehose delivery stream. Let's now open the S3 console and navigate to our ticker counter results S3 bucket. Here we can see that our bucket is currently empty. This is likely due to the buffering options that we set on our S3 Firehose delivery stream. Recalling that we set the buffer size to one megabyte and the buffer time to 60 seconds, this implies we'll need to wait until one of those buffer thresholds is breached, likely to be the buffer time. We'll now periodically click the refresh button until our first analytics results appear. Excellent, we're now in possession of our first set of results. Let's now navigate down to the file itself. We select the file and click on the download button. This downloads a local copy of the file, which we can then open within our local default editor. Opening the file, we can see our analytics results. As per our configured SQL analytics query, we get an aggregate count for each stock symbol seen within a tumbling time window of 30 seconds, serialized in JSON format.
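One practical detail when opening these downloaded files: Firehose appends records into each S3 object without adding delimiters between them, so the file can contain back-to-back JSON objects rather than one object per line. The sketch below parses such a blob one object at a time; the sample data and its field names (the count column name in particular) are hypothetical stand-ins for the demo's actual output.

```python
import json

def parse_concatenated_json(blob: str):
    """Walk a string of back-to-back JSON objects, returning them as a
    list. raw_decode() parses one object and reports where it ended."""
    decoder = json.JSONDecoder()
    records, index = [], 0
    while index < len(blob):
        obj, end = decoder.raw_decode(blob, index)
        records.append(obj)
        # Skip any whitespace between records.
        while end < len(blob) and blob[end].isspace():
            end += 1
        index = end
    return records

# Hypothetical sample mirroring the demo's aggregate-count results;
# the TRADE_COUNT field name is an assumption.
sample = ('{"TICKER_SYMBOL":"AMZN","TRADE_COUNT":3}'
          '{"TICKER_SYMBOL":"WMT","TRADE_COUNT":2}')
print(parse_concatenated_json(sample))
```

If the records in your downloaded file happen to be newline-separated instead, a simple line-by-line `json.loads` would also work; the decoder-based loop handles both cases.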

About the Author

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).