Moving Data to S3 with Apache NiFi

Moving data to the cloud is one of the cornerstones of any cloud migration. Apache NiFi is an open source tool that enables you to easily move and process data using a graphical user interface (GUI). In this blog post, we will examine a simple way to move data to the cloud using NiFi, complete with practical steps. Calculated Systems offers a cloud-first version of NiFi that you can use to follow along.

Cloud Object Storage

There are many ways to store data on the cloud, but the easiest are the object stores. All three major cloud providers have them: Amazon S3, Azure Blob Storage, and Google Cloud Storage.
These are an ideal starting point for files, as you can typically land the files without much forethought or capacity planning. Additionally, these object stores are extremely robust, featuring multiple levels of durability and availability.
For the purposes of this tutorial, we will start with the most common object store: Amazon Simple Storage Service (Amazon S3).

Amazon S3 Terminology

Before we get started moving data, let’s establish some basic terminology:
  • Identity and Access Management (IAM) – Controls that define who and what can interact with your AWS resources.
  • Access Keys – These are your access credentials to use AWS. These are not your typical username/password; they are generated using Identity and Access Management (IAM).
  • Bucket – A grouping of similar files. Its name must be globally unique. Buckets can be made publicly accessible and are often used to host static objects.
  • Folder – Much like an operating system folder, these exist within a bucket to enable organization. (Under the hood, a folder is a prefix on the object's key.)

Creating an Access Key

For NiFi to have permission to write to S3, we must set it up with an access key pair. There are many ways to do this, but the best practice is to create a new IAM user. To get to the IAM user screen, navigate to the IAM homepage, then:
  1. Select “Add user” and check “Programmatic access”
  2. Enter a new name such as “NiFi_demo”
  3. Click “Next: Permissions”
  4. Click “Create Group” and you will be presented with a list of permission policies you can attach to the group this new user will belong to
  5. Enter a group name such as “Nifi_Demo_Group”
  6. In the “Filter policies” box, search for S3 and check “AmazonS3FullAccess” > Click “Create Group”
  7. At the bottom right, select “Next: Tags” > Click through to “Next: Review”
  8. Click “Create user” to finish making an IAM User
The access key ID and secret access key are required to set up your data transfer. You can download them as a .csv file or save them somewhere safe.
IMPORTANT: Be sure to record your secret access key as this is the only time it can be viewed. 
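
“AmazonS3FullAccess” is convenient for a demo, but it grants access to every bucket in the account. As a hedged alternative, here is a minimal Python sketch of a narrower policy document that you could paste into a custom IAM policy. The bucket name is hypothetical, and the two actions shown are an assumption about what a simple upload needs; large (multipart) uploads or other processor settings may require additional actions.

```python
import json


def least_privilege_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to one bucket.

    Assumption: listing the bucket and writing objects is enough
    for the simple upload in this tutorial.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object-level permission: applies to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# "my-nifi-demo-bucket" is a placeholder; substitute your own bucket name.
print(json.dumps(least_privilege_policy("my-nifi-demo-bucket"), indent=2))
```

Note that bucket-level and object-level actions use different resource ARNs, which is why the policy has two statements.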

Creating an S3 Bucket

Although we will cover the basics of creating your S3 bucket in this post, you can check out Cloud Academy’s Storage Fundamentals of AWS for an in-depth overview. Now that we have credentials for AWS, we need a place to land our data. Put simply, we need to create a new S3 bucket if you do not already have one. Go to the AWS S3 Console, then:
  1. Click “+ Create Bucket”
  2. Enter a globally unique bucket name and select the region you are creating it in
  3. Click through until the bucket is created (default options are fine to use)
  4. Click on your new bucket and you should be able to see its contents — which will be empty
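
If the console rejects your bucket name, it is usually because of S3's naming rules. The following Python sketch checks a subset of those rules (length, allowed characters, no IP-address-style names); it is an illustration under that assumption, not a complete validator, and it cannot tell you whether the name is already taken by someone else.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address (e.g. 192.168.1.1) are not allowed.
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes a subset of the S3 naming rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IP_PATTERN.fullmatch(name) is not None:
        return False
    return True


# Hypothetical names, for illustration only.
for candidate in ["my-nifi-demo-bucket", "My_Bucket", "192.168.1.1"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```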

Setting up your NiFi & AWS Credential Service or Processor Controls

NiFi can be set up in several ways, including downloading it from the Apache website or using a pre-made solution like Calculated Systems’ AWS Marketplace offering.
NiFi has many ways to provide access to AWS, either through an overarching credential service or through parameters set on a specific processor. The credential service is ideal when you have multiple processors all relying on the same keys. For the scope of this tutorial, we will not be using the service, but it is ideal when moving into a production setting.
  1. To get started, drag a new “PutS3Object” processor onto the canvas > right-click it and select “Configure”
  2. Under the Settings tab, find “Automatically Terminate Relationships” > check the boxes next to “failure” and “success” since this is the last processor in the flow.
  3. Under the Properties tab, configure the following properties:
    • Access Key ID – From the user you created earlier and noted down
    • Secret Access Key – From the user you created earlier and noted down
    • Bucket – The name of the bucket you created
    • Region – The region your bucket is located in; often U.S. East (N. Virginia)

    Processor Configuration

  4. Click “Apply” to finish up configuring the processor. 
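
To make the four properties above more concrete, here is a rough Python sketch of the kind of request they add up to: one object written to one bucket. It is not how NiFi is implemented. The function name is hypothetical, and it assumes a client object shaped like the one returned by boto3's `boto3.client("s3")`, a library this post does not otherwise use.

```python
from pathlib import Path


def put_file(s3_client, bucket: str, path: Path) -> str:
    """Upload one local file to the bucket, keyed by its file name.

    s3_client is assumed to expose put_object(Bucket=..., Key=..., Body=...),
    as the boto3 S3 client does. The access key pair and region are
    supplied when that client is created, not here.
    """
    key = path.name
    with path.open("rb") as body:
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

With boto3 installed, a real client could be created with `boto3.client("s3", region_name="us-east-1")`. It reads the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which keeps the key pair out of the script itself.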

Setting Up Your Flow

For the purposes of this sample flow, let’s replicate NiFi’s own documentation directory to S3. To accomplish this, we need two additional processors: ListFile and FetchFile. Connect and configure them as shown below.
List File Flow
ListFile
  • Properties tab – Set “Input Directory” to /nifi/docs/html
  • Drag a connection from ListFile to FetchFile for relationship success
FetchFile
  • Settings tab – Check the boxes next to “failure,” “not.found,” and “permission.denied”
  • Drag a connection from FetchFile to PutS3Object for relationship success
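
If it helps to think of the flow as code, the three processors behave roughly like the Python sketch below: list the files, read each one, and hand it to an uploader. This is an analogy, not NiFi's implementation. The function names are hypothetical, and the sketch assumes subdirectories are included and each object is keyed by its file name.

```python
from pathlib import Path
from typing import Callable, Iterator


def list_files(input_dir: Path) -> Iterator[Path]:
    """ListFile analogue: yield every regular file under input_dir."""
    for path in sorted(input_dir.rglob("*")):
        if path.is_file():
            yield path


def fetch_file(path: Path) -> bytes:
    """FetchFile analogue: read the content of one listed file."""
    return path.read_bytes()


def replicate(input_dir: Path, put: Callable[[str, bytes], None]) -> int:
    """Run the whole flow; put(key, content) stands in for PutS3Object."""
    count = 0
    for path in list_files(input_dir):
        # Keyed by file name only, so files with the same name in
        # different subdirectories would overwrite each other.
        put(path.name, fetch_file(path))
        count += 1
    return count
```

Passing a dictionary's `__setitem__` as `put` gives a quick dry run that collects the would-be uploads in memory instead of sending them anywhere.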
Running Your Flow
  • Right-click each of the processors > click “Start”
  • Let this run for a few seconds. If you want to track the progress, right-click into any blank space of your NiFi canvas and press “refresh.” You should see each processor reporting flowfiles “in” and “out”
  • For the purpose of this demo, right-click the ListFile processor and click “Stop.” In production, you can leave this task running, but it is always best to stop demos when done. This keeps the demo from listing and sending files after you have stopped using the program.

Viewing the Objects in S3

If you return to your bucket, you should see your files listed. Note: You may have to refresh the page depending on your browser/settings.
S3 Bucket
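
You can also check the result without the console. The sketch below is a hedged example that assumes a client shaped like boto3's S3 client; the function name is hypothetical. It uses the client's `list_objects_v2` paginator so that buckets with more than one page of results are handled too.

```python
from typing import Iterator


def iter_object_keys(s3_client, bucket: str) -> Iterator[str]:
    """Yield the key of every object in the bucket, page by page."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket returns a page with no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3 installed and credentials configured, `list(iter_object_keys(boto3.client("s3"), "your-bucket-name"))` should return the names of the files the flow uploaded.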

[Optional] Security Cleanup

As an optional step, you may wish to revoke the access keys you created for this NiFi demo. It is a general best practice to remove unused keys when done. To revoke the keys, go to the IAM page of the AWS Console.
  • Click on the user you created earlier in the tutorial
  • Go to the Security Credentials tab and look for the Access Keys subsection. Here you can deactivate, delete, or even make new keys.
  • As a best practice, make the key inactive or delete the key.
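
The same cleanup can be scripted. The following is a minimal sketch, assuming a client shaped like boto3's IAM client (`boto3.client("iam")`); the function name is hypothetical. It marks every active key of a user as inactive rather than deleting it, so the change can be undone.

```python
def deactivate_access_keys(iam_client, user_name: str) -> list:
    """Set every active access key of the user to Inactive.

    Returns the IDs of the keys that were changed.
    """
    deactivated = []
    response = iam_client.list_access_keys(UserName=user_name)
    for meta in response["AccessKeyMetadata"]:
        if meta["Status"] == "Active":
            iam_client.update_access_key(
                UserName=user_name,
                AccessKeyId=meta["AccessKeyId"],
                Status="Inactive",
            )
            deactivated.append(meta["AccessKeyId"])
    return deactivated
```

For the user from this tutorial, the call would be `deactivate_access_keys(boto3.client("iam"), "NiFi_demo")`, run with credentials that are allowed to manage IAM users.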

Written by

Chris Gambino

Chris has been focused on the big data ecosystem for years. Starting with simple databases, he focused on Hadoop for several years before switching to a cloud-first approach. An author of "Nifi for Dummies", his approach involves a holistic evaluation of the problem before assigning technology. https://www.calculatedsystems.com/
