Moving Data to S3 with Apache NiFi

Moving data to the cloud is one of the cornerstones of any cloud migration. Apache NiFi is an open source tool that enables you to easily move and process data using a graphical user interface (GUI).  In this blog post, we will examine a simple way to move data to the cloud using NiFi complete with practical steps. Calculated Systems offers a cloud-first version of NiFi that you can use to follow along.

Cloud Object Storage

There are many ways to store data on the cloud, but the easiest are the object stores. All three major cloud providers have them:
These is an ideal starting point for files as you can typically land the files without too much forethought or capacity planning. Additionally, these object stores are extremely robust, featuring multiple levels of durability and availability.
NiFi Cloud Migration
For the purposes of this tutorial, we will start with the most common object store: Amazon Simple Storage Service (Amazon S3).

Amazon S3 Terminology

Before we get started moving data, let’s establish some basic terminology:
  • Identity and Access Management (IAM) – Controls for making and controlling who and what can interact with your AWS resources. 
  • Access Keys – These are your access credentials to use AWS. These are not your typical username/password — they are generated using access identity management. 
  • Bucket – A grouping of similar files that must have a unique name. These can be made publicly accessible and are often used to host static objects.
  • Folder – Much like an operating system folder, these exist within a bucket to enable organization.

Creating an Access Key

For NiFi to have permission to write to S3, we must set it up with an access key pair. There are many ways to do this, but the best practice is to create a new IAM user. To get to the IAM user screen, navigate to the IAM homepage
  1. Select “Add user” and check “Programmatic access”
  2. Enter a new name such as “NiFi_demo”
  3. Click “Next: Permissions”
  4. Click “Create Group” and you will be presented with a list of permissions you can add to this new user
  5. Enter a group name such as “Nifi_Demo_Group”
  6. Next to filter policies search for S3 and check “AmazonS3FullAccess” > Click “Create Group”
  7. At the bottom right, select “Next:Tags” > Click through to “Next:Review”
  8. Click “Create user” to finish making an IAM User
The access key ID and secret access key are very important to setting up your data transfer. You can download them as a .CSV file or save them somewhere safe.
IMPORTANT: Be sure to record your secret access key as this is the only time it can be viewed. 

Creating an S3 Bucket

Although we will cover the basics of creating your S3 bucket in this post, you can check out Cloud Academy’s Storage Fundamentals of AWS for an in-depth overview. Now that we have credentials for AWS, we need a place to land them. To put it simply, we need to create a new S3 bucket if you do not already have one. Go to the AWS S3 Console
  1. Click “+ Create Bucket”
  2. Enter a unique bucket name and the region you are creating it in
  3. Click through until the bucket is created (default options are fine to use)
  4. Click on your new bucket and you should be able to see its contents — which will be empty

Setting up your NiFi & AWS Credential Service or Processor Controls

NiFi can be setup several ways including download from the Apache website or using a pre-made solution like Calculated System’s AWS Marketplace Offering.
NiFi has many ways to provide access to AWS either through an overarching credential service or parameters set to a specific processor. The credential service is ideal when you have multiple processors all relying on the same keys. For the scope of this tutorial, we will not be using the service, but it is ideal when moving into a production setting.
  1. To get started, click and drag in a new processor “PutS3Object” > right-click “Configure the processor”
  2. Under the Settings tab, you will see Automatically Terminate Relations > check the boxes next to “failure” and “success” since this is the last processor in the flow.
  3. Under the Properties tab, configure the following properties:
    • Access Key ID – From the User you created earlier and noted down
    • Secret Access Key ID – From the User you created earlier and noted down
    • Bucket – Put the name of the bucket you created
    • Region – The region your bucket is located; often U.S. East (N. Virginia)

    Processor Configuration

  4. Click “Apply” to finish up configuring the processor. 

Setting Up Your Flow

For the purposes of this sample flow, let’s replicate NiFi’s own configuration directory to S3. To accomplish this, we need two additional processors: ListFiles and FetchFiles. Connect and configure them as shown below.
List File Flow
ListFile
  • Properties tab – Set “Input Directory” to /nifi/docs/html
  • Drag a connection from ListFile to FetchFile for relationship success
FetchFile
  • Settings tab – Check the boxes next to “Failure,” “not.found,” * “permission.denied”
  • Drag a connection from FetchFile to PutS3Object for relationship success
Running Your Flow
  • Right-click each of the processors > click “Start”
  • Let this run for a few seconds. If you want to track the progress, right-click into any blank space of your NiFi canvas and press “refresh.” You should see each processor reporting flowfiles “in” and “out”
  • For the purpose of this demo, right-click “Stop list files.” In production, you can leave this task running, but it is always best to stop demos when done. This stops the demo from producing sample files after you stopped using the program.

Viewing the Objects in S3

If you return to your bucket, you should see your files listed. Note: You may have to refresh button the page depending on your browser/settings.
S3 Bucket

[Optional] Security Cleanup

As an optional step, you may wish to revoke the access keys you gave to this Nifi Demo. It is general best practice to remove unused keys when done. To revoke the keys, go the AWS Console.
  • Click on the user you created earlier in the tutorial
  • Go to the Security Credentials tab and search for the Access Keys subsection. Here you can inactivate, delete, or even make new keys.
  • As a best practice, make the key inactive or delete the key.
Chris Gambino

Written by

Chris Gambino

Chris has been focused on the big data ecosystem for years. Starting with simple databases he focused on Hadoop for several years before switching to a cloud-first approach. An Author of "Nifi for Dummies", his approach involves a holistic evaluation of the problem before assigning technology. https://www.calculatedsystems.com/


Related Posts

Alisha Reyes
Alisha Reyes
— July 2, 2020

New Content: AWS, Azure, Typescript, Java, Docker, 13 New Labs, and Much More

This month, our Content Team released a whopping 13 new labs in real cloud environments! If you haven't tried out our labs, you might not understand why we think that number is so impressive. Our labs are not “simulated” experiences — they are real cloud environments using accounts on A...

Read more
  • AWS
  • Azure
  • DevOps
  • Google Cloud Platform
  • Machine Learning
  • programming
Joe Nemer
Joe Nemer
— June 19, 2020

Kickstart Your Tech Training With a Free Week on Cloud Academy

Are you looking to make a jump in your technical career? Want to get trained or certified on AWS, Azure, Google Cloud Platform, DevOps, Kubernetes, Python, or another in-demand skill?Then you'll want to mark your calendar. Starting Monday, June 22 at 12:00 a.m. PDT (3:00 a.m. EDT), ...

Read more
  • AWS
  • Azure
  • cloud academy content
  • complimentary access
  • GCP
  • on the house
Alisha Reyes
Alisha Reyes
— June 11, 2020

New Content: AZ-500 and AZ-400 Updates, 3 Google Professional Exam Preps, Practical ML Learning Path, C# Programming, and More

This month, our Content Team released tons of new content and labs in real cloud environments. Not only that, but we introduced our very first highly interactive "Office Hours" webinar. This webinar, Acing the AWS Solutions Architect Associate Certification, started with a quick overvie...

Read more
  • AWS
  • Azure
  • DevOps
  • Google Cloud Platform
  • Machine Learning
  • programming
Rebecca Willis
Rebecca Willis
— June 3, 2020

Azure vs. AWS: Which Certification Provides the Brighter Future?

More and more companies are using cloud services, prompting more and more people to switch their current IT position to something cloud-related. The problem is most people only have that much time after work to learn new technologies, and there are plenty of cloud services that you can ...

Read more
  • AWS
  • Azure
  • certification
Alisha Reyes
Alisha Reyes
— June 2, 2020

Blog Digest: 5 Reasons to Get AWS Certified, OWASP Top 10, Getting Started with VPCs, Top 10 Soft Skills, and More

Thank you for being a valued member of our community! We recently sent out a short survey to understand what type of content you would like us to add to Cloud Academy, and we want to thank everyone who gave us their input. If you would like to complete the survey, it's not too late. It ...

Read more
  • AWS
  • Azure
  • blog digest
  • Certifications
  • Cloud Academy
  • OWASP
  • OWASP Top 10
  • Security
  • VPCs
Alisha Reyes
Alisha Reyes
— May 11, 2020

New Content: Alibaba, Azure Cert Prep: AI-100, AZ-104, AZ-204 & AZ-400, Amazon Athena Playground, Google Cloud Developer Challenge, and much more

This month, our Content Team released 8 new learning paths, 4 courses, 7 labs in real cloud environments, and 4 new knowledge check assessments. Not only that, but we introduced our very first course on Alibaba Cloud, and our expert instructors are working 'round the clock to create 6 n...

Read more
  • alibaba
  • AWS
  • Azure
  • gitops
  • Google Cloud Platform
  • lab playground
  • programming
Avatar
Rhonda Martinez
— May 4, 2020

Top 5 Reasons to Get AWS Certified Right Now

Cloud computing trends are on the rise and have been for some time already. Fortunately, it’s never too late to start learning cloud computing. Skills like AWS and others associated with cloud computing are in high demand because cloud technologies have become crucial for many businesse...

Read more
  • Amazon Elastic Book Store
  • Amazon Elastic Compute Cloud (EC2)
  • AWS
  • AWS Certifications
  • Glacier
Alisha Reyes
Alisha Reyes
— May 1, 2020

Introducing Our Newest Lab Environments: Lab Playgrounds

Want to train in a real cloud environment, but feel slowed down by spinning up your own deployments? When you consider security or pricing costs, it can be costly and challenging to get up to speed quickly for self-training. To solve this problem, Cloud Academy created a new suite of la...

Read more
  • AWS
  • Azure
  • Docker
  • Google Cloud Platform
  • Java
  • lab playgrounds
  • Python
Alisha Reyes
Alisha Reyes
— April 30, 2020

Blog Digest: AWS Breaking News, Azure DevOps, AWS Study Guide, 8 Ways to Prevent a Ransomware Attack, and More

  New articles by topicAWS Azure Data Science Google Cloud  Cloud Adoption Platform Updates & New Content Security Women in TechAWSBreaking News: All AWS Certification Exams Now Available Online As an Advanced AWS Technology Partner, C...

Read more
  • AWS
  • Azure
  • blog digest
  • Certifications
  • Cloud Academy
  • programming
  • Security
Avatar
Stuart Scott
— April 27, 2020

AWS Certified Solutions Architect Associate: A Study Guide

Want to take a really impactful step in your technical career? Explore the AWS Solutions Architect Associate certificate. Its new version (SAA-C02) was released on March 23, 2020, though you can still take SAA-C01 through July 1, 2020. This post will focus on version SAA-C02.The AWS...

Read more
  • AWS
  • AWS Certifications
  • AWS Certified Solutions Architect Associate
Alisha Reyes
Alisha Reyes
— April 9, 2020

New on Cloud Academy: AWS Solutions Architect Exam Prep, Azure Courses, GCP Engineer Exam Prep, Programming, and More

Free content on Cloud Academy More and more customers are relying on our technology and content to keep upskilling their people in these months, and we are doing our best to keep supporting them. While the world fights the COVID-19 pandemic, we wanted to make a small contribution to he...

Read more
  • AWS
  • Azure
  • Google Cloud Platform
  • programming
Joe Nemer
Joe Nemer
— April 3, 2020

Breaking News: All AWS Certification Exams Now Available Online

Remote proctoring for all AWS certifications Cloud Academy is an Advanced AWS Technology Partner, and we are happy to announce all AWS certification exams are available online!  What does this mean for you? You can stay focused on your certification goal. Or you can start a certifica...

Read more
  • AWS
  • AWS certification
  • AWS Certifications