

Introduction

In this Lab Step you will create an S3 bucket for use with Amazon EMR. The bucket is a prerequisite for this lab because the EMR cluster stores its processing logs and the results of the MapReduce job in it. Creating a bucket is simple, but the bucket and its folder layout are a common source of errors when first starting out with EMR. For example, some MapReduce jobs require the output folder to already exist, while others need to create it themselves during processing.

Note: EMR can also use an SSH key (an Amazon EC2 key pair), but it is only required if you plan to connect to the EC2 instances in your cluster.

Instructions

1. In the AWS Management Console, navigate to Services > Storage > S3.

 

2. Click Create bucket. Fill out the first screen of the Create bucket wizard:

  • Bucket name: calabs-emr (S3 bucket names must be globally unique. If the name is already in use, the console tells you; append a unique number if needed, for example calabs-emr-3.)
  • Region: US West (Oregon)

Click Create when ready. You do not need to change any settings on the other screens of the wizard.
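
If you would rather script this step, the following Python sketch shows roughly what the console does, assuming the AWS SDK for Python (boto3). The helper names, the limit of five candidate names and the simplified naming regex are illustrative assumptions, not part of the lab; the regex covers only part of the official S3 bucket naming rules.

```python
import re
from typing import Iterator

# Simplified subset of the S3 general-purpose bucket naming rules (assumption:
# the official rules add further restrictions, e.g. no IP-address-like names).
# 3-63 characters; lowercase letters, digits, hyphens and periods; must start
# and end with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the simplified rules above."""
    return bool(_BUCKET_NAME.match(name)) and ".." not in name


def candidate_names(base: str, attempts: int = 5) -> Iterator[str]:
    """Yield base, then base-1, base-2, ... (the lab's '-UniqueNumber' advice)."""
    yield base
    for n in range(1, attempts):
        yield f"{base}-{n}"


def create_emr_bucket(s3_client, base: str = "calabs-emr",
                      region: str = "us-west-2") -> str:
    """Create the first free bucket name in `region` and return that name.

    `s3_client` is assumed to be a boto3 S3 client, or any object exposing the
    same create_bucket(Bucket=..., CreateBucketConfiguration=...) method.
    """
    for name in candidate_names(base):
        if not is_valid_bucket_name(name):
            raise ValueError(f"invalid bucket name: {name!r}")
        try:
            # Outside us-east-1, S3 expects the Region as a LocationConstraint.
            s3_client.create_bucket(
                Bucket=name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
            return name
        except Exception as exc:
            # botocore's ClientError exposes the S3 error code via exc.response.
            response = getattr(exc, "response", None) or {}
            if response.get("Error", {}).get("Code") != "BucketAlreadyExists":
                raise  # a different failure: let the caller see it
    raise RuntimeError(f"no free bucket name found for base {base!r}")
```

With boto3 installed and credentials configured, a call such as create_emr_bucket(boto3.client("s3", region_name="us-west-2")) would create the bucket in US West (Oregon). The client is passed in rather than created inside the function so the logic can be exercised with a stand-in object and no AWS access.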

 

3. Click your calabs-emr bucket (calabs-emr-# if you appended a number), then click Create folder and set the following value:

  • New folder name: output

 

4. Click Save to create the folder.

 

5. Click Create folder again and create another folder at the same level as output:

  • New folder name: logs

 

6. Click Save to create the folder.
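
The two folders can also be created programmatically. The S3 console's Create folder action stores a zero-byte object whose key is the folder name followed by "/", and the sketch below, again assuming a boto3 S3 client passed in by the caller, writes the same marker objects and returns their s3:// URIs, the form in which EMR settings such as the cluster's log location refer to S3 paths. The function names are hypothetical. The prefix_exists helper checks whether a prefix such as output/ already holds any object, which matters because, as the Introduction notes, some MapReduce jobs expect the output folder to exist and others create it themselves.

```python
from typing import List, Sequence

# S3 has no real directories: a console "folder" is a zero-byte object whose
# key ends in "/". These are the two folders this Lab Step creates.
LAB_FOLDERS = ("output", "logs")


def folder_key(name: str) -> str:
    """Return the object key the console would use for folder `name`."""
    return name.strip("/") + "/"


def create_lab_folders(s3_client, bucket: str,
                       folders: Sequence[str] = LAB_FOLDERS) -> List[str]:
    """Create one zero-byte folder marker per name; return the s3:// URIs.

    `s3_client` is assumed to be a boto3 S3 client (uses put_object).
    """
    uris = []
    for name in folders:
        key = folder_key(name)
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
        uris.append(f"s3://{bucket}/{key}")
    return uris


def prefix_exists(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if at least one object key in `bucket` starts with `prefix`."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0
```

For a bucket named calabs-emr-3, create_lab_folders would return s3://calabs-emr-3/output/ and s3://calabs-emr-3/logs/.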

 

Summary 

In this Lab Step you created an S3 bucket with the folders needed to store EMR processing logs and the results of a MapReduce job. Some use cases call for additional folders, for example an input folder to which you upload your dataset.
