Creating an S3 Bucket for EMR
In this Lab Step you will create an S3 bucket for use with Amazon EMR. An S3 bucket is a prerequisite for using EMR. Creating a bucket is easy, but bucket setup is also a very common source of errors when you first start working with EMR. For example, some MapReduce jobs need the output folder to exist already, while others create it during processing and may fail if it is already there.
Note: EMR clusters can also use an SSH key (an EC2 key pair). You only need one if you plan to connect to the EC2 instances in your cluster.
1. In the AWS Management Console, navigate to Services > Storage > S3.
2. Click Create bucket. Fill out the first screen of the Create bucket wizard:
- Bucket name: calabs-emr (An S3 bucket name must be globally unique, and you will be told if the name is already in use. If it is, append a "-UniqueNumber" suffix, for example calabs-emr-3.)
- Region: US West (Oregon)
Click Create when ready. (You do not need to change the settings on the other screens of the wizard.)
3. Click your calabs-emr-# bucket, then click Create folder and enter the following value:
- New folder name: output
4. Click Save to create the folder.
5. Click Create folder again and create another folder at the same level as output:
- New folder name: logs
6. Click Save to create the folder.
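The console steps above can also be scripted. The following is a minimal sketch, assuming Python with the boto3 SDK installed and AWS credentials configured; the helper name create_emr_bucket is hypothetical, and calabs-emr-3 is the example name from step 2. S3 has no real folders: the console's Create folder button writes a zero-byte object whose key ends in "/", and the sketch does the same.

```python
from typing import Iterable, List

REGION = "us-west-2"  # US West (Oregon), as chosen in step 2


def create_emr_bucket(s3_client, bucket: str,
                      folders: Iterable[str] = ("output", "logs")) -> List[str]:
    """Hypothetical helper: create the bucket, then one zero-byte "folder" object per name.

    Returns the folder keys that were written.
    """
    # Outside us-east-1, S3 needs the target Region passed as a LocationConstraint.
    s3_client.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
    keys = []
    for name in folders:
        # A key ending in "/" is what the console's Create folder button creates.
        key = name.strip("/") + "/"
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
        keys.append(key)
    return keys


if __name__ == "__main__":
    import boto3  # assumption: boto3 installed and AWS credentials configured

    client = boto3.client("s3", region_name=REGION)
    print(create_emr_bucket(client, "calabs-emr-3"))
```

Passing the client in, rather than creating it inside the helper, keeps the function easy to exercise with a stub. If the name is already taken, boto3 raises a ClientError with the error code BucketAlreadyExists, the scripted equivalent of the console telling you the name is in use. For an extra folder such as input, pass folders=("output", "logs", "input").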
In this Lab Step you created an S3 bucket with the folders needed to store EMR processing logs and the results of a MapReduce job. Some use cases call for additional folders, for example an input folder to which you upload your dataset.
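To confirm the bucket and its folders exist without opening the console, a sketch like the following lists the bucket's top-level "folders". It assumes the same boto3 setup as the console steps (SDK installed, credentials configured); the helper name top_level_folders is hypothetical. With Delimiter="/", list_objects_v2 groups keys such as output/ and logs/ under CommonPrefixes instead of returning them as individual objects.

```python
from typing import List


def top_level_folders(s3_client, bucket: str) -> List[str]:
    """Hypothetical helper: return top-level prefixes such as "logs/" and "output/"."""
    # head_bucket raises a ClientError if the bucket is missing or inaccessible.
    s3_client.head_bucket(Bucket=bucket)
    resp = s3_client.list_objects_v2(Bucket=bucket, Delimiter="/")
    # With a delimiter set, keys containing "/" are rolled up into CommonPrefixes.
    # Only the first page (up to 1,000 entries) is read, which is plenty for this lab.
    return sorted(p["Prefix"] for p in resp.get("CommonPrefixes", []))


if __name__ == "__main__":
    import boto3  # assumption: boto3 installed and AWS credentials configured

    client = boto3.client("s3", region_name="us-west-2")
    folders = top_level_folders(client, "calabs-emr-3")
    missing = {"logs/", "output/"} - set(folders)
    print("missing folders:", sorted(missing) if missing else "none")
```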