
S3 FTP: Build a Reliable and Inexpensive FTP Server Using Amazon’s S3


Is it possible to create an S3 FTP file backup/transfer solution that minimizes the administrative headache of file storage and capacity planning?

S3 FTP server

FTP (File Transfer Protocol) is a fast and convenient way to transfer large files over the Internet. You might, at some point, have configured an FTP server and used block storage, NAS, or a SAN as your backend. However, using this kind of storage requires infrastructure support and can cost you a fair amount of time and money.

Could an S3 FTP solution work better? Since AWS’s reliable and competitively priced infrastructure is just sitting there waiting to be used, we were curious to see whether AWS can give us what we need without the administration headache.

Why S3 FTP?

Amazon S3 is reliable and accessible, that’s why.

  • Amazon S3 provides infrastructure that’s “designed for durability of 99.999999999% of objects.”
  • Amazon S3 is built to provide “99.99% availability of objects over a given year.”
  • You pay for exactly what you need with no minimum commitments or up-front fees.
  • With Amazon S3, there’s no limit to how much data you can store or when you can access it.

NOTE: FTP is not a secure protocol and should not be used to transfer sensitive data. You might consider using the SSH File Transfer Protocol (sometimes called SFTP) for that.

Using S3 FTP: object storage as filesystem

SAN, iSCSI, and local disks are block storage devices: block storage volumes are attached directly to a machine running an operating system, and that operating system drives your filesystem operations. S3, by contrast, is built for object storage. Interactions occur at the application level via an API, which means you can’t mount S3 directly within your operating system.

S3FS To the Rescue!

S3FS-Fuse lets us mount an S3 bucket as a local filesystem with read/write access. On an S3FS-mounted file system, we can simply use cp, mv, ls – and all the other basic Unix file management commands – to manage S3 resources as if they were on locally attached disks. S3FS-Fuse is a FUSE-based file system, enabling a fully functional filesystem to run within a userspace program.

GitHub S3FS Repository

So it seems that we’ve got all the pieces for an S3 FTP solution. How will it actually work?

 

S3 FTP Installation and Setup

Step 1: Create an S3 Bucket

The first step is to create an S3 bucket, which will be the end location for our FTP uploaded files. We can do this simply by using the AWS console:

Create S3 Bucket
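If you prefer the command line, the same bucket can be created with a single AWS CLI command (assuming the ca-s3fs-bucket name used throughout this post):

aws s3 mb s3://ca-s3fs-bucket --region us-west-2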

Step 2: Create an IAM Policy and Role for S3 Bucket Read/Write Access

Next, we create an IAM Policy and Role to control access into the previously created S3 bucket.

Later on, our EC2 instance will be launched with this role attached to grant it read and write bucket permissions. Note, it’s very important to take this approach to granting permissions to the S3 bucket, as we want to avoid hard-coding credentials within any of the scripts and/or configuration later applied to our EC2 FTP instance.

We can use the AWS CLI to perform this task:

aws iam create-policy --policy-name S3FS-Policy --policy-document file://s3fs-policy.json

The contents of the s3fs-policy.json file are:

{
   "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::ca-s3fs-bucket"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": ["arn:aws:s3:::ca-s3fs-bucket/*"]
        }
    ]
}

Using the AWS IAM console, we then create the S3FS-Role and attach the S3FS-Policy like so:

Create IAM Role
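If you prefer to stay within the CLI, the following is a sketch of the equivalent commands – it assumes a trust policy saved locally as ec2-trust-policy.json (shown first) and your AWS account ID in place of ACCOUNT-ID-HERE:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

aws iam create-role \
--role-name S3FS-Role \
--assume-role-policy-document file://ec2-trust-policy.json

aws iam attach-role-policy \
--role-name S3FS-Role \
--policy-arn arn:aws:iam::ACCOUNT-ID-HERE:policy/S3FS-Policy

# run-instances attaches roles via an instance profile, so create one
# of the same name and add the role to it
aws iam create-instance-profile --instance-profile-name S3FS-Role
aws iam add-role-to-instance-profile \
--instance-profile-name S3FS-Role \
--role-name S3FS-Role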

Step 3: Launch FTP Server (EC2 instance – Amazon Linux)

Launch EC2 Instance

We’ll use AWS’s Amazon Linux 2 for the EC2 instance that will host our FTP service. Again using the AWS CLI, we can launch an EC2 instance by running the following command – ensuring that we launch it with the S3FS-Role attached.

Note: in this case we are lazily using the --associate-public-ip-address parameter to temporarily assign a public IP address for demonstration purposes. In a production environment we would provision an Elastic IP (EIP) address and use that instead.

aws ec2 run-instances \
--image-id ami-0d1000aff9a9bad89 \
--count 1 \
--instance-type t3.micro \
--iam-instance-profile Name=S3FS-Role \
--key-name EC2-KEYNAME-HERE \
--security-group-ids SG-ID-HERE \
--subnet-id SUBNET-ID-HERE \
--associate-public-ip-address \
--region us-west-2 \
--tag-specifications \
'ResourceType=instance,Tags=[{Key=Name,Value=s3fs-instance}]' \
'ResourceType=volume,Tags=[{Key=Name,Value=s3fs-volume}]'

EC2 Running Instance
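For a production setup, the Elastic IP mentioned in the note above can be provisioned and attached with the AWS CLI. A quick sketch, with placeholder IDs you would substitute with your own:

aws ec2 allocate-address --domain vpc --region us-west-2

aws ec2 associate-address \
--instance-id INSTANCE-ID-HERE \
--allocation-id EIPALLOC-ID-HERE \
--region us-west-2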

Step 4: Build and Install S3FS from Source

Next we need to update the local operating system packages and install extra packages required to build and compile the s3fs binary.

sudo yum -y update
sudo yum -y install \
automake \
openssl-devel \
git \
gcc \
libstdc++-devel \
gcc-c++ \
fuse \
fuse-devel \
curl-devel \
libxml2-devel

Download the S3FS source code from GitHub, run the pre-build scripts, build and install the s3fs binary, and confirm that the s3fs binary is installed correctly:

git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse/

./autogen.sh
./configure

make
sudo make install

which s3fs
s3fs --help

Step 5: Configure FTP User Account and Home Directory

We create our ftpuser1 user account which we will use to authenticate against our FTP service:

sudo adduser ftpuser1
sudo passwd ftpuser1

We then create the directory structure for the ftpuser1 user account, which we will later configure within our FTP service and which will be mounted using the s3fs binary. Note that vsftpd requires a chrooted user’s home directory to be non-writable by that user, which is why we strip the write permission from /home/ftpuser1/ftp and grant write access only on the files subdirectory:

sudo mkdir /home/ftpuser1/ftp
sudo chown nfsnobody:nfsnobody /home/ftpuser1/ftp
sudo chmod a-w /home/ftpuser1/ftp
sudo mkdir /home/ftpuser1/ftp/files
sudo chown ftpuser1:ftpuser1 /home/ftpuser1/ftp/files
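As a quick sanity check, we can confirm the resulting ownership and permissions before moving on:

ls -ld /home/ftpuser1/ftp /home/ftpuser1/ftp/files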

Step 6: Install and Configure FTP Service

We are now ready to install and configure our FTP service. We do so by installing the vsftpd package:

sudo yum -y install vsftpd

Take a backup of the default vsftpd.conf configuration file:

sudo cp /etc/vsftpd/vsftpd.conf /etc/vsftpd/vsftpd.conf.bak

Now use the vim editor to ensure that the following configuration properties are set and saved, making sure that pasv_address=X.X.X.X is updated to use the public IP address assigned to the EC2 instance:

sudo vim /etc/vsftpd/vsftpd.conf

anonymous_enable=NO
local_enable=YES
write_enable=YES
chroot_local_user=YES
user_sub_token=$USER
local_root=/home/$USER/ftp
pasv_min_port=40000
pasv_max_port=50000
pasv_address=X.X.X.X
userlist_enable=YES
userlist_file=/etc/vsftpd.userlist
userlist_deny=NO

Note: we can use the following command to remove all the default commented lines from the vsftpd.conf file, condensing it down to just the actual configuration properties that will be used at runtime:

sudo sed -i.$(date +%F) '/^#/d;/^$/d' /etc/vsftpd/vsftpd.conf

Resulting in the following specific set of configuration properties that will be used at runtime:

sudo cat /etc/vsftpd/vsftpd.conf

anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
chroot_local_user=YES
listen=YES
pam_service_name=vsftpd
userlist_enable=YES
tcp_wrappers=YES
user_sub_token=$USER
local_root=/home/$USER/ftp
pasv_min_port=40000
pasv_max_port=50000
pasv_address=X.X.X.X
userlist_file=/etc/vsftpd.userlist
userlist_deny=NO

Additionally, keep in mind the following firewall requirements:

  • This configuration leverages passive ports (40000-50000) for the actual FTP data transmission. Your FTP client will need to be able to make outbound connections to both the default FTP command port (21) and the passive port range (40000-50000).
  • The FTP EC2 instance’s security group will need to be configured to allow inbound connections to the ports above, where the source IP address of the inbound traffic is your external public IP address – see the sketch after this list.
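The inbound rules can be added with the AWS CLI – a sketch, assuming the security group used at launch and with YOUR-PUBLIC-IP substituted for your actual external IP address:

aws ec2 authorize-security-group-ingress \
--group-id SG-ID-HERE \
--protocol tcp \
--port 21 \
--cidr YOUR-PUBLIC-IP/32

aws ec2 authorize-security-group-ingress \
--group-id SG-ID-HERE \
--protocol tcp \
--port 40000-50000 \
--cidr YOUR-PUBLIC-IP/32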

Since we are configuring a user list file, we need to add our ftpuser1 user account into the vsftpd.userlist file:

echo "ftpuser1" | sudo tee -a /etc/vsftpd.userlist

Finally, we are ready to start up the FTP service. We do so by running the command:

sudo systemctl restart vsftpd
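To ensure the FTP service also comes back up automatically after a reboot, enable it as well:

sudo systemctl enable vsftpd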

Let’s check to ensure that the FTP service started up, and our vsftpd process exists:

ps -ef | grep /usr/sbin/vsftpd
root 12694 1  0 20:33 ? Ss 0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf

Step 7: Test FTP with FTP client

Ok so we are now ready to test our FTP service – we’ll do so before we add the S3FS mount into the equation.

On a Mac we can use Homebrew to install the FTP command line tool:

brew install inetutils

Let’s now authenticate against our FTP service using the public IP address assigned to the EC2 instance. In this case the public IP address we are using is 18.236.230.74 – this will be different for you. We authenticate using the ftpuser1 user account we previously created:

ftp 18.236.230.74
Connected to 18.236.230.74.
220 (vsFTPd 3.0.2)
Name (18.236.230.74): ftpuser1
331 Please specify the password.
Password:
230 Login successful.
ftp>

We need to ensure we are in passive mode before we perform the FTP put (upload). In this case we are uploading a local file named mp3data:

ftp> passive
Passive mode on.
ftp> cd files
250 Directory successfully changed.
ftp> put mp3data
227 Entering Passive Mode (18,236,230,74,173,131).
150 Ok to send data.
226 Transfer complete.
131968 bytes sent in 0.614 seconds (210 kbytes/s)
ftp>
ftp> ls -la
227 Entering Passive Mode (18,236,230,74,181,149).
150 Here comes the directory listing.
drwxrwxrwx    1 0 0             0 Jan 01 1970 .
dr-xr-xr-x    3 65534 65534          19 Oct 25 20:17 ..
-rw-r--r--    1 1001 1001       131968 Oct 25 21:59 mp3data
226 Directory send OK.
ftp>

Let’s now delete the remote file and then quit the FTP session:

ftp> del mp3data
ftp> quit

Ok that looks good!

We are now ready to move on and configure the S3FS mount…

Step 8: Startup S3FS and Mount Directory

Run the following command, ensuring to use and reference the previously created S3FS-Role IAM role:

Note: the attached EC2 IAM role can be queried from within the EC2 instance via the metadata URL:

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/

Note: if you have created your S3 bucket in a region other than Oregon (us-west-2), ensure you update the url parameter accordingly.

EC2Role=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
S3BucketName=ca-s3fs-bucket

sudo /usr/local/bin/s3fs $S3BucketName \
-o use_cache=/tmp,iam_role="$EC2Role",allow_other /home/ftpuser1/ftp/files \
-o url="https://s3-us-west-2.amazonaws.com"

Let’s now do a process check to ensure that the s3fs process has started:

ps -ef | grep s3fs

root 12740 1  0 20:43 ? 00:00:00 /usr/local/bin/s3fs 
ca-s3fs-bucket -o use_cache=/tmp,iam_role=S3FS-Role,allow_other 
/home/ftpuser1/ftp/files -o url=https://s3-us-west-2.amazonaws.com

Looks good!!
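Keep in mind that a mount started this way will not survive a reboot. One common approach – a sketch, assuming s3fs’s fuse.s3fs fstab support and its iam_role=auto option (which auto-discovers the attached instance role) – is to add an entry to /etc/fstab:

ca-s3fs-bucket /home/ftpuser1/ftp/files fuse.s3fs _netdev,allow_other,use_cache=/tmp,iam_role=auto,url=https://s3-us-west-2.amazonaws.com 0 0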

Note: If required, the following command can be used for troubleshooting and debugging of the S3FS Fuse mounting process:

sudo /usr/local/bin/s3fs ca-s3fs-bucket \
-o use_cache=/tmp,iam_role="S3FS-Role",allow_other /home/ftpuser1/ftp/files \
-o dbglevel=info -f \
-o curldbg \
-o url="https://s3-us-west-2.amazonaws.com"

Step 9: S3 FTP End-to-End Test

In this test, we are going to use FileZilla, an FTP client, and its Site Manager to configure our connection. Note here, we explicitly set the encryption option to insecure for demonstration purposes. Do NOT do this in production if transferring sensitive files; instead, set up SFTP or FTPS.

FileZilla Site Manager

With our FTP connection and credential settings in place we can go ahead and connect…

Ok, we are now ready to do an end-to-end file transfer test using FTP. In this example we FTP the mp3data file across by dragging and dropping it from the left-hand side into the files directory on the right-hand side – and kaboom, it works!!

FileZilla FTP Application

The acid test is to now review the AWS S3 web console and confirm the presence of the mp3data file within the configured bucket, which we can clearly see here:

S3 Bucket FTP File
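The same check can be performed from the command line by listing the bucket’s contents with the AWS CLI:

aws s3 ls s3://ca-s3fs-bucket/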

From now on, any files you FTP into your user directory will automatically be uploaded and synchronized into the respective Amazon S3 bucket. How cool is that!

Summary

Voila! An S3 FTP server!

As you have just witnessed, we have successfully proven that we can leverage the S3FS-Fuse tool together with both S3 and FTP to build a file transfer solution. Let’s again review the S3-related benefits of this approach:

  • Amazon S3 provides infrastructure that’s “designed for durability of 99.999999999% of objects.”
  • Amazon S3 is built to provide “99.99% availability of objects over a given year.”
  • You pay for exactly what you need, with no minimum commitments or up-front fees.
  • With Amazon S3, there’s no limit to how much data you can store or when you can access it.

If you want to deepen your understanding of how S3 works, check out the CloudAcademy course Storage Fundamentals for AWS.

CloudAcademy S3 Storage Fundamentals

Written by

Jeremy is currently employed as a Cloud Researcher and Trainer - and operates within CloudAcademy's content provider team authoring technical training documentation for both AWS and GCP cloud platforms. Jeremy has achieved AWS Certified Solutions Architect - Professional Level, and GCP Qualified Systems Operations Professional certifications.
