Managing Your Data Archive in the AWS Cloud

Since the need for a reliable data archive is well known, there’s no need for us to focus on that. Instead, we’ll discuss the various data archive options AWS offers its customers. However, we should first make an important distinction between data archives and data backups – as the purpose and function of the two should not be confused.

A data archive holds data that is no longer actively in use but that must be moved to separate storage for long-term preservation and retention. Besides preservation, a key goal of a data archive is to reduce the cost of storage. A data archive is not intended to help your system recover from a disaster or failure. Backups, which are performed on both active and inactive data, are designed to permit recovery from data failure.

With that in mind, being able to quickly restore data is likely far more important for a backup than for an archive. Considerations like these will shape the solutions you choose for your data archive vs. your data backup.

The data archive: traditional considerations

  • You would need to see far into the future, as the medium you are using today may not exist in ten years. It can therefore be a real challenge to identify a viable long-term storage platform.
  • While archived data may not be currently active, it is generally still production data. Reliable security over long periods of time therefore becomes a critical goal.
  • Data archives tend to grow with time, so you will need to realistically consider future costs and scalable infrastructure needs upfront.
  • Most organizations – and especially governments – are very particular about the availability of archived data. The process of meeting such expectations may lead you to improved disaster recovery strategies, but it can also be really complicated and expensive.
  • Implementation can require significant skills and experience across multiple technologies.

However, many of these concerns simply wouldn’t apply to a data archive in the Cloud. Using AWS, for instance, means you never need to invest in a particular technology or medium, or worry about changing standards – that’s all Amazon’s headache. And your costs will always be a direct product of the services you actually use.

Your data archive and the AWS Cloud

Of course, AWS isn’t the only player offering out-of-the-box cloud archiving services, but they’re a good place to start.

S3 and Glacier are, one way or another, the primary AWS tools you’ll use for your data archive. We’ll look at three common use-case scenarios: archives for AWS-based data, on-premises data, and hybrid data solutions.

1. Applications deployed within AWS

If the application whose data you want to archive is running within the AWS environment, then integration with S3 or Glacier should be straightforward. Since a data archive doesn’t demand frequent reads, you would normally opt for the cheaper Glacier, which can involve a lag of several hours for retrieval. If, however, you’re already storing some application data in S3 (like videos or application logs) and you don’t want to write extra code to send inactive data to Glacier yourself, you can instead have S3 move only the old, inactive data to Glacier.

AWS allows you to configure and manage the automated lifecycle of objects in your S3 buckets. You could therefore create a configuration that causes S3 objects to be moved to Glacier based on specified conditions or policies.

A sample policy may look like this:
[Figure: Data Archive - sample policy]
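
As a rough sketch of what such a lifecycle configuration can look like, the Python snippet below builds a single rule that moves objects under a prefix to Glacier after a number of days and expires them later. The `logs/` prefix, the 90-day and 3650-day thresholds, the bucket name, and the helper name `build_archive_lifecycle` are illustrative assumptions, not values from this article. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
import json


def build_archive_lifecycle(prefix: str, archive_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that archives, then expires, objects under a prefix.

    Hypothetical helper for illustration; all thresholds are chosen by the caller.
    """
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                # Rule ID derived from the prefix, e.g. "archive-logs"
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                # Only objects whose key starts with this prefix are affected
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to the Glacier storage class once they are old enough
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # Delete objects entirely after the retention period
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    config = build_archive_lifecycle("logs/", 90, 3650)
    print(json.dumps(config, indent=2))
    # Applying it needs boto3 and AWS credentials (bucket name is an assumption):
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-archive-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review the thresholds before anything is applied to a real bucket.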

2. Applications deployed on premises

If the components of your application (like a web server, database, application server, and NFS server) are running within your datacenter, but you still want to use AWS for archiving your backed-up data, the simplest solution is to integrate your backup server with AWS S3 or Glacier. This diagram may help you visualize the architecture:
[Figure: Data Archive Architecture]

If you’re already using AWS S3 for your backups instead of a local backup server, then you can use S3 Lifecycle management to quickly add a data archive layer using Glacier to your infrastructure.

Even if your backup server doesn’t natively support AWS cloud integration, you can still create a seamless and secure interface between your data center and AWS’s storage infrastructure using AWS Storage Gateway. Storage Gateway won’t require a dedicated network setup between your corporate network and AWS infrastructure, and it is built to support industry standard storage protocols, while storing the encrypted data in AWS S3.
[Figure: AWS Storage Gateway]

3. Applications deployed in a hybrid setup

In this kind of setup, an application deployed on AWS might interact with on-premises components (or the other way around). In such cases, you may want to extend an existing archiving strategy to the cloud. All this requires is a reliable way to connect your two networks, either through a standard VPN setup or through AWS Direct Connect, which makes it easy to establish a dedicated network connection from your premises to AWS.

Data archive compliance and regulations

Many customers have specific data retention policies and must often comply with regulatory guidelines. For this, AWS Glacier offers Vault Lock. A Vault Lock policy allows you to apply compliance controls to the contents of any Glacier vault and, once locked, can no longer be changed.
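
To make this concrete, here is a minimal sketch of a Vault Lock policy built in Python. It denies deletion of any archive younger than a chosen retention period, using the `glacier:ArchiveAgeInDays` condition key. The vault ARN, account ID, 365-day retention, and the helper name `build_vault_lock_policy` are placeholders assumed for illustration.

```python
import json


def build_vault_lock_policy(vault_arn: str, min_age_days: int) -> str:
    """Return a Vault Lock policy (as a JSON string) that blocks early archive deletion.

    Hypothetical helper for illustration; the ARN and retention period come from the caller.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                # Deny applies while the archive is younger than the retention period
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(min_age_days)}
                },
            }
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Placeholder ARN; replace region, account ID, and vault name with your own.
    arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
    print(build_vault_lock_policy(arn, 365))
    # With boto3 and credentials, the lock is a two-step process:
    # glacier.initiate_vault_lock(accountId="-", vaultName="examplevault",
    #                             policy={"Policy": policy_json})
    # glacier.complete_vault_lock(accountId="-", vaultName="examplevault", lockId=lock_id)
```

Because a completed lock is permanent, it is worth testing the policy while the lock is still in progress before calling the second step.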

To review, here are some of the key advantages you can enjoy by archiving your data in the cloud…and with AWS in particular:

  • No more need to rely on risky predictions of your data growth and corresponding data storage.
  • Reduced overhead of managing huge data stores for long periods.
  • Reduced cost.
  • Increased availability.
  • No more need to identify and invest in some particular hardware and skills to implement a reliable, long-term archival design.

Do you have your own cloud/local archiving experience? Let us know in the comments.

Written by

Vineet Badola

I have worked as a cloud professional for the last 6 years in various organizations and have experience with three of the most popular cloud platforms: AWS IaaS, Microsoft Azure, and the Pivotal Cloud Foundry PaaS platform. With around 10 years of IT experience in various roles, I take great interest in learning and sharing my knowledge of newer technologies. I have worn many hats as a developer, lead, and architect implementing cloud technologies. In my leisure time I enjoy good, soothing music, playing TT, and sweating it out in the gym. I believe sharing knowledge is my way of making this world a better place.

