Cloud is a pathway to innovation. Where yesterday’s cloud deployments were about moving an on-premises infrastructure in your data center to a cloud environment, companies today are using cloud platforms to build new features for their products and services that are integrated at a software level.
Artificial intelligence, machine learning, and big data capabilities are no longer nice-to-haves; they are an essential part of an enterprise’s growth strategy. Cloud makes it easier and more cost-effective to leverage such technologies.
As if the need for more advanced skills weren’t enough, the trend toward multi-cloud means that teams will need to know how to use services from multiple platforms (chief among them AWS, Microsoft Azure, and Google Cloud Platform) to stay competitive. As always, for any new service that you adopt, strong security practices, and the skills to implement them, must be part of the process.
For companies looking to incorporate big data, AI, and machine learning into enterprise applications, here are some of the top cloud skills in demand that your teams will need to learn to stay competitive.
Data is among an organization’s most important assets. The cloud helps companies leverage a variety of tools for processing, analyzing, and managing high-volume, diverse data sets without the time or capital investment required for an on-premises deployment.
AWS, Azure, and Google all have big data services for analysis, visualization, processing, and administration, and they are continuing to add new features and services to reduce complexity and time to value.
As companies ramp up their big data processing efforts, they will likely run into two issues. First, building a scalable big data analytics infrastructure is time-consuming and expensive. Second, many people with data analytics backgrounds are used to writing queries in SQL, but most big data processing systems are based on Hadoop MapReduce and similar frameworks, which makes writing queries more difficult.
Azure Data Lake Analytics and Azure Stream Analytics solve both of these problems. First, they take care of all of the infrastructure requirements behind the scenes and bill on a pay-as-you-go basis. Second, they both support variants of SQL for writing queries, making it much easier for employees to start performing big data analytics. Azure also offers its HDInsight service for provisioning and deploying Hadoop clusters to manage big data. The Azure Data Lake Store, which is where Data Lake Analytics draws its data, is compatible with the Hadoop Distributed File System (HDFS), so existing Hadoop clusters can migrate to Data Lake Store for managed, unlimited storage. Try out both Azure Data Lake Analytics and Azure Stream Analytics for yourself in Cloud Academy’s Lab environment.
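To make the SQL-versus-MapReduce difference concrete, here is a minimal sketch in Python. It computes the same per-device average twice: once with hand-written map, shuffle, and reduce phases, and once as a single SQL query. The standard library's sqlite3 module stands in for a SQL-capable analytics service; this is not U-SQL or the Stream Analytics query language, and the `events` table and sample readings are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (device_id, temperature) readings.
READINGS = [("a", 20.0), ("b", 30.0), ("a", 22.0), ("b", 34.0), ("c", 25.0)]


def average_by_device_mapreduce(readings):
    """Average per device using hand-written map, shuffle, and reduce phases."""
    # Map: emit one (key, value) pair per record.
    mapped = [(device, temp) for device, temp in readings]
    # Shuffle: group values by key.
    groups = defaultdict(list)
    for device, temp in mapped:
        groups[device].append(temp)
    # Reduce: collapse each group to a single value.
    return {device: sum(temps) / len(temps) for device, temps in groups.items()}


def average_by_device_sql(readings):
    """The same aggregation expressed as a single SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (device TEXT, temperature REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", readings)
    rows = conn.execute(
        "SELECT device, AVG(temperature) FROM events GROUP BY device"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same mapping of device to average temperature. The SQL version leaves grouping and aggregation to the query engine, which is the part analysts with a SQL background already know how to write.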
Because cost is a major factor in big data deployments (storing large volumes of data doesn’t come cheap), teams will want to know how to store data and run queries in the most cost-effective way. Google’s BigQuery is a massive, lightning-fast data warehouse in the cloud that you can use to process billions of rows of data in seconds.
Depending on where your big data originates, you might need to consider the implications of delivering it to the cloud and processing it there. For example, if you have a significant amount of data collected by IoT devices, potentially billions of data points per day, you’ll need to take into account network transfer costs, latency, and aggregation to ensure that your big data deployment is both cost-effective and able to deliver timely results.
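As a rough illustration of why query design matters for cost, the sketch below estimates the price of a query under a scan-based pricing model, where you pay for the bytes of the columns a query reads. The rate, the column names, and the column sizes are all hypothetical; check your provider's current price list before relying on any figure.

```python
# Hypothetical on-demand rate in USD per TB scanned; not a quoted price.
ASSUMED_PRICE_PER_TB_USD = 5.00
# Assumption: 1 TB is counted as 2**40 bytes for billing purposes.
BYTES_PER_TB = 1024 ** 4

# Hypothetical per-column storage sizes for one table, in bytes.
COLUMN_SIZES = {
    "id": 1 * BYTES_PER_TB,
    "timestamp": 1 * BYTES_PER_TB,
    "payload": 8 * BYTES_PER_TB,
}


def estimate_query_cost(column_sizes, selected_columns,
                        price_per_tb=ASSUMED_PRICE_PER_TB_USD):
    """Estimate cost when only the selected columns are scanned."""
    scanned_bytes = sum(column_sizes[name] for name in selected_columns)
    return scanned_bytes / BYTES_PER_TB * price_per_tb


# Reading every column versus only the two columns the report needs.
full_scan = estimate_query_cost(COLUMN_SIZES, COLUMN_SIZES.keys())
narrow_scan = estimate_query_cost(COLUMN_SIZES, ["id", "timestamp"])
```

Under these assumed numbers, selecting only the two small columns costs a fifth of what a full-table read costs, which is why avoiding `SELECT *` is a common first step in keeping query bills down.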
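One common way to reduce transfer volume is to aggregate readings close to the devices and send only summaries to the cloud. The following is a minimal sketch of that idea, assuming readings arrive as (timestamp in seconds, value) pairs; the function name and the 60-second window are illustrative choices, not part of any vendor's API.

```python
from collections import defaultdict


def summarize_by_window(readings, window_seconds=60):
    """Collapse (timestamp, value) readings into one summary per time window."""
    windows = defaultdict(list)
    for timestamp, value in readings:
        # Integer division maps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start].append(value)
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(windows.items())
    ]
```

A device that samples once per second would send one record per minute instead of sixty. The trade-off is that individual readings are no longer available downstream, so the window size should match the granularity your analysis actually needs.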
With Google services such as Cloud Dataflow and Cloud Dataproc you can build data processing pipelines that transform and summarize your big data using Apache Beam, Hadoop, and Spark.
If you’re using AWS for big data, you’ll want to understand the basics of how it manages everything from collection and storage to processing and data security. Amazon’s Kinesis Data Analytics for streaming data, Elastic MapReduce (EMR) for processing, Redshift for data warehousing, Athena for analysis, and QuickSight for business intelligence are just some of the services that you can use together to create big data solutions in AWS.
All of the major cloud vendors are developing services that allow companies to quickly leverage AI or machine learning in their applications. This is a significant shift compared to the early days of cloud computing: companies are now relying on cloud vendors to build critical components of their software, enriching their capabilities with technology that would otherwise take years (and significant capital) to develop.
Time to value is an important component of this shift. We are seeing developers integrate AI and machine learning technologies to provide insight in hours—something that would have been impossible with that level of technology just three years ago.
A Tech Pro Research survey reported that 42% of companies don’t have the skills for implementing and supporting AI technologies in house. As enterprise teams look to leverage these technologies for analytics and business intelligence, they will look accordingly to cloud teams to make these capabilities available.
Here are a few services to get started using AI and machine learning in the cloud:
In the cloud, machine learning and big data often go hand in hand. Machine learning models are trained by analyzing a dataset and finding the patterns, relationships, and associations buried within it. The bigger the dataset, the more there is to learn from. Processing a larger training dataset with high-quality features results in a more refined model, which in turn provides better predictions and inference.
To support this connection, cloud vendors provide integrations between their big data services and their machine learning services. For example, Azure’s Stream Analytics can directly call Azure Machine Learning models for processing high-velocity streaming data. Likewise, Azure Machine Learning can query Azure Data Lake Store as a model data source. Similarly, Amazon Kinesis Data Analytics supports several machine learning algorithms exposed directly as SQL functions, and can integrate with other Amazon machine learning services. Teams should familiarize themselves with these complementary and compatible services to deploy them most effectively.
Security should underpin everything you do in the cloud. As more data goes into the cloud, the pressure is on for teams to keep workloads and applications safe. As recent breaches have shown, many companies are rushing to the cloud without employing even basic security measures like encryption for storage. According to McAfee, 36% of organizations are adopting cloud even without the right security skills.
Strong security practices start with a full understanding of how security works in each platform and the skills to properly use the available services in your environment. For all of the platforms you’re using, you’ll want to have a thorough understanding of their shared responsibility model so that you understand where the platform’s security responsibility stops and where yours begins.
AWS, Microsoft Azure, and Google Cloud Platform offer a range of services and tools that teams can use to design and implement the proper level of security. They have also been quick to release innovative services that teams can employ to protect data and applications in the cloud. In November 2017, AWS released GuardDuty, an intelligent threat detection service that uses artificial intelligence and machine learning to detect suspicious activity.
One of the first security services that you will use in a cloud environment will be an identity and access management (IAM) service, which allows you to configure specific access controls within your environment. Learn about Microsoft’s IAM service in the cloud, Azure Active Directory (Azure AD), and how AWS handles IAM.
Encryption should be used to protect data at rest and in transit when using any storage service. On AWS, the Elastic Block Store (EBS) service provides persistent block-level storage volumes for Amazon EC2 instances. If those volumes contain sensitive information (as they most likely do), you’ll want to make sure data on the volume is encrypted to protect it from malicious activity. Learn about the encryption options for big data services, including S3 and Amazon Athena, in AWS.
With GDPR looming, compliance is on the radar for many organizations that are trying to educate themselves and put privacy best practices in place at the same time.
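To give a sense of what "specific access controls" look like in practice, here is a minimal sketch that builds an AWS IAM policy document granting read-only access to a single S3 bucket. The helper function and the bucket name `example-analytics-data` are hypothetical; the document structure follows the standard IAM JSON policy format.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document that grants read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Read and list only; no write or delete actions are granted.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket (for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # its objects (for GetObject)
                ],
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
print(json.dumps(read_only_bucket_policy("example-analytics-data"), indent=2))
```

Scoping the policy to one bucket and two read actions is an example of least privilege: a user or service with this policy attached can read the data but cannot modify or remove it.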
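A simple audit can flag volumes that are not encrypted. The sketch below assumes you already have a list of volume descriptions shaped like the entries returned by the EC2 DescribeVolumes API, which include `VolumeId` and `Encrypted` fields; the sample data is hypothetical, and fetching the real list from your account is left out.

```python
def find_unencrypted_volumes(volumes):
    """Return the IDs of volumes whose Encrypted flag is missing or False."""
    return [v["VolumeId"] for v in volumes if not v.get("Encrypted", False)]


# Hypothetical volume descriptions for illustration only.
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-0aaa", "Encrypted": True},
    {"VolumeId": "vol-0bbb", "Encrypted": False},
    {"VolumeId": "vol-0ccc"},  # a missing flag is treated as unencrypted
]

unencrypted = find_unencrypted_volumes(SAMPLE_VOLUMES)
```

Treating a missing flag as unencrypted is a deliberately conservative choice: it is safer for an audit to raise a false alarm than to skip a volume.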
AWS offers a number of services that can help you architect compliance into your applications. AWS Config enables visibility of your entire AWS infrastructure from a configuration perspective; with CloudTrail, you can capture all API calls by users and/or services for audit and compliance purposes; with Inspector, you have a record of your security assessments for EC2 instances and any applications running on them. Finally, AWS Trusted Advisor recommends improvements across your AWS account to help optimize your environment based on AWS best practices for security, fault tolerance, costs, and performance.
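As an illustration of how captured API calls can support an audit, the sketch below counts sensitive calls per caller from a list of CloudTrail-style records. It relies on the `eventName` and `userIdentity.arn` fields found in CloudTrail records; the set of event names treated as sensitive and the sample records are assumptions chosen for this example.

```python
from collections import Counter

# Assumption: event names this example treats as sensitive.
SENSITIVE_EVENT_NAMES = {"StopLogging", "DeleteTrail", "DeleteBucket"}


def sensitive_calls_by_principal(records):
    """Count sensitive API calls per caller ARN."""
    counts = Counter()
    for record in records:
        if record.get("eventName") in SENSITIVE_EVENT_NAMES:
            arn = record.get("userIdentity", {}).get("arn", "unknown")
            counts[arn] += 1
    return dict(counts)


# Hypothetical records for illustration only.
SAMPLE_RECORDS = [
    {"eventName": "StopLogging",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
    {"eventName": "DeleteBucket",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventName": "DeleteTrail", "userIdentity": {}},
]

summary = sensitive_calls_by_principal(SAMPLE_RECORDS)
```

In a real review you would tailor the list of event names to your own compliance requirements and read the records from wherever your trail delivers its log files.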
The velocity and variety of changes in the broader cloud computing industry will continue unabated in 2018 and beyond. Explore the learning paths, video courses, and hands-on labs available in our Content Library. Subscribe to the Cloud Academy blog, where we’ll keep you up to date with best practices and how-tos for AWS, Azure, and Google Cloud services.