Cloud Academy Team

April 26, 2018

Top Cloud Skills in Demand for 2018: Big Data, AI, Machine Learning

Cloud is a pathway to innovation. Where yesterday’s cloud deployments were about moving an on-premises infrastructure in your data center to a cloud environment, companies today are using cloud platforms to build new features for their products and services that are integrated at a software level.
Artificial intelligence, machine learning, and big data capabilities are no longer nice-to-haves; they’re an essential part of an enterprise’s growth strategy. Cloud makes it easier and more cost-effective to leverage such technologies.

As if the need for more advanced skills weren’t enough, the trend toward a multi-cloud approach means that teams will need to know how to use services from multiple platforms (chief among them AWS, Microsoft Azure, and Google Cloud Platform) to stay competitive. As always, for any new services that you adopt, strong security practices, and the skills to implement them, must be part of the process.

For companies looking to incorporate big data, AI, and machine learning into enterprise applications, here are some of the top cloud skills in demand that your teams will need to learn to stay competitive.

Big Data

Data is among an organization’s most important assets. The cloud helps companies leverage a variety of tools for processing, analyzing, and managing high-volume, diverse data sets without the time or capital investment required for an on-premises deployment.
AWS, Azure, and Google all have big data services for analysis, visualization, processing, and administration, and they are continuing to add new features and services to reduce complexity and time to value.

As companies ramp up their big data processing efforts, they will likely run into two issues. First, building a scalable big data analytics infrastructure is time-consuming and expensive. Second, many people with data analytics backgrounds are used to writing queries in SQL, but most big data processing systems are based on Hadoop MapReduce and similar frameworks, which makes writing queries more difficult.

Azure Data Lake Analytics and Azure Stream Analytics solve both of these problems. First, they take care of all of the infrastructure requirements behind the scenes and allow you to run them on a pay-as-you-go basis. Second, they both support variants of SQL for writing queries, making it much easier for employees to start performing big data analytics.

Azure also offers its HDInsight service for provisioning and deploying Hadoop clusters to manage big data. The Azure Data Lake Store, which is where Data Lake Analytics draws its data, is compatible with the Hadoop Distributed File System (HDFS), so existing Hadoop clusters can migrate to Data Lake Store for managed, unlimited storage. Try out both Azure Data Lake Analytics and Azure Stream Analytics for yourself in Cloud Academy’s Lab environment.
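To make the contrast between MapReduce-style code and SQL concrete, the sketch below computes the same per-region total twice: once with hand-written map, shuffle, and reduce steps, and once as a single SQL query. It is only an illustration. Python’s built-in sqlite3 module stands in for a managed query service, and the table name, column names, and sample rows are all hypothetical. It does not use U-SQL or the Stream Analytics query language.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale_amount) pairs.
SALES = [("east", 120), ("west", 80), ("east", 30), ("north", 50), ("west", 20)]


def totals_map_reduce(rows):
    """Sum amounts per region in an explicit map / shuffle / reduce style."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: collapse each group into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def totals_sql(rows):
    """Compute the same totals with one declarative SQL query."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a query service
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        result = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)


if __name__ == "__main__":
    print(totals_map_reduce(SALES))
    print(totals_sql(SALES))
```

Both functions return the same totals. The SQL version states what result is wanted and leaves the grouping mechanics to the engine, which is the property that makes SQL-based services approachable for analysts.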

Because cost is a major factor in big data deployments (storing large volumes of data doesn’t come cheap), teams will want to know how to store data and run queries in the most cost-effective way. One option is Google’s BigQuery, a massive, lightning-fast data warehouse in the cloud that you can use to process billions of rows of data in seconds.

Depending on where your big data originates, you might need to consider the implications of delivering it to and processing it within the cloud. For example, if you have a significant amount of data collected by IoT devices, potentially billions of data points per day, you’ll need to take into account network transfer costs, latency, and aggregation to ensure that your big data deployment is both cost-effective and efficient at delivering timely results.
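One common way to reduce transfer volume is to aggregate readings close to the source and send only summaries to the cloud. The following is a minimal sketch of that idea, not a recommendation for any particular IoT service. The function name, the record layout of (device_id, timestamp, value), the integer timestamps in seconds, and the 60-second window are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_readings(readings, window_seconds=60):
    """Collapse raw (device_id, timestamp, value) readings into window summaries.

    Timestamps are assumed to be integer seconds. One summary is produced
    per (device_id, window_start) pair.
    """
    buckets = defaultdict(list)
    for device_id, timestamp, value in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[(device_id, window_start)].append(value)

    return [
        {
            "device_id": device_id,
            "window_start": window_start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for (device_id, window_start), values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Hypothetical device sending one reading per second for two minutes.
    raw = [("sensor-1", t, 20.0 + (t % 5)) for t in range(120)]
    summaries = aggregate_readings(raw)
    print(len(raw), "raw readings reduced to", len(summaries), "summaries")
```

The trade-off is that summaries discard detail. Whether a per-minute minimum, maximum, and mean are enough depends on what the downstream analysis needs.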

With Google services such as Cloud Dataflow and Cloud Dataproc, you can build data processing pipelines that transform and summarize your big data using Apache Beam, Hadoop, and Spark.

If you’re using AWS for big data, you’ll want to understand the basics of how it manages everything from collection and storage to processing and data security. Amazon Kinesis Data Analytics for streaming data, Amazon EMR (Elastic MapReduce) for processing, Redshift for data warehousing, Athena for analysis, and QuickSight for business intelligence are just some of the services that you can use together to create big data solutions in AWS.

AI & Machine Learning

All of the major cloud vendors are developing services that allow companies to quickly leverage AI or machine learning in their applications. This is a significant shift compared to the early days of cloud computing: companies are now relying on cloud vendors to build critical components of their software, enriching their capabilities with technology that would otherwise take years (and significant capital) to develop.

Time to value is an important component of this shift. We are seeing developers integrate AI and machine learning technologies to provide insight in hours—something that would have been impossible with that level of technology just three years ago.

A Tech Pro Research survey reported that 42% of companies don’t have the skills for implementing and supporting AI technologies in-house. As enterprise teams look to leverage these technologies for analytics and business intelligence, they will turn to cloud teams to make these capabilities available.

Here are a few services to get started using AI and machine learning in the cloud:

New to machine learning? Fit your business case with the right machine learning model, and practice making real-time predictions by training a model with Amazon Machine Learning.
Give teams some machine learning experience so they’re ready to get their hands dirty. Add AI to your applications by integrating object/feature recognition with Amazon Rekognition and conversational interfaces with Amazon Lex.
Azure cloud user? Get an overview of the primary Azure machine learning tools: Azure Machine Learning Studio for training and deploying machine learning models and Azure Machine Learning Workbench, a toolkit for building machine learning models. It’s worth highlighting here that ML Studio’s drag-and-drop interface makes it easy for teams to build models without writing any code.
Get started using the popular TensorFlow framework on Google Cloud Machine Learning Engine to train your first neural network, and deploy the trained model to make predictions. Then try TensorFlow on the Amazon Deep Learning AMI.

In the cloud, machine learning and big data often go hand in hand. Machine learning models are trained by analyzing and finding patterns, relationships, and associations buried within a dataset. The bigger the dataset, the more analysis can be performed. Processing a larger training dataset with high-quality features generally results in a more refined model, which in turn provides better predictions and inference.
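The effect of dataset size can be seen even in a toy setting. The sketch below fits a straight line to noisy synthetic data and compares how far the learned slope lands from the true one when training on 20 samples versus 2,000, averaged over repeated trials. Everything here is an assumption chosen for illustration: the true slope of 2.0, the noise level, the sample sizes, and the use of ordinary least squares. It does not represent how any cloud machine learning service trains models.

```python
import random

TRUE_SLOPE = 2.0  # assumed ground truth for the synthetic data


def fit_slope(n, rng):
    """Fit y = a*x + b by ordinary least squares on n noisy samples; return a."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def mean_abs_error(n, trials=50, seed=0):
    """Average absolute slope error over several independent training runs."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    errors = [abs(fit_slope(n, rng) - TRUE_SLOPE) for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    for n in (20, 2000):
        print(f"n={n}: mean absolute slope error = {mean_abs_error(n):.3f}")
```

With the larger training set, the average error should come out noticeably smaller. The same sketch also hints at the caveat in the paragraph above: more rows help only when the features carry signal, since no amount of data fixes inputs that are unrelated to the target.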

To support this connection, cloud vendors provide seamless integrations between their big data services and their machine learning services. For example, Azure’s Stream Analytics can directly call Azure Machine Learning models for processing high-velocity streaming data. Likewise, Azure Machine Learning can query Azure Data Lake Store as a model data source. Similarly, Amazon Kinesis Data Analytics supports several machine learning algorithms exposed directly as SQL functions, and can integrate with other Amazon machine learning services. Teams should familiarize themselves with these complementary and compatible services to deploy them most effectively.

Security

Security should underpin everything you do in the cloud. As more data goes into the cloud, the pressure is on for teams to be able to keep workloads and applications safe. As recent breaches have shown, many companies are rushing to the cloud without employing even basic security measures like encryption for storage. According to McAfee, 36% of organizations are adopting cloud without the right security skills.

Strong security practices start with a full understanding of how security works in each platform and the skills to properly use the available services in your environment. For all of the platforms you’re using, you’ll want to have a thorough understanding of their shared responsibility model so that you understand where the platform’s security responsibility stops and where yours begins.

AWS, Microsoft Azure, and Google Cloud Platform offer a range of services and tools that teams can use to design and implement the proper level of security, and they have been quick to release innovative services that teams can employ to protect data and applications in the cloud. In November 2017, AWS released GuardDuty, an intelligent threat detection service that uses artificial intelligence and machine learning to detect suspicious activity.

One of the first security services that you will use in a cloud environment is an identity and access management (IAM) service, which allows you to configure specific access controls within your environment. Learn about Microsoft’s IAM service in the cloud, Azure Active Directory (Azure AD), and how AWS handles IAM.

Encryption should be used to protect data at rest and in transit when using any storage service. On AWS, the Elastic Block Store (EBS) service provides persistent block-level storage volumes for Amazon EC2 instances. If those volumes contain sensitive information (as they most likely do), you’ll want to make sure data on the volume is encrypted to protect it from malicious activity. Learn about the encryption options for big data services in AWS, including S3 and Amazon Athena.
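A simple first check is to find out which volumes are not encrypted. The sketch below works on plain dictionaries shaped like the volume entries in an EC2 DescribeVolumes response, which include a VolumeId string and an Encrypted flag. The function name and the sample volume IDs are hypothetical, and the code does not call AWS itself. In a real environment you would feed it the volume list returned by your SDK or CLI of choice.

```python
def find_unencrypted_volumes(volumes):
    """Return the IDs of volumes whose 'Encrypted' flag is missing or False."""
    return [
        volume["VolumeId"]
        for volume in volumes
        if not volume.get("Encrypted", False)
    ]


if __name__ == "__main__":
    # Hypothetical sample data in the shape of DescribeVolumes entries.
    sample_volumes = [
        {"VolumeId": "vol-0aaa", "Encrypted": True},
        {"VolumeId": "vol-0bbb", "Encrypted": False},
        {"VolumeId": "vol-0ccc"},  # flag absent: treated as unencrypted
    ]
    print(find_unencrypted_volumes(sample_volumes))
```

Treating a missing flag as unencrypted is a deliberately conservative choice: a volume should be reported unless it is positively known to be encrypted.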

With GDPR looming, compliance is on the radar for many organizations that are trying to educate themselves and put privacy best practices in place at the same time.

AWS offers a number of services that can help you architect compliance into your applications. AWS Config enables visibility of your entire AWS infrastructure from a configuration perspective; with CloudTrail, you can capture all API calls by users and/or services for audit and compliance purposes; with Inspector, you have a record of your security assessments for EC2 instances and any applications running on them. Finally, AWS Trusted Advisor recommends improvements across your AWS account to help optimize your environment based on AWS best practices for security, fault tolerance, costs, and performance.

Top Cloud Skills in Demand for 2018: What’s Next?

The velocity and variety of changes in the broader cloud computing industry will continue unabated in 2018 and beyond. Explore the learning paths, video courses, and hands-on labs available in our Content Library. Subscribe to the Cloud Academy blog, where we’ll keep you up to date with best practices and how-tos for AWS, Azure, and Google Cloud services.
