A Comparison of Machine Learning Services on AWS, Azure, and Google Cloud
Artificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to believe that much of it will happen in the cloud.
The top cloud computing platforms are all betting big on democratizing artificial intelligence. Over the past three years, Amazon, Google, and Microsoft have made significant investments in artificial intelligence (AI) and machine learning, from rolling out new services to carrying out major reorganizations that place AI strategically in their organizational structures. Google CEO, Sundar Pichai, has even said that his company is shifting to an “AI-first” world.
So, if the cloud is the destination for your machine learning projects, how do you know which platform is right for you? In this post, we’ll explore the machine learning offerings from Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
What are the Benefits of Machine Learning in the Cloud?
- The cloud’s pay-per-use model is good for bursty AI or machine learning workloads.
- The cloud makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand increases.
- The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science.
- AWS, Microsoft Azure, and Google Cloud Platform offer many machine learning options that don’t require deep knowledge of AI, machine learning theory, or a team of data scientists.
You don’t need to use a cloud provider to build a machine learning solution. After all, there are plenty of open source machine learning frameworks, such as TensorFlow, MXNet, and CNTK that companies can run on their own hardware. However, companies building sophisticated machine learning models in-house are likely to run into issues scaling their workloads, because training real-world models typically requires large compute clusters.
The barriers to entry for bringing machine learning capabilities to enterprise applications are high on many fronts. The specialized skills required to build, train, and deploy machine learning models and the computational and special-purpose hardware requirements add up to higher costs for labor, development, and infrastructure.
These are problems that cloud computing can solve and the leading public cloud platforms are on a mission to make it easier for companies to leverage machine learning capabilities to solve business problems without the full tech burden. As AWS CEO Andy Jassy highlighted in his 2017 re:Invent keynote, his company has to “solve the problem of accessibility of everyday developers and scientists” to enable AI and machine learning in the enterprise.
There are many good reasons for moving some, or all, of your machine learning projects to the cloud. The cloud’s pay-per-use model is good for bursty AI or machine learning workloads, and you can leverage the speed and power of GPUs for training without the hardware investment. The cloud also makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand for those features increases.
Perhaps even more importantly, the cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science—skills that are rare and in short supply. A survey by Tech Pro Research found that just 28% of companies have some experience with AI or machine learning, and 42% said their enterprise IT personnel don’t have the skills required to implement and support AI and machine learning.
AWS, Microsoft Azure, and Google Cloud Platform offer many options for implementing intelligent features in enterprise applications that don’t require deep knowledge of AI or machine learning theory or a team of data scientists.
The Spectrum of Cloud Machine Learning Services
It’s helpful to consider each provider’s offerings on the spectrum of general-purpose services with high flexibility at one end and special-purpose services with high ease-of-use at the other.
For example, Google Cloud ML Engine is a general-purpose service that requires you to write code using Python and the TensorFlow libraries, while Amazon Rekognition is a specialized image-recognition service that you can run with a single command. So, if you have a typical requirement, such as video analysis, then you should use a specialized service. If your requirement is outside the scope of specialized services, then you’ll have to write custom code and run it on a general-purpose service.
It’s worth noting that all three of the major cloud providers have also attempted to create general-purpose services that are relatively easy to use. Examples include the Google Prediction API, Amazon Machine Learning, and Azure Machine Learning Studio. They fall somewhere in the middle of the spectrum. At first, it might seem like this type of service would give you the best of both worlds, since you could create custom machine learning applications without having to write complex code. However, the cloud providers discovered that there isn’t a big market for simple, general-purpose machine learning. Why? They’re not flexible enough to handle most custom requirements and they’re more difficult to use than specialized services.
In fact, Google has discontinued its Prediction API and Amazon ML is no longer even listed on the “Machine Learning on AWS” web page. However, Azure Machine Learning Studio is still an interesting service in this category, because it’s a great way to learn how to build machine learning models for those who are new to the field. It has a drag-and-drop interface that doesn’t require any coding (although you can add code if you want to). It supports a wide variety of algorithms, including different types of regression, classification, and anomaly detection, as well as a clustering algorithm for unsupervised learning. Once you have a better understanding of machine learning, though, you’re probably better off using a tool like Azure Machine Learning Workbench, which is more difficult to use, but provides more flexibility.
What AI Tools Should I Use?
If you are implementing AI for the first time, then you should start with one of the specialized services. Designed as standalone applications or APIs on top of pre-trained models, each platform offers a range of specialty services that allow developers to add intelligent capabilities without training or deploying their own machine learning models. The main offerings in this category are primarily focused on some aspect of either image or language processing.
|Image Recognition||Rekognition Image||Computer Vision API
Custom Vision Service
|Video Analysis||Rekognition Video||Computer Vision API
|Video Intelligence API|
|Speech to Text||Transcribe||Bing Speech API
Custom Speech Service
Speaker Recognition API
|Text to Speech||Polly||Bing Speech API||Text-to-Speech API|
|Translation||Translate||Translator Text API||Translation API|
|Language Analysis||Comprehend||Text Analytics API
Web Language Model API
Linguistic Analysis API
|Natural Language API|
|Chatbot||Lex||Azure Bot Service||Dialogflow|
This list highlights Azure’s strategy of splitting products into separately branded, very specific AI tasks. Most of these features are also offered by Amazon and Google, but as part of broader APIs. As you can see in the chart, all three of the vendors offer essentially the same capabilities. Microsoft and Google do have a few unique offerings, though. For example, Azure Custom Decision Service helps personalize content and Google Cloud Talent Solution helps with the recruiting process.
Which General AI Offerings Should I Consider?
General-purpose machine learning offerings are used to train and deploy machine learning models. Since specialized AI services only cover a narrow subset of uses, such as image and language processing, you’ll need to use a general-purpose machine learning (ML) service for everything else. For example, many companies need product recommendation engines and fraud detection for their ecommerce sites. These applications require custom machine learning models.
- 12 common machine learning algorithms
- TensorFlow and MXNet pre-installed
- Can use other ML frameworks
Google Cloud ML Engine:
- Supports TensorFlow (as well as scikit-learn and XGBoost in beta)
Azure Machine Learning Workbench & Machine Learning Services:
- Supports Python-based machine learning frameworks, such as TensorFlow or PyTorch
Amazon SageMaker and Cloud ML Engine are purely cloud-based services, while Azure Machine Learning Workbench is a desktop application that uses cloud-based machine learning services.
Amazon SageMaker is described by AWS as a “fully managed, end to end machine learning service” that is designed to be a fast and easy way to add machine learning capabilities. In addition to the AWS Gluon machine learning library, SageMaker supports TensorFlow, MXNet, and many other machine learning frameworks. It was launched in November 2017 at the annual AWS re:Invent conference.
Google released its Cloud ML Engine in 2016, making it easier for developers with some machine learning experience to train models. Google created the popular open-source TensorFlow machine learning framework, which is currently the only framework that Cloud ML Engine supports (although it now offers beta support for scikit-learn and XGBoost). Both Amazon and Azure support TensorFlow and several other machine learning frameworks.
In addition to its older Machine Learning Studio, Azure has two separate machine learning services. The Experimentation Service is designed for model training and deployment, while the Model Management Service provides a registry of model versions and makes it possible to deploy trained models as Docker containerized services. Machine Learning Workbench is a desktop-based frontend for these two services.
How is Hardware Impacted by Machine Learning Workloads?
- Machine learning workloads require greater processing power
- The amount of processing required could be expensive
- GPUs are the processor of choice for many ML workloads because they significantly reduce processing time
- Google and other companies are creating hardware that’s optimized for machine learning jobs
- To help people get started with AI, Amazon offers a camera that can run deep learning models
Hardware is an important consideration when it comes to machine learning workloads. Training a model to recognize a pattern or understand speech requires major parallel computing resources, which could take days on traditional CPU-based processors. In comparison, powerful graphics processing units (GPUs) are the processor of choice for many AI and machine learning workloads because they significantly reduce processing time.
AWS, Azure, and Google Cloud all support using either regular CPUs or GPUs to train models. Google has a unique offering with its Cloud TPUs (Tensor Processing Units). These chips are designed to speed up machine learning tasks. Not surprisingly, they work with TensorFlow. Many other companies are now racing to catch up with Google and release their own ML-optimized hardware.
Outside of processing, AWS has several unique offerings in the hardware category. Its AWS DeepLens wireless video camera can run deep learning models on what it sees and perform image recognition in real time. Amazon seems to be promoting client-side processing as an easy way to get started learning about machine learning.
Although not strictly hardware, the AWS Greengrass ML Inference service allows you to perform machine learning inference processing on your own hardware that’s AWS Greengrass-enabled. Better still, you can keep using the extensive GPU compute power in the cloud to train your machine learning models, then deploy the outcomes to your own devices running AWS Greengrass ML Inference. Running ML Inference locally reduces the amount of device data to be transmitted to the cloud, and therefore reduces costs and latency of results.
What are the Open Source Standards Machine Learning Platforms Use?
Each platform’s deep learning offerings and their positions on wider industry-level machine learning initiatives, open standards, and so forth are a good indication of what the future holds. Deep learning offerings, in particular, highlight how the space has achieved a balance between competition and cooperation among providers.
- Google created an open-sourced TensorFlow, which has become widely popular among machine learning enthusiasts. Despite its connection to Google, both Amazon and Microsoft support TensorFlow in their deep learning services as well.
- Amazon has thrown its support behind Apache MXNet, advocating it as the company’s weapon of choice for machine learning and actively promoting it both internally and externally. MXNet underpins several of its machine learning and AI services.
- Microsoft provides CNTK, otherwise known as the Microsoft Cognitive Toolkit, for deep learning at the commercial level.
- AWS and Microsoft have jointly created the Gluon specification, which is a higher-level abstraction for developing machine learning models. Gluon currently supports MXNet and will soon be extended to CNTK. The Gluon interface simplifies the development experience and is aimed at winning over new developers early in their machine learning journey.
- ONNX, the Open Neural Network Exchange from Facebook and Microsoft, is aimed at creating transferable machine learning models. With ONNX, you create your machine learning model in an open format that allows it to then be trained on supported machine learning frameworks. ONNX has the support of both AWS and Microsoft, but Google has yet to come on board.
Benefits of Machine Learning in the Cloud: Conclusion
Since Azure, Google Cloud, and AWS all provide good general-purpose and specialized machine learning services, you will probably want to choose the platform that you’ve already chosen for your other cloud services. However, to avoid vendor lock-in when using a general-purpose service, you may want to use an open-source machine learning framework that is supported by all three vendors. At the moment, the framework with the broadest support is TensorFlow, although the field is changing rapidly, so we expect cross-platform support for more frameworks soon. The main holdout is Google, which previously supported only TensorFlow, but even Google is now introducing support for scikit-learn and XGBoost.
The Cloud Academy library includes machine learning courses for all three platforms, most of which contain examples using TensorFlow or scikit-learn. Some of the learning paths on this subject include:
- Introduction to Machine Learning on AWS
- Applying Machine Learning and AI Services on AWS
- Introduction to Azure Machine Learning
- Machine Learning on Google Cloud Platform
- The AWS and Azure learning paths also include hands-on labs so you can practice your skills.
We’re regularly adding new machine learning content to our library, based on what our customers need, so try the learning paths above and then let us know what else you would like to see. Good luck with your machine learning efforts!
Written by Guy Hummel and Jeremy Cook
New Content: AWS Terraform, Java Programming Lab Challenges, Azure DP-900 & DP-300 Certification Exam Prep, Plus Plenty More Amazon, Google, Microsoft, and Big Data Courses
This month our Content Team continues building the catalog of courses for everyone learning about AWS, GCP, and Microsoft Azure. In addition, this month’s updates include several Java programming lab challenges and a couple of courses on big data. In total, we released five new learning...
Where Should You Be Focusing Your AWS Security Efforts?
Another day, another re:Invent session! This time I listened to Stephen Schmidt’s session, “AWS Security: Where we've been, where we're going.” Amongst covering the highlights of AWS security during 2020, a number of newly added AWS features/services were discussed, including: AWS Audit...
AWS re:Invent: 2020 Keynote Top Highlights and More
We’ve gotten through the first five days of the special all-virtual 2020 edition of AWS re:Invent. It’s always a really exciting time for practitioners in the field to see what features and services AWS has cooked up for the year ahead. This year’s conference is a marathon and not a...
WARNING: Great Cloud Content Ahead
At Cloud Academy, content is at the heart of what we do. We work with the world’s leading cloud and operations teams to develop video courses and learning paths that accelerate teams and drive digital transformation. First and foremost, we listen to our customers’ needs and we stay ahea...
Excelling in AWS, Azure, and Beyond – How Danut Prisacaru Prepares for the Future
Meet Danut Prisacaru. Danut has been a Software Architect for the past 10 years and has been involved in Software Engineering for 30 years. He’s passionate about software and learning, and jokes that coding is basically the only thing he can do well (!). We think his enthusiasm shines t...
New Content: AWS Data Analytics – Specialty Certification, Azure AI-900 Certification, Plus New Learning Paths, Courses, Labs, and More
This month our Content Team released two big certification Learning Paths: the AWS Certified Data Analytics - Speciality, and the Azure AI Fundamentals AI-900. In total, we released four new Learning Paths, 16 courses, 24 assessments, and 11 labs. New content on Cloud Academy At any ...
New Content: Azure DP-100 Certification, Alibaba Cloud Certified Associate Prep, 13 Security Labs, and Much More
This past month our Content Team served up a heaping spoonful of new and updated content. Not only did our experts release the brand new Azure DP-100 Certification Learning Path, but they also created 18 new hands-on labs — and so much more! New content on Cloud Academy At any time, y...
AWS Certification Practice Exam: What to Expect from Test Questions
If you’re building applications on the AWS cloud or looking to get started in cloud computing, certification is a way to build deep knowledge in key services unique to the AWS platform. AWS currently offers 12 certifications that cover major cloud roles including Solutions Architect, De...
Overcoming Unprecedented Business Challenges with AWS
From auto-scaling applications with high availability to video conferencing that’s used by everyone, every day — cloud technology has never been more popular or in-demand. But what does this mean for experienced cloud professionals and the challenges they face as they carve out a new p...
Constant Content: Cloud Academy’s Q3 2020 Roadmap
Hello — Andy Larkin here, VP of Content at Cloud Academy. I am pleased to release our roadmap for the next three months of 2020 — August through October. Let me walk you through the content we have planned for you and how this content can help you gain skills, get certified, and...
New Content: Alibaba, Azure AZ-303 and AZ-304, Site Reliability Engineering (SRE) Foundation, Python 3 Programming, 16 Hands-on Labs, and Much More
This month our Content Team did an amazing job at publishing and updating a ton of new content. Not only did our experts release the brand new AZ-303 and AZ-304 Certification Learning Paths, but they also created 16 new hands-on labs — and so much more! New content on Cloud Academy At...
Blog Digest: Which Certifications Should I Get?, The 12 Microsoft Azure Certifications, 6 Ways to Prevent a Data Breach, and More
This month, we were excited to announce that Cloud Academy was recognized in the G2 Summer 2020 reports! These reports highlight the top-rated solutions in the industry, as chosen by the source that matters most: customers. We're grateful to have been nominated as a High Performer in se...