A Comparison of Machine Learning Services on AWS, Azure, and Google Cloud
Artificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to believe that much of it will happen in the cloud.
The top cloud computing platforms are all betting big on democratizing artificial intelligence. Over the past three years, Amazon, Google, and Microsoft have made significant investments in artificial intelligence (AI) and machine learning, from rolling out new services to carrying out major reorganizations that place AI strategically in their organizational structures. Google CEO Sundar Pichai has even said that his company is shifting to an “AI-first” world.
So, if the cloud is the destination for your machine learning projects, how do you know which platform is right for you? In this post, we’ll explore the machine learning offerings from Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
What are the Benefits of Machine Learning in the Cloud?
- The cloud’s pay-per-use model is good for bursty AI or machine learning workloads.
- The cloud makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand increases.
- The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science.
- AWS, Microsoft Azure, and Google Cloud Platform offer many machine learning options that don’t require deep knowledge of AI, machine learning theory, or a team of data scientists.
You don’t need to use a cloud provider to build a machine learning solution. After all, there are plenty of open source machine learning frameworks, such as TensorFlow, MXNet, and CNTK, that companies can run on their own hardware. However, companies building sophisticated machine learning models in-house are likely to run into issues scaling their workloads, because training real-world models typically requires large compute clusters.
The barriers to entry for bringing machine learning capabilities to enterprise applications are high on many fronts. The specialized skills required to build, train, and deploy machine learning models and the computational and special-purpose hardware requirements add up to higher costs for labor, development, and infrastructure.
These are problems that cloud computing can solve, and the leading public cloud platforms are on a mission to make it easier for companies to use machine learning to solve business problems without taking on the full technical burden. As AWS CEO Andy Jassy highlighted in his 2017 re:Invent keynote, his company has to “solve the problem of accessibility of everyday developers and scientists” to enable AI and machine learning in the enterprise.
There are many good reasons for moving some, or all, of your machine learning projects to the cloud. The cloud’s pay-per-use model is good for bursty AI or machine learning workloads, and you can leverage the speed and power of GPUs for training without the hardware investment. The cloud also makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand for those features increases.
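To make the pay-per-use argument concrete, here is a rough back-of-the-envelope sketch in Python. Every figure in it (the hourly GPU rate, the server price, the running costs, the hours used) is a made-up placeholder chosen only for illustration, not a quote from any provider. The function names are our own.

```python
def monthly_cloud_cost(gpu_hours_used: float, hourly_rate: float) -> float:
    """Pay-per-use: you are billed only for the hours you actually run."""
    return gpu_hours_used * hourly_rate


def monthly_owned_cost(purchase_price: float, lifetime_months: int,
                       monthly_running_cost: float) -> float:
    """Owned hardware: cost accrues every month, busy or idle."""
    return purchase_price / lifetime_months + monthly_running_cost


def break_even_hours(purchase_price: float, lifetime_months: int,
                     monthly_running_cost: float, hourly_rate: float) -> float:
    """GPU hours per month above which owning becomes cheaper than renting."""
    owned = monthly_owned_cost(purchase_price, lifetime_months, monthly_running_cost)
    return owned / hourly_rate


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    HOURLY_RATE = 3.00          # assumed on-demand price per GPU hour
    PURCHASE_PRICE = 36_000.00  # assumed price of a comparable GPU server
    LIFETIME_MONTHS = 36        # assumed useful life of the server
    RUNNING_COST = 200.00       # assumed power, cooling, and admin per month

    bursty_hours = 40  # a few training runs per month
    print("cloud :", monthly_cloud_cost(bursty_hours, HOURLY_RATE))
    print("owned :", monthly_owned_cost(PURCHASE_PRICE, LIFETIME_MONTHS, RUNNING_COST))
    print("break-even hours/month:",
          break_even_hours(PURCHASE_PRICE, LIFETIME_MONTHS, RUNNING_COST, HOURLY_RATE))
```

With these assumed numbers, a workload that needs only a few dozen GPU hours a month sits far below the break-even point, which is the situation the "bursty workload" argument describes. Your own figures may well lead to a different conclusion.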
Perhaps even more importantly, the cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science, which are in short supply. A survey by Tech Pro Research found that just 28% of companies have some experience with AI or machine learning, and 42% said their enterprise IT personnel don’t have the skills required to implement and support AI and machine learning.
AWS, Microsoft Azure, and Google Cloud Platform offer many options for implementing intelligent features in enterprise applications that don’t require deep knowledge of AI or machine learning theory or a team of data scientists.
The Spectrum of Cloud Machine Learning Services
It’s helpful to consider each provider’s offerings on the spectrum of general-purpose services with high flexibility at one end and special-purpose services with high ease-of-use at the other.
For example, Google Cloud ML Engine is a general-purpose service that requires you to write code using Python and the TensorFlow libraries, while Amazon Rekognition is a specialized image-recognition service that you can run with a single command. So, if you have a typical requirement, such as video analysis, then you should use a specialized service. If your requirement is outside the scope of specialized services, then you’ll have to write custom code and run it on a general-purpose service.
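As a rough illustration of how little code a specialized service needs, the sketch below calls Rekognition's label detection through boto3, the AWS SDK for Python. The `detect_labels` helper and `main` function are our own wrappers, not part of the SDK, and the bucket and object names are placeholders. Running `main()` assumes boto3 is installed and AWS credentials are configured.

```python
def detect_labels(client, bucket: str, key: str,
                  max_labels: int = 10, min_confidence: float = 80.0):
    """Return (label, confidence) pairs for an image stored in S3.

    `client` is expected to behave like boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


def main():
    # Imported here so the helper above can be used (and tested) without boto3.
    import boto3  # third-party AWS SDK; requires configured credentials

    rekognition = boto3.client("rekognition")
    # "my-example-bucket" and "photos/dog.jpg" are placeholder names.
    for name, confidence in detect_labels(rekognition, "my-example-bucket", "photos/dog.jpg"):
        print(f"{name}: {confidence:.1f}%")
```

There is no model to train, no framework to install, and no cluster to manage, which is the trade-off the spectrum describes: very little flexibility in exchange for very little work.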
It’s worth noting that all three of the major cloud providers have also attempted to create general-purpose services that are relatively easy to use. Examples include the Google Prediction API, Amazon Machine Learning, and Azure Machine Learning Studio. They fall somewhere in the middle of the spectrum. At first, it might seem like this type of service would give you the best of both worlds, since you could create custom machine learning applications without having to write complex code. However, the cloud providers discovered that there isn’t a big market for simple, general-purpose machine learning. Why? These services are not flexible enough to handle most custom requirements, and they’re more difficult to use than specialized services.
In fact, Google has discontinued its Prediction API and Amazon ML is no longer even listed on the “Machine Learning on AWS” web page. However, Azure Machine Learning Studio is still an interesting service in this category, because it’s a great way to learn how to build machine learning models for those who are new to the field. It has a drag-and-drop interface that doesn’t require any coding (although you can add code if you want to). It supports a wide variety of algorithms, including different types of regression, classification, and anomaly detection, as well as a clustering algorithm for unsupervised learning. Once you have a better understanding of machine learning, though, you’re probably better off using a tool like Azure Machine Learning Workbench, which is more difficult to use, but provides more flexibility.
What AI Tools Should I Use?
If you are implementing AI for the first time, then you should start with one of the specialized services. Each platform offers a range of these services, designed as standalone applications or APIs on top of pre-trained models, that allow developers to add intelligent capabilities without training or deploying their own machine learning models. Most of the offerings in this category focus on some aspect of either image or language processing.
|Service Type|AWS|Azure|Google Cloud|
|---|---|---|---|
|Image Recognition|Rekognition Image|Computer Vision API, Custom Vision Service||
|Video Analysis|Rekognition Video|Computer Vision API|Video Intelligence API|
|Speech to Text|Transcribe|Bing Speech API, Custom Speech Service, Speaker Recognition API||
|Text to Speech|Polly|Bing Speech API|Text-to-Speech API|
|Translation|Translate|Translator Text API|Translation API|
|Language Analysis|Comprehend|Text Analytics API, Web Language Model API, Linguistic Analysis API|Natural Language API|
|Chatbot|Lex|Azure Bot Service|Dialogflow|
This list highlights Azure’s strategy of splitting products into separately branded, very specific AI tasks. Most of these features are also offered by Amazon and Google, but as part of broader APIs. As you can see in the chart, all three of the vendors offer essentially the same capabilities. Microsoft and Google do have a few unique offerings, though. For example, Azure Custom Decision Service helps personalize content and Google Cloud Talent Solution helps with the recruiting process.
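The "specialized first, general-purpose otherwise" rule of thumb can be written down as a small lookup. The sketch below is only an illustration: it copies a few rows from the comparison table above, takes the general-purpose service names from the next section, and uses a function name (`recommend_service`) that is our own invention.

```python
# A subset of rows from the comparison table; extend as needed.
SPECIALIZED_SERVICES = {
    "text to speech": {"aws": "Polly", "azure": "Bing Speech API",
                       "google": "Text-to-Speech API"},
    "translation": {"aws": "Translate", "azure": "Translator Text API",
                    "google": "Translation API"},
    "chatbot": {"aws": "Lex", "azure": "Azure Bot Service",
                "google": "Dialogflow"},
}

# Fallback when no specialized service covers the task.
GENERAL_PURPOSE = {
    "aws": "Amazon SageMaker",
    "azure": "Azure Machine Learning Services",
    "google": "Cloud ML Engine",
}


def recommend_service(task: str, provider: str) -> str:
    """Prefer a specialized service; otherwise fall back to general-purpose ML."""
    provider_key = provider.strip().lower()
    if provider_key not in GENERAL_PURPOSE:
        raise ValueError(f"unknown provider: {provider!r}")
    row = SPECIALIZED_SERVICES.get(task.strip().lower())
    if row is None:
        # Nothing off the shelf: a custom model is needed.
        return GENERAL_PURPOSE[provider_key]
    return row[provider_key]


if __name__ == "__main__":
    print(recommend_service("Translation", "aws"))
    print(recommend_service("fraud detection", "google"))
```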
Which General AI Offerings Should I Consider?
General-purpose machine learning offerings are used to train and deploy machine learning models. Since specialized AI services only cover a narrow subset of uses, such as image and language processing, you’ll need to use a general-purpose machine learning (ML) service for everything else. For example, many companies need product recommendation engines and fraud detection for their ecommerce sites. These applications require custom machine learning models.
Amazon SageMaker:
- 12 common machine learning algorithms
- TensorFlow and MXNet pre-installed
- Can use other ML frameworks
Google Cloud ML Engine:
- Supports TensorFlow (as well as scikit-learn and XGBoost in beta)
Azure Machine Learning Workbench & Machine Learning Services:
- Supports Python-based machine learning frameworks, such as TensorFlow or PyTorch
Amazon SageMaker and Cloud ML Engine are purely cloud-based services, while Azure Machine Learning Workbench is a desktop application that uses cloud-based machine learning services.
Amazon SageMaker is described by AWS as a “fully managed, end to end machine learning service” that is designed to be a fast and easy way to add machine learning capabilities. In addition to the AWS Gluon machine learning library, SageMaker supports TensorFlow, MXNet, and many other machine learning frameworks. It was launched in November 2017 at the annual AWS re:Invent conference.
Google released its Cloud ML Engine in 2016, making it easier for developers with some machine learning experience to train models. Google created the popular open-source TensorFlow machine learning framework, which is currently the only framework that Cloud ML Engine supports (although it now offers beta support for scikit-learn and XGBoost). Both Amazon and Azure support TensorFlow and several other machine learning frameworks.
In addition to its older Machine Learning Studio, Azure has two separate machine learning services. The Experimentation Service is designed for model training and deployment, while the Model Management Service provides a registry of model versions and makes it possible to deploy trained models as Docker containerized services. Machine Learning Workbench is a desktop-based frontend for these two services.
How is Hardware Impacted by Machine Learning Workloads?
- Machine learning workloads require greater processing power
- The amount of processing required could be expensive
- GPUs are the processor of choice for many ML workloads because they significantly reduce processing time
- Google and other companies are creating hardware that’s optimized for machine learning jobs
- To help people get started with AI, Amazon offers a camera that can run deep learning models
Hardware is an important consideration when it comes to machine learning workloads. Training a model to recognize a pattern or understand speech requires major parallel computing resources, and a training job could take days on traditional CPU-based processors. Powerful graphics processing units (GPUs), by comparison, are the processor of choice for many AI and machine learning workloads because they significantly reduce processing time.
AWS, Azure, and Google Cloud all support using either regular CPUs or GPUs to train models. Google has a unique offering with its Cloud TPUs (Tensor Processing Units). These chips are designed to speed up machine learning tasks. Not surprisingly, they work with TensorFlow. Many other companies are now racing to catch up with Google and release their own ML-optimized hardware.
Outside of processing, AWS has several unique offerings in the hardware category. Its AWS DeepLens wireless video camera can run deep learning models on what it sees and perform image recognition in real time. Amazon seems to be promoting client-side processing as an easy way to get started learning about machine learning.
Although not strictly hardware, the AWS Greengrass ML Inference service allows you to perform machine learning inference processing on your own hardware that’s AWS Greengrass-enabled. Better still, you can keep using the extensive GPU compute power in the cloud to train your machine learning models, then deploy the outcomes to your own devices running AWS Greengrass ML Inference. Running ML Inference locally reduces the amount of device data to be transmitted to the cloud, and therefore reduces costs and latency of results.
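The bandwidth argument for local inference can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the Greengrass API: the payload format, the device name, and the 200 KB frame size are all hypothetical. It compares the size of a raw camera frame with the size of the inference result the device would send instead.

```python
import json


def inference_payload(device_id: str, labels: list) -> bytes:
    """Serialize only the inference result, not the raw frame."""
    return json.dumps({"device": device_id, "labels": labels}).encode("utf-8")


def bandwidth_saved(raw_frame_bytes: int, payload: bytes) -> float:
    """Fraction of upload traffic avoided by sending the result instead of the frame."""
    return 1 - len(payload) / raw_frame_bytes


if __name__ == "__main__":
    RAW_FRAME_BYTES = 200_000  # assumed size of one compressed camera frame
    # Hypothetical result of running a model on the device itself.
    payload = inference_payload("camera-01", [{"name": "person", "confidence": 0.97}])
    print(len(payload), "bytes sent instead of", RAW_FRAME_BYTES)
    print(f"upload traffic avoided: {bandwidth_saved(RAW_FRAME_BYTES, payload):.2%}")
```

Under these assumptions the device sends a few dozen bytes per frame instead of hundreds of kilobytes, and it gets its answer without a round trip to the cloud.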
What are the Open Source Standards Machine Learning Platforms Use?
Each platform’s deep learning offerings and their positions on wider industry-level machine learning initiatives, open standards, and so forth are a good indication of what the future holds. Deep learning offerings, in particular, highlight how the space has achieved a balance between competition and cooperation among providers.
- Google created and open-sourced TensorFlow, which has become widely popular among machine learning enthusiasts. Despite its connection to Google, both Amazon and Microsoft support TensorFlow in their deep learning services as well.
- Amazon has thrown its support behind Apache MXNet, advocating it as the company’s weapon of choice for machine learning and actively promoting it both internally and externally. MXNet underpins several of its machine learning and AI services.
- Microsoft provides CNTK, otherwise known as the Microsoft Cognitive Toolkit, for deep learning at the commercial level.
- AWS and Microsoft have jointly created the Gluon specification, which is a higher-level abstraction for developing machine learning models. Gluon currently supports MXNet and will soon be extended to CNTK. The Gluon interface simplifies the development experience and is aimed at winning over new developers early in their machine learning journey.
- ONNX, the Open Neural Network Exchange from Facebook and Microsoft, is aimed at creating transferable machine learning models. With ONNX, you create your machine learning model in an open format that allows it to then be trained on supported machine learning frameworks. ONNX has the support of both AWS and Microsoft, but Google has yet to come on board.
Benefits of Machine Learning in the Cloud: Conclusion
Since Azure, Google Cloud, and AWS all provide good general-purpose and specialized machine learning services, you will probably want to stay with the platform you already use for your other cloud services. However, to avoid vendor lock-in when using a general-purpose service, you may want to use an open-source machine learning framework that is supported by all three vendors. At the moment, the framework with the broadest support is TensorFlow, although the field is changing rapidly, so we expect cross-platform support for more frameworks soon. The main holdout is Google, which previously supported only TensorFlow, but even Google is now introducing support for scikit-learn and XGBoost.
The Cloud Academy library includes machine learning courses for all three platforms, most of which contain examples using TensorFlow or scikit-learn. Some of the learning paths on this subject include:
- Introduction to Machine Learning on AWS
- Applying Machine Learning and AI Services on AWS
- Introduction to Azure Machine Learning
- Machine Learning on Google Cloud Platform
The AWS and Azure learning paths also include hands-on labs so you can practice your skills.
We’re regularly adding new machine learning content to our library, based on what our customers need, so try the learning paths above and then let us know what else you would like to see. Good luck with your machine learning efforts!
Written by Guy Hummel and Jeremy Cook