Introduction to Machine Learning
Welcome to an introduction to using Artificial Intelligence and Machine Learning with a focus on Amazon Web Services and the Google Cloud Platform. This course is designed to be a gentle introduction, starting from the ground up and focusing on giving students the tools and materials they need to navigate the topic. It will also include the necessary skills around data engineering, cloud management and even some systems engineering. There are several labs directly tied to this learning path, which will provide hands-on experience to supplement the academic knowledge provided in the lectures.
This course begins with an introduction to AI and ML, before moving on to explain the different levels of users in the field. Then we take a look at out-of-the-box solutions for AI and ML, before looking at a case study that puts the topics covered during this course into a real-world example.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
By the end of this course, you'll hopefully be ready to take more advanced courses, and have a springboard into handling complex tasks in your day-to-day work, whether that is in a professional, student, or hobbyist environment.
This course is a multi-part series ideal for those who are interested in understanding machine learning from a 101 perspective, and for those wanting to become data engineers. If you already understand concepts such as how to train a model and run inference with it, you may wish to skip ahead to part two or a more advanced learning path.
It helps if you have a light data engineering or developer background, as several parts of this class, particularly the labs, involve hands-on work manipulating basic data structures and scripts. The labs all have highly detailed notes to help novice users understand them, but you will be able to expand more easily at your own pace with a good baseline understanding. Although we explain the core concepts, there are some prerequisites for this course.
It is recommended that you have a basic familiarity with one of the cloud providers, especially AWS or GCP. Azure, Oracle and other providers also have machine learning suites but these two are the focus for this class.
If you are interested in completing the labs for hands-on work, Python is a helpful language to understand. Now, if you're looking into a career in machine learning, you can definitely do it with languages such as Java, C#, even lower-level languages such as C++, or statistical and numerical languages such as R or MATLAB. However, in my experience, Python is the most widely adopted language, specifically if you're looking to go heavy duty into training, learning, and developing models.
In order to help illustrate how these concepts can be used, let's go through a case study. Imagine you are in charge of analyzing all the feedback your company has received for its online courses. There are thousands of responses, and your boss needs the analysis by tomorrow, because this boss waited until the last minute to give you the task. How would you start to handle this? How would you even begin to read through thousands of responses without the help of machine learning? What do you do?
You may have reached the conclusion that you do need machine learning. And in my opinion, the correct course of action would be to start by trying to leverage level one models. When beginning to leverage level one models, the typical questions you ask yourself are: how do you fill out this diagram? Where is the data stored? What is the data? What model should we use? Who will provide that model? How do we interface with it? And how do you present the results? In other words, how do I make the model output meaningful so that my boss can read it?
So, to dig in and answer each of these the way a real-world scenario would: the data is stored in an object store, in this case S3. Each survey response is its own file, which makes it extremely easy to parse. This is where the previously mentioned programming experience, such as Python or Java, comes in handy, because it allows you to manipulate incoming data and get it ready for the model.
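As a rough sketch of that first step, the snippet below reads one survey response per S3 object. It assumes the responses are UTF-8 text files; the function name `iter_survey_responses`, the bucket name and the prefix are all hypothetical. The client is passed in as an argument, and in practice it would typically be created with `boto3.client("s3")`.

```python
from typing import Iterator, Tuple


def iter_survey_responses(s3_client, bucket: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (object key, response text) for each non-empty object under the prefix.

    Assumes one survey response per object, stored as UTF-8 text.
    """
    # list_objects_v2 returns results in pages, so use a paginator
    # to walk through thousands of files.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page for an empty listing has no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            text = body.decode("utf-8").strip()
            if text:  # skip blank responses
                yield key, text


# Example usage (hypothetical bucket and prefix; requires AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   for key, text in iter_survey_responses(s3, "my-feedback-bucket", "surveys/"):
#       print(key, text[:80])
```

Passing the client in, rather than creating it inside the function, keeps the parsing logic easy to try out with a stand-in object before pointing it at a real bucket.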
Secondly, since we are trying to determine how people feel, the model should probably be something around sentiment. Now, Google, Amazon and Azure all offer out-of-the-box sentiment models, but as the data is already in S3, it makes a lot of sense to stick with Amazon. Amazon's Comprehend service includes sentiment analysis, and it has fully featured Python, Java, and REST APIs that allow you to connect to it.
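To give a feel for the Python side, here is a minimal sketch of calling the built-in sentiment model through boto3's `detect_sentiment`. The helper name `classify_sentiment` is hypothetical, and the client is passed in as an argument; in practice it would typically come from `boto3.client("comprehend")`. The sketch also assumes the feedback is in English.

```python
from typing import Tuple


def classify_sentiment(comprehend_client, text: str, language_code: str = "en") -> Tuple[str, float]:
    """Return (label, confidence) for one piece of text.

    The label is one of POSITIVE, NEGATIVE, NEUTRAL or MIXED, and the
    confidence is the score Comprehend reports for that same label.
    """
    response = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # e.g. "POSITIVE"
    # SentimentScore keys are title-cased: "Positive", "Negative", "Neutral", "Mixed".
    confidence = response["SentimentScore"][label.capitalize()]
    return label, confidence


# Example usage (requires AWS credentials and a configured region):
#   import boto3
#   comprehend = boto3.client("comprehend")
#   print(classify_sentiment(comprehend, "I liked this course a lot."))
```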
So it makes a lot of sense to leverage Amazon and keep everything on a single cloud provider. And finally, presenting the results is a little bit of a user's choice and preference. In this case, SageMaker Notebooks allow you to make some pretty graphics by doing inline displays. That is also what the follow-up lab, in which we will be doing some hands-on sentiment analysis, will leverage. In practice, though, you can leverage most dashboarding tools or PDF reports, and even something like a CSV or Excel export would probably be sufficient in this case.
For the purpose of this case study, we will assume that SageMaker Notebooks are used to make pretty outputs and control the flow of information. Rather than show code on screen as it comes through an abstract number of S3 files, I simply want to show how Comprehend looks in its GUI and how you can use some sentiment analysis.
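As a simple illustration of the CSV option, the sketch below tallies labels and writes a small report using only the Python standard library. The `(response_id, label, confidence)` tuple shape and the function names are assumptions made for this example, not something the course prescribes.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

Result = Tuple[str, str, float]  # (response_id, sentiment label, confidence)


def summarize(results: List[Result]) -> Counter:
    """Count how many responses fall under each sentiment label."""
    return Counter(label for _response_id, label, _confidence in results)


def to_csv(results: List[Result]) -> str:
    """Render the per-response results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["response_id", "sentiment", "confidence"])
    for response_id, label, confidence in results:
        writer.writerow([response_id, label, f"{confidence:.2f}"])
    return buffer.getvalue()
```

The counts from `summarize` are the kind of headline numbers a boss can read at a glance, while the CSV text can be saved to a file and opened in Excel for the per-response detail.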
So here you can see the built-in text, in which some very positive feedback is coming through, and we're specifying that it should use the built-in model. As you can see, it also enables custom labels. As we said, level two certs will enable that; we'll look back on that later. As an output of the built-in sentiment model, you can see that it is an extremely positive sentence. The sentence of course says they like the course a lot, and as you can see, we're extremely confident that it is a positive sentiment.
As we previously discussed, for level one, simply understanding percent likelihood is sufficient for model fit. And as you can see here, we are 99% sure it is a positive result. More nebulous statements might have slightly less clear results, but this allows you to assign a level of confidence when you declare a result as positive, negative, mixed or, in some cases, neutral. And finally, to come full circle, here is a sample request and response from the Comprehend API, specifically its entities feature, showing that you can submit text as shown on the left and get a response on the right in JSON format. Those of you with some level of development background will be familiar with JSON, which is extremely easy to integrate into your existing applications.
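To make the confidence idea concrete, here is a small sketch that only accepts a sentiment label when its score clears a threshold, and that pulls high-confidence entities out of a JSON response. The 0.9 threshold, the `NEEDS_REVIEW` label and both function names are assumptions for illustration. The dictionary keys follow the shape of Comprehend's sentiment and entity responses.

```python
import json
from typing import List, Tuple


def decide_label(response: dict, threshold: float = 0.9) -> str:
    """Return the sentiment label, or NEEDS_REVIEW if its score is below the threshold."""
    label = response["Sentiment"]  # e.g. "POSITIVE"
    # Score keys are title-cased versions of the label, e.g. "Positive".
    confidence = response["SentimentScore"][label.capitalize()]
    return label if confidence >= threshold else "NEEDS_REVIEW"


def confident_entities(response_json: str, min_score: float = 0.9) -> List[Tuple[str, str]]:
    """Parse an entities response and keep (text, type) pairs at or above min_score."""
    response = json.loads(response_json)
    return [
        (entity["Text"], entity["Type"])
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_score
    ]
```

A nebulous statement whose top score is, say, 0.6 would come back as `NEEDS_REVIEW` under this sketch, which is one way of flagging responses for a human to read rather than trusting the model outright.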
So you can actually try a similar approach to this in the attached lab. As we mentioned, this learning path has several labs for hands on application. And basically, instead of having to deal with an S3 bucket, and different displays, we've simplified the flow into a collection of simple sentiment statements which can be run through the machine learning application.
This lab will help you understand and actually execute some sentiment analysis in a programmatic fashion, and will also walk you through some of the basics of using the GUI. So at this point, you can either step over to the lab and try it out, or you can stick around in the lectures and start to review what a level two user journey and application looks like.
As a refresher, next up is a discussion of a more detailed understanding of how models are created and trained, and what a more customized experience would look like.
About the Author
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.