Introduction to Machine Learning
Welcome to an introduction to using Artificial Intelligence and Machine Learning with a focus on Amazon Web Services and Google Cloud Platform. This course is designed to be a gentle introduction, starting from the ground up and focusing on giving students the tools and materials they need to navigate the topic. It will also include the necessary skills around data engineering, cloud management and even some systems engineering. There are several labs directly tied to this learning path, which will provide hands-on experience to supplement the academic knowledge provided in the lectures.
This course begins with an introduction to AI and ML, before moving on to explain the different levels of users in the field. Then we take a look at out-of-the-box solutions for AI and ML, before looking at a case study that puts the topics covered during this course into a real-world example.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
By the end of this course, you'll hopefully be ready to take more advanced courses and have a springboard into handling complex tasks in your day-to-day work, whether in a professional, student, or hobbyist environment.
This course is a multi-part series ideal for those who are interested in understanding machine learning from a 101 perspective, and for those wanting to become data engineers. If you already understand concepts such as how to train a model and run inference with it, you may wish to skip ahead to part two or a more advanced learning path.
It helps if you have a light data engineering or developer background, as several parts of this class, particularly the labs, involve hands-on work and manipulating basic data structures and scripts. The labs all have highly detailed notes to help novice users understand them, but you will be able to expand more easily at your own pace with a good baseline understanding. Although we explain the core concepts, there are some prerequisites for this course.
It is recommended that you have a basic familiarity with one of the cloud providers, especially AWS or GCP. Azure, Oracle and other providers also have machine learning suites, but AWS and GCP are the focus for this class.
If you are interested in completing the labs for hands-on work, Python is a helpful language to understand. Now, if you're looking into a career in machine learning, you can definitely do it with languages such as Java, C#, even lower-level languages such as C++, or statistical and numerical languages such as R or MATLAB. However, in my experience, Python is the most widely adopted language, specifically if you're looking to go heavy duty into training, learning, and developing models.
Level One is the layer at which most users are introduced to artificial intelligence. The key concept to understand in Level One, before we can start to discuss what you can do with machine learning, is the relationship between Data, the Model and Results.
In this diagram, which starts on the left, data passes through the model and produces results. At this base level of understanding, data could really be anything. It could be simple text from a social media account, such as a Twitter message. It could be an image of something such as dogs and cats. Or maybe it's a more complex data structure such as layered geographic information.
The model is what the cloud or third-party providers make available via a predefined interface in Level One. At later levels, you might have to write the interface or interact with a more complex one, but importantly, this is the segment that applies intelligence to the data. So, the data is processed by the model and results are produced.
There are some proper terms for this that we'll cover in Module two. But just know that data is processed by the model. The model provides insights, results, scoring, extraction, some type of enrichment, and then results are produced.
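The data, model, results relationship described above can be sketched in a few lines of Python. This is only a toy illustration: `toy_animal_model` is a hypothetical keyword lookup standing in for a provider-hosted model, not real machine learning, and the `Result` fields are names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    source: str  # the original piece of data
    label: str   # the insight the model attached to it


# At Level One, a "model" is anything that turns data into a result.
Model = Callable[[str], Result]


def toy_animal_model(data: str) -> Result:
    """Hypothetical stand-in for a hosted model (keyword lookup only)."""
    text = data.lower()
    if "dog" in text:
        label = "dog"
    elif "cat" in text:
        label = "cat"
    else:
        label = "unknown"
    return Result(source=data, label=label)


def run(model: Model, items: List[str]) -> List[Result]:
    """Pass each piece of data through the model and collect the results."""
    return [model(item) for item in items]


results = run(
    toy_animal_model,
    ["My dog loves the park", "Cat videos all day", "Nice weather"],
)
for r in results:
    print(f"{r.source!r} -> {r.label}")
```

The caller never looks inside the model; it only hands data in and reads results out, which is the Level One view of the interface.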
Now, metadata around the results' accuracy is typically included, but for a Level One understanding, you only need to understand whether the model is accurate or not. Typically, this is provided as a confidence score that you'll see briefly in the case study that we'll be reviewing. And now let's understand what types of services are typical for Level One.
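Before turning to those services, here is a minimal sketch of how a confidence score is commonly used: keep a result only when its score clears a cutoff. The prediction values and the 0.8 threshold below are assumptions made up for illustration; a sensible cutoff depends on the service and the use case.

```python
from typing import Dict, List


def is_confident(prediction: Dict, threshold: float = 0.8) -> bool:
    """Return True when the model's confidence meets the (assumed) cutoff."""
    return prediction["confidence"] >= threshold


# Made-up model outputs: a label plus a confidence between 0 and 1.
predictions: List[Dict] = [
    {"label": "dog", "confidence": 0.97},
    {"label": "cat", "confidence": 0.55},
]

accepted = [p["label"] for p in predictions if is_confident(p)]
print(accepted)
```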
Level One has lots and lots of options available to it despite being the most entry-level layer. Named Entity Extraction is part of a feature set made available in Amazon's Comprehend service. Comprehend as a whole is a service that is extremely useful in that it allows you to analyze text.
Now, potentially this text was previously run through a different model to convert speech to text, but regardless, this type of model accepts text-based input. Named Entity Extraction in particular allows you to extract information from the text such as companies, proper names of individuals, or even things like locations.
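As a rough sketch of what calling this looks like from Python, the function below wraps Comprehend's `detect_entities` operation as exposed by the boto3 SDK. The helper name `extract_entities`, the `FakeComprehend` stand-in, and the sample response values are assumptions for illustration, so the example runs without an AWS account; with credentials you would pass `boto3.client("comprehend")` instead.

```python
from typing import Any, Dict, List


def extract_entities(client: Any, text: str, language_code: str = "en") -> List[Dict[str, Any]]:
    """Call DetectEntities and keep the fields most callers care about."""
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    return [
        {"text": e["Text"], "type": e["Type"], "score": e["Score"]}
        for e in response["Entities"]
    ]


class FakeComprehend:
    """Offline stand-in shaped like the real response; the values are made up."""

    def detect_entities(self, Text: str, LanguageCode: str) -> Dict[str, Any]:
        return {
            "Entities": [
                {"Text": "Amazon", "Type": "ORGANIZATION", "Score": 0.99,
                 "BeginOffset": 0, "EndOffset": 6},
                {"Text": "Seattle", "Type": "LOCATION", "Score": 0.98,
                 "BeginOffset": 27, "EndOffset": 34},
            ]
        }


entities = extract_entities(FakeComprehend(), "Amazon opened an office in Seattle.")
for entity in entities:
    print(entity)
```

Passing the client in as a parameter keeps the function easy to try offline and easy to point at the real service later.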
Typical uses of this type of service would be counting how many mentions a company has in social media so that you can give it a trend awareness score. Or, if you work at a large financial institution, you might be required to monitor internal communication to detect which people and companies are being discussed by different trade organizations and groups within your business.
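The mention-counting use case can be sketched as a simple tally over entity results. The input below mimics the shape of Comprehend's `Entities` list, but the company names, scores, and the 0.9 cutoff are invented for this example, and a plain count is only one possible basis for a trend score.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_company_mentions(entity_batches: Iterable[List[Dict]], min_score: float = 0.9) -> Counter:
    """Tally ORGANIZATION entities across messages, skipping low-confidence hits."""
    counts: Counter = Counter()
    for entities in entity_batches:
        for entity in entities:
            if entity["Type"] == "ORGANIZATION" and entity["Score"] >= min_score:
                counts[entity["Text"]] += 1
    return counts


# One inner list per social media message (hypothetical data).
batches = [
    [
        {"Text": "Acme Corp", "Type": "ORGANIZATION", "Score": 0.99},
        {"Text": "Jane Doe", "Type": "PERSON", "Score": 0.98},
    ],
    [
        {"Text": "Acme Corp", "Type": "ORGANIZATION", "Score": 0.97},
        {"Text": "Globex", "Type": "ORGANIZATION", "Score": 0.95},
        {"Text": "Globex", "Type": "ORGANIZATION", "Score": 0.40},
    ],
]

mentions = count_company_mentions(batches)
print(mentions.most_common())
```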
Translation is another service that many people provide, but it's important to realize that it is a machine learning process as well. At a very rudimentary level, translation could be thought of as a rules engine trading words for other words, but anybody who's looked at a multilingual dictionary knows that there's syntax, context, and specific details that need to be understood. And by using machine learning, the cloud providers or the machine learning providers don't have to train for every single specific scenario.
They're able to make these models adaptable and expandable on their own, and then they make those learnings available to you via the translation API.
Rekognition, and yes, it's spelt with a K instead of a C, is Amazon's image processing suite. This service covers both still pictures and videos and is capable of labeling objects in an image.
This is an extremely useful set of features in that it allows you to upload a picture of a bicyclist and it will be able to label it with things such as bikes, cars and road. Furthermore, it is also able to pull text out of an image, which is known as optical character recognition and sometimes written OCR.
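A minimal sketch of both features, using the `detect_labels` and `detect_text` operations from the boto3 Rekognition client. The helper names, the `FakeRekognition` stand-in, and its sample labels are assumptions so the example runs offline; with credentials you would pass `boto3.client("rekognition")` and real image bytes. Note that Rekognition reports confidence on a 0 to 100 scale.

```python
from typing import Any, Dict, List


def label_image(client: Any, image_bytes: bytes, min_confidence: float = 80.0) -> List[str]:
    """Return the names of objects the service is reasonably sure about."""
    response = client.detect_labels(Image={"Bytes": image_bytes}, MinConfidence=min_confidence)
    return [label["Name"] for label in response["Labels"]]


def read_text(client: Any, image_bytes: bytes) -> List[str]:
    """Return whole lines of text found in the image (OCR)."""
    response = client.detect_text(Image={"Bytes": image_bytes})
    return [d["DetectedText"] for d in response["TextDetections"] if d["Type"] == "LINE"]


class FakeRekognition:
    """Offline stand-in shaped like the real responses; the values are made up."""

    def detect_labels(self, Image: Dict, MinConfidence: float) -> Dict[str, Any]:
        labels = [
            {"Name": "Bicycle", "Confidence": 99.1},
            {"Name": "Road", "Confidence": 95.4},
            {"Name": "Car", "Confidence": 62.0},
        ]
        return {"Labels": [l for l in labels if l["Confidence"] >= MinConfidence]}

    def detect_text(self, Image: Dict) -> Dict[str, Any]:
        return {
            "TextDetections": [
                {"DetectedText": "BIKE LANE", "Type": "LINE", "Confidence": 98.0},
                {"DetectedText": "BIKE", "Type": "WORD", "Confidence": 98.2},
                {"DetectedText": "LANE", "Type": "WORD", "Confidence": 97.9},
            ]
        }


fake = FakeRekognition()
photo = b"placeholder image bytes"
labels = label_image(fake, photo)
lines = read_text(fake, photo)
print(labels, lines)
```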
Now, as an extension of this, Amazon and some of the other providers also allow you to do some base-level intelligence functions such as labeling material as not safe for work. This is available as an API or SDK, depending on how you interact with it. It's extremely useful for applications such as monitoring user-submitted content on a website, or for anyone in a public-facing role looking to screen a media presence.
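Screening user-submitted images could look roughly like the sketch below, which wraps Rekognition's `detect_moderation_labels` operation from boto3. The helper names, the `FakeModeration` stand-in, the sample label, and the 60.0 cutoff are assumptions for illustration; the real label taxonomy should be taken from the service documentation.

```python
from typing import Any, Dict, List


def moderation_flags(client: Any, image_bytes: bytes, min_confidence: float = 60.0) -> List[str]:
    """Return the names of any moderation labels the service raised."""
    response = client.detect_moderation_labels(
        Image={"Bytes": image_bytes}, MinConfidence=min_confidence
    )
    return [label["Name"] for label in response["ModerationLabels"]]


def is_safe_for_work(client: Any, image_bytes: bytes) -> bool:
    """Treat an image as safe only when no moderation label was returned."""
    return not moderation_flags(client, image_bytes)


class FakeModeration:
    """Offline stand-in that replays a fixed, made-up list of labels."""

    def __init__(self, labels: List[Dict[str, Any]]) -> None:
        self._labels = labels

    def detect_moderation_labels(self, Image: Dict, MinConfidence: float) -> Dict[str, Any]:
        kept = [l for l in self._labels if l["Confidence"] >= MinConfidence]
        return {"ModerationLabels": kept}


flagged_client = FakeModeration([{"Name": "Suggestive", "ParentName": "", "Confidence": 91.0}])
clean_client = FakeModeration([])
upload = b"placeholder image bytes"

print(is_safe_for_work(flagged_client, upload), is_safe_for_work(clean_client, upload))
```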
Sentiment is an extremely popular part of the previously mentioned Comprehend service. We'll actually discuss this in depth in a follow-up lab and in a case study, but at a high level know that this allows you to detect how a text, and by extension the user that wrote the text, feels about a specific subject. Advanced functionality even breaks down larger blocks of text into sub-blocks, and can attempt to identify how the user feels about multiple subjects if it's a larger block of text. You may have noticed that this now combines Named Entity Extraction with Sentiment.
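A minimal sketch of whole-text sentiment detection, wrapping Comprehend's `detect_sentiment` operation from boto3. The helper name `detect_feeling`, the `FakeComprehendSentiment` stand-in, and its scores are assumptions so the example runs offline; with credentials you would pass `boto3.client("comprehend")`.

```python
from typing import Any, Dict, Tuple


def detect_feeling(client: Any, text: str, language_code: str = "en") -> Tuple[str, float]:
    """Return the overall sentiment and the confidence score for that sentiment."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    sentiment = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    # Score keys are capitalized words, e.g. "Positive" for "POSITIVE".
    score = response["SentimentScore"][sentiment.capitalize()]
    return sentiment, score


class FakeComprehendSentiment:
    """Offline stand-in shaped like the real response; the values are made up."""

    def detect_sentiment(self, Text: str, LanguageCode: str) -> Dict[str, Any]:
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {
                "Positive": 0.95, "Negative": 0.01, "Neutral": 0.03, "Mixed": 0.01,
            },
        }


feeling = detect_feeling(FakeComprehendSentiment(), "I love the new single!")
print(feeling)
```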
Typical usage for this would be measuring public opinion about a product or brand. And as I mentioned previously, I've worked with groups, particularly in the music industry, as they attempt to monitor how people feel about their new songs.
Syntax is slightly different from the previously mentioned text-based machine learning services, despite being part of the greater Comprehend umbrella, in that it extracts and understands the components of a sentence. This is extremely useful for detecting things such as user intent and can also be used for building things such as chatbots or complex parsers.
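To make the syntax feature concrete, the sketch below wraps Comprehend's `detect_syntax` operation from boto3 and then picks out the first verb as a very naive intent hint. The helper names, the `FakeComprehendSyntax` stand-in, and its sample tokens are assumptions for illustration, and a real chatbot would need far more than this single heuristic.

```python
from typing import Any, Dict, List, Optional, Tuple


def parts_of_speech(client: Any, text: str, language_code: str = "en") -> List[Tuple[str, str]]:
    """Return (word, part-of-speech tag) pairs for each token in the text."""
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    return [(t["Text"], t["PartOfSpeech"]["Tag"]) for t in response["SyntaxTokens"]]


def first_verb(tokens: List[Tuple[str, str]]) -> Optional[str]:
    """Naive intent hint: the first token tagged as a verb, lowercased."""
    for word, tag in tokens:
        if tag == "VERB":
            return word.lower()
    return None


class FakeComprehendSyntax:
    """Offline stand-in shaped like the real response; the values are made up."""

    def detect_syntax(self, Text: str, LanguageCode: str) -> Dict[str, Any]:
        return {
            "SyntaxTokens": [
                {"TokenId": 1, "Text": "Book", "BeginOffset": 0, "EndOffset": 4,
                 "PartOfSpeech": {"Tag": "VERB", "Score": 0.98}},
                {"TokenId": 2, "Text": "a", "BeginOffset": 5, "EndOffset": 6,
                 "PartOfSpeech": {"Tag": "DET", "Score": 0.99}},
                {"TokenId": 3, "Text": "flight", "BeginOffset": 7, "EndOffset": 13,
                 "PartOfSpeech": {"Tag": "NOUN", "Score": 0.99}},
            ]
        }


tokens = parts_of_speech(FakeComprehendSyntax(), "Book a flight")
intent = first_verb(tokens)
print(tokens, intent)
```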
Finally, the last machine learning model that I want to discuss is Trusted Advisor. Now, this is a little bit different, in that it is not an SDK or API that anybody can call. But anybody who's used Amazon Web Services from an infrastructure point of view has probably knowingly or unknowingly interacted with this. This is a machine learning model that uses inputs such as CPU utilization, memory usage, and IOPS from your cloud servers, and attempts to control the output of cost, with a better health score meaning lower cost.
Now, Amazon provides this as a service so that it can stay cost competitive. But this is just another service that Level One users can see and start to influence, in that you can log into the Trusted Advisor console, which many people will have through the AWS services interface, and see how it's being applied and interact with it there.
About the Author
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.