This course explores the core concepts of machine learning, the models available, and how to train them. We’ll take a deeper look at what it means to train a machine learning model, as well as the data and methods required to do so. We’ll also provide an overview of the most common models you’re likely to encounter, and take a practical approach to understanding when and how to use them to solve business problems.
In the second half of this course, you will be guided through a series of case studies that will show you how to apply the concepts covered in this course to real-life examples.
If you have any feedback relating to this course, feel free to contact us at support@cloudacademy.com.
Learning Objectives
- Understand the key concepts and models related to machine learning
- Learn how to use training data sets with machine learning models
- Learn how to choose the best machine learning model to suit your requirements
- Understand how machine learning concepts can be applied to real-world scenarios in property prices, health, animal classification, and marketing activities
Intended Audience
This course is intended for anyone who is:
- Interested in understanding machine learning models on a deeper level
- Looking to enrich their understanding of machine learning and how to use it to solve complex problems
- Looking to build a foundation for continued learning in the machine learning space and data science in general
Prerequisites
To get the most out of this course, you should have a general understanding of data concepts as well as some familiarity with cloud providers and their managed services, especially Amazon or Google. Some experience in data or development is preferable but not essential.
For our final case study, let's look at a very common marketing problem. How do we market effectively to our potential customers? Typically, this is done through what is called segmenting the market, basically defining archetypes and profiles of consumers and then speaking to them with a targeted advertisement.
This is actually why companies such as Google and Facebook make so much money: their machine learning algorithms can target people more accurately based on what they like, know, and love. And although that can bring up a privacy concern, it makes it really straightforward for machine learning algorithms to narrow in on what somebody likes.
For this example, let's consider a music streaming service. This service has thousands of subscribers, and it has already classified its own content into genres. Now, for every username, we have some basic listening habits already recorded, such as the number of hours of music streamed and the user's favorite genre.
Additionally, for every one of our users, we have some extra information such as gender, age, education level, marital status, and number of children. We also have income level. Although income is a numeric value, we've grouped it into brackets, so it becomes a categorical field that records which bracket each user belongs to.
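To make the bracketing idea concrete, here is a minimal Python sketch of turning a numeric income into a category. The bracket boundaries, the label names, and the helper names `income_bracket` and `one_hot` are all assumptions made up for illustration; the course does not say which brackets the streaming service used. The one-hot step is included because distance-based algorithms need numbers rather than labels.

```python
from bisect import bisect_right

# Hypothetical bracket boundaries (annual income); not taken from the course.
BRACKET_EDGES = [25_000, 50_000, 100_000]
BRACKET_LABELS = ["under_25k", "25k_to_50k", "50k_to_100k", "100k_plus"]


def income_bracket(income):
    """Map a numeric income to the label of the bracket it falls in."""
    # bisect_right counts how many edges are <= income, which is the bracket index.
    return BRACKET_LABELS[bisect_right(BRACKET_EDGES, income)]


def one_hot(label):
    """Encode a bracket label as a 0/1 vector, one position per bracket."""
    return [1 if label == candidate else 0 for candidate in BRACKET_LABELS]


if __name__ == "__main__":
    for income in (18_000, 42_500, 250_000):
        label = income_bracket(income)
        print(income, label, one_hot(label))
```

An income that sits exactly on a boundary goes into the higher bracket here; that is a design choice of this sketch, not something the course specifies.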
Now, consider this problem for a second. There are actually no labels. We don't know what the categories of customers are. In this case, we have what's called unsupervised learning. Now, this is where I could quickly go down the rabbit hole of deep learning. But at this point, we're simply gonna say we're looking for clusters of customers based on similarity.
So here we're simply saying: take these customers, figure out which ones have attributes in common, and group them together. Asking for a certain number of clusters is another restriction we could put on it at this point.
So with all of this in mind, K-means clustering is an algorithm that makes a lot of sense for unsupervised machine learning. You may recognize that there's a hyperparameter again, K. In this case, the hyperparameter simply dictates the number of clusters we should try to create.
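As a rough illustration of what K-means does with that K, here is a small standard-library Python sketch of the usual assign-then-update loop (often called Lloyd's algorithm). It is not the implementation used in the course or in any managed service; the function name `kmeans`, the random starting centroids, the fixed seed, and the toy (hours per week, age) points are assumptions for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k points picked at random from the data, without replacement.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical users as (hours streamed per week, age).
    users = [(25, 20), (22, 19), (24, 23), (8, 28), (6, 31),
             (9, 33), (15, 58), (12, 60), (18, 62)]
    labels, centroids = kmeans(users, k=3)
    print(labels)
    print(centroids)
```

Because the starting centroids are random, different seeds can settle on different groupings, which is one reason practical tools typically run the algorithm several times and keep the best result. In practice you would also scale the features first so that one attribute, such as age, does not dominate the distance.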
This is a really good example for starting to learn about hyperparameter tuning. You can see how two clusters, three clusters, five clusters, or a hundred clusters fundamentally change the output of the model. Determining which is the best number of clusters starts to fall into model fit for the problem, which is a little beyond the scope of this class. But your end result will look something like this.
Although it's a multi-axis problem, in this example, we've simplified it to just be two axes, but here you could see the machine clearly making three clusters for our target markets. You can see some of the dots that kind of float between them in the middle of the page. This can make it hard for a human to determine, but the machine learning algorithms have a programmatic way of assigning points to a certain cluster.
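The "programmatic way" of handling those in-between dots can be sketched in a few lines: measure the distance from the point to each cluster center and pick the smallest. The function name `nearest_cluster` and the centroid values below are hypothetical, and Euclidean distance is assumed, which is the usual choice for K-means.

```python
import math


def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    # On an exact tie, index() returns the first (lowest-numbered) cluster.
    return distances.index(min(distances))


if __name__ == "__main__":
    # Hypothetical cluster centers as (hours per week, age).
    centroids = [(22, 21), (8, 30), (15, 60)]
    # A listener who floats between the first two clusters.
    print(nearest_cluster((14, 27), centroids))
```

The same function works unchanged for points with three, ten, or a hundred coordinates, as long as every point and centroid has the same number of them.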
Very importantly too, although we're only showing a two-dimensional problem, clustering algorithms can handle multidimensional problems going into the dozens or hundreds of dimensions, depending on how powerful your machine is.
Simply put, you're not limited by our ability to see it, but rather by the computational power of your machine. So here we can see that, given a hyperparameter of three, it created three clusters, one of which we dubbed young adults. Basically, we found a group of listeners who listen more than 20 hours per week, are single, and are between the ages of 18 and 24.
Additionally, we found another strong cluster of young families. These people are typically aged 25 to 34, listen less than 10 hours per week, and have one to two children. Finally, we found a third group in the 55 to 64 age range.
Basically, this group is married, and they listen 10 to 20 hours per week. Frankly, in the marketing space, this group is sometimes called empty nesters because their children have usually gone away to college or started their first jobs, and they have more hours to listen to music now.
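Names like "young adults" or "empty nesters" are something a person attaches after looking at what each cluster has in common. One simple way to get that summary, sketched here under assumptions, is to take the most common value of every attribute within each cluster. The function name `profile_clusters`, the field names, and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict


def profile_clusters(users, labels):
    """For each cluster label, report the most common value of every attribute."""
    grouped = defaultdict(list)
    for user, label in zip(users, labels):
        grouped[label].append(user)
    profiles = {}
    for label, members in grouped.items():
        profiles[label] = {
            field: Counter(m[field] for m in members).most_common(1)[0][0]
            for field in members[0]
        }
    return profiles


if __name__ == "__main__":
    # Hypothetical users and the cluster each one was assigned to.
    users = [
        {"age": "18-24", "hours": "20+", "marital": "single"},
        {"age": "18-24", "hours": "20+", "marital": "single"},
        {"age": "25-34", "hours": "20+", "marital": "single"},
        {"age": "55-64", "hours": "10-20", "marital": "married"},
        {"age": "55-64", "hours": "10-20", "marital": "married"},
    ]
    labels = [0, 0, 0, 1, 1]
    for label, profile in profile_clusters(users, labels).items():
        print(label, profile)
```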
So this is how you could see three categories being made. Now, you might think that if we set the hyperparameter to five, we might get more detailed results, which might be true in this case. Maybe there's a group of families that have teenage children. Or maybe, by setting the hyperparameter too high, we start to split apart a group that has a lot in common.
This is where experience, more advanced systems, and the iterative process come into play. Based on this, we're then able to go out, test our marketing campaigns, change our hyperparameters, and try again.
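One common way to put a number on "did more clusters help?" is the within-cluster sum of squares, often called inertia: the total squared distance from each point to the center of its cluster. The course does not prescribe this measure, so treat the sketch below as one possible heuristic. The function name `inertia` and the toy one-dimensional-looking data are assumptions.

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # The cluster center is the per-axis mean of its members.
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(
            (value - center) ** 2
            for member in members
            for value, center in zip(member, centroid)
        )
    return total


if __name__ == "__main__":
    points = [(0, 0), (2, 0), (10, 0), (12, 0)]
    print(inertia(points, [0, 0, 0, 0]))  # everything in one cluster
    print(inertia(points, [0, 0, 1, 1]))  # two natural groups
    print(inertia(points, [0, 1, 2, 3]))  # every point is its own cluster
```

With each point in its own cluster the inertia drops to zero, so the lowest number is not automatically the best model. People usually look for the point where adding clusters stops producing a large drop, and then check whether the resulting groups are useful for the marketing campaign.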
So hopefully, this has shown you how an unsupervised learning problem might begin to take shape in the real world. So to wrap it all together, there are a lot of different machine learning archetypes. Hopefully, the flow charts we've shown you, along with some of the examples, really help you understand at least where to start your research into building your machine learning framework.
Amazon SageMaker has a lot of good options, but don't forget to check out the offerings from Google and Azure as well. Anyway, thank you for attending the class. Please send feedback to the email address below and feel free to rate it as well. Feedback is really important for us to help guide the future of content. A lot of the case studies and examples actually come from requests in your feedback, where we try to tailor the classes more to what the listener actually wants to see.
Lectures
- Course Introduction
- Explaining Concepts
- Models
- Understanding Training Data Sets
- How to Choose?
- Case Study: Home Prices
- Case Study: Heart Disease
- Case Study: Animal Classification
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.