Case Study: Home Prices
Machine Learning Concepts & Models
This course explores the core concepts of machine learning, the models available, and how to train them. We’ll take a deeper look at what it means to train a machine learning model, as well as the data and methods required to do so. We’ll also provide an overview of the most common models you’re likely to encounter, and take a practical approach to understand when and how to use them to solve business problems.
In the second half of this course, you will be guided through a series of case studies that will show you how to apply the concepts covered in this course to real-life examples.
If you have any feedback relating to this course, feel free to contact us at firstname.lastname@example.org.
- Understand the key concepts and models related to machine learning
- Learn how to use training data sets with machine learning models
- Learn how to choose the best machine learning model to suit your requirements
- Understand how machine learning concepts can be applied to real-world scenarios in property prices, health, animal classification, and marketing activities
This course is intended for anyone who is:
- Interested in understanding machine learning models on a deeper level
- Looking to enrich their understanding of machine learning and how to use it to solve complex problems
- Looking to build a foundation for continued learning in the machine learning space and data science in general
To get the most out of this course, you should have a general understanding of data concepts as well as some familiarity with cloud providers and their managed services, especially Amazon or Google. Some experience in data or development is preferable but not essential.
As our first case study, let's look at a pretty common one: attempting to predict housing prices for any given neighborhood. This is already done widely in real estate. I'm sure if you type a property address into any of the major sales channels in your region, you'll see an estimate of what the property is worth, even if it's not for sale.
So to define the problem very succinctly: what does a house cost in a given neighborhood? Fortunately, in real estate, lots of information is already available to us. That's what makes this case study in particular a great one to get started on if you're looking for a small machine learning project to try on your own. Also, remember to check out the labs associated with this class.
In this example, we have a clear set of features and labels. The features might be things such as number of bedrooms, number of bathrooms, and square feet, and the label is price, because the business problem is: how much does a house cost?
Now remember, for many problems we might be able to run the model in reverse. What I mean by that is we could switch up what the labels and features are. Imagine we wanted to estimate how big a house was. We might be able to use features such as number of bedrooms, number of bathrooms, and price, and have square feet be the label. But since our business problem is how much a house costs, we are using the price as the label and the other pieces of data as our features.
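To make the features-versus-label idea concrete, here is a minimal Python sketch. The listings, the field names, and the helper `split_features_and_label` are all hypothetical and made up for illustration; they are not taken from the course labs.

```python
# Hypothetical listings; every number here is invented for illustration.
listings = [
    {"bedrooms": 3, "bathrooms": 2, "square_feet": 1500, "price": 300_000},
    {"bedrooms": 4, "bathrooms": 3, "square_feet": 2200, "price": 450_000},
    {"bedrooms": 2, "bathrooms": 1, "square_feet": 900, "price": 180_000},
]


def split_features_and_label(rows, label):
    """Return (features, labels), where `label` names the column to predict."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


# Business problem: how much does a house cost? Price is the label.
X_price, y_price = split_features_and_label(listings, "price")

# Same data "in reverse": estimate size, so square feet becomes the label
# and price moves over to the feature side.
X_size, y_size = split_features_and_label(listings, "square_feet")
```

The data never changes between the two calls. Only the choice of which column to predict changes, and that choice comes from the business problem.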
Next, we should reference the flow chart to think about what type of model we should be using. Firstly, we do understand the business problem. Secondly, we have labeled data. This clearly directs us towards a supervised learning problem because we have labeled training data.
Additionally, because our target is a numeric value, this type of problem lends itself really well to regression. Now, this could be linear regression, nonlinear regression, or any of a variety of types of regression, but it allows us to narrow down and say home prices can be predicted with a regression-type machine learning model. And the final step of the process is choosing a particular algorithm to employ in order to model home prices.
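The two questions we just walked through can be written down as a tiny decision function. This is only a rough sketch of the reasoning, not a reproduction of the course flow chart: the function name `suggest_model_family` and the two non-regression branches are assumptions based on the usual split between supervised and unsupervised learning.

```python
def suggest_model_family(has_labels: bool, target_is_numeric: bool) -> str:
    """Rough stand-in for the two flow-chart questions discussed above."""
    if not has_labels:
        # No labeled training data, so supervised learning is not available.
        return "unsupervised learning"
    if target_is_numeric:
        # Labeled data with a numeric target, such as a price.
        return "supervised learning: regression"
    # Labeled data with a categorical target.
    return "supervised learning: classification"


# Home prices: we have labeled data and the target (price) is numeric.
home_price_family = suggest_model_family(has_labels=True, target_is_numeric=True)
```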
Looking at the data that we previously defined, we have three pieces of input: number of bedrooms, number of bathrooms, and square footage of the house. These factors, we theorize, determine the price of the house, aka our label. This is where some industry-specific knowledge and data science experience could become helpful, because now we have to pick within the regression-based models.
For simplicity in this class, let's use a linear learner algorithm. This basically assumes a linear relationship between the features and the output. You might remember this from algebra class as y = mx + b. However, in this case, the linear learner algorithm, depending on the type you use, might assume an equation more similar to price = c1 times bedrooms + c2 times bathrooms + c3 times number of square feet.
These values of c1, c2, and c3 are called coefficients or relative weights. The machine learning algorithm will attempt to assign values to these coefficients in order to predict price. Number of bedrooms might have a weight of 100,000, while number of bathrooms might only have a relative weight of 50,000, and number of square feet might have some fractional weight.
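As a quick illustration, the price equation can be written as a small Python function. The weights 100,000 and 50,000 are the example values mentioned above; the value 0.5 for square feet is an assumption standing in for "some fractional" weight, and the function name `predict_price` is hypothetical.

```python
def predict_price(bedrooms, bathrooms, square_feet,
                  c1=100_000, c2=50_000, c3=0.5):
    """price = c1 * bedrooms + c2 * bathrooms + c3 * square_feet

    c1 and c2 are the example weights from the discussion above;
    c3 = 0.5 is an assumed placeholder for a fractional weight.
    """
    return c1 * bedrooms + c2 * bathrooms + c3 * square_feet


# A 3-bedroom, 2-bathroom, 1,500 square foot house under these made-up weights:
# 300,000 + 100,000 + 750 = 400,750
example_price = predict_price(bedrooms=3, bathrooms=2, square_feet=1500)
```

Unlike y = mx + b, this sketch has no constant term b. Whether a real linear learner includes one depends on the type you use.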
Basically, the machine learning algorithm will start to build an understanding of what the coefficient should be for each factor. It might even be something like zero. A coefficient of zero might mean that this factor plays no role in predicting the price.
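Here is a minimal sketch of how coefficients can be learned from data, using ordinary least squares solved through the normal equations. This is one common way to fit a linear model and is not necessarily what any particular linear learner does internally. The data is synthetic: the prices were generated from the example weights 100,000, 50,000, and an assumed 0.5, and a hypothetical fourth feature, the house number on the street, was added so we can watch it receive a coefficient of zero. Exact fractions are used only to keep this toy example free of rounding noise.

```python
from fractions import Fraction

# Synthetic rows: (bedrooms, bathrooms, square_feet, house_number).
# house_number is a made-up feature that has no influence on the price.
houses = [
    (3, 2, 1500, 12),
    (4, 3, 2200, 7),
    (2, 1, 900, 31),
    (5, 2, 2600, 4),
    (3, 1, 1300, 18),
    (4, 2, 2000, 25),
]
# Generated as 100,000 * bedrooms + 50,000 * bathrooms + 0.5 * square_feet.
prices = [400_750, 551_100, 250_450, 601_300, 350_650, 501_000]


def fit_linear(rows, targets):
    """Least-squares coefficients for target = sum(c_j * x_j), no intercept.

    Solves the normal equations (X^T X) c = X^T y by Gauss-Jordan elimination.
    """
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y], held as exact fractions.
    aug = [
        [Fraction(sum(r[i] * r[j] for r in rows)) for j in range(n)]
        + [Fraction(sum(r[i] * t for r, t in zip(rows, targets)))]
        for i in range(n)
    ]
    for col in range(n):
        # Bring a row with a nonzero pivot into place, then normalise it.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


coefficients = fit_linear(houses, prices)
```

On this synthetic data the fit recovers the weights that generated the prices, and the house number ends up with a coefficient of zero. Real data is noisy, so in practice an irrelevant feature usually gets a small coefficient rather than exactly zero, and you would normally reach for a library rather than hand-rolled elimination.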
Now, obviously in this extremely simplified example, we're skipping over key factors, such as location, a garage, and property size, but that's part of what the iterative process is. If you remember, once you build and evaluate a model, you can improve your understanding of the data. This shows you how to start with a complex issue, such as home prices. You could start with a relatively controlled data set and grow it as your understanding of the data also grows.
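One way to picture the evaluate-and-iterate step is to compare the prediction error of two model versions on the same homes. Everything in this sketch is hypothetical: the sale prices, both sets of predictions, and the choice of mean absolute error as the metric are assumptions for illustration only.

```python
# Hypothetical held-out sale prices and the predictions of two model versions.
actual_prices = [400_000, 550_000, 250_000]
v1_predictions = [430_000, 500_000, 290_000]  # bedrooms, bathrooms, square feet
v2_predictions = [410_000, 540_000, 260_000]  # v1 features plus a location feature


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the same units as price."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


v1_error = mean_absolute_error(actual_prices, v1_predictions)
v2_error = mean_absolute_error(actual_prices, v2_predictions)

# Keep the new feature only if it actually reduces the error.
keep_new_feature = v2_error < v1_error
```

If adding a feature does not lower the error on homes the model has not seen, that is a signal to revisit the data rather than to keep growing the feature set.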
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.