Level 2: Release Process
Welcome to Part Two of an introduction to using Artificial Intelligence and Machine Learning. As we mentioned in part one, this course starts at the ground up and focuses on giving students the tools and materials they need to navigate the topic. There are several labs directly tied to this learning path, which will provide hands-on experience to supplement the academic knowledge provided in the lectures.
In part one we looked at how you can use out-of-the-box machine learning models to meet your needs. In this course, we are going to build on that and look at how you can add your own functionality to these pre-canned models. We look at ML training concepts, release processes, and how ML services are used in a commercial setting. Finally, we take a look at a case study so that you get a feel for how these concepts play out in the real world.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
By the end of this course, you should be prepared to take more advanced courses, and you should have a springboard for handling complex tasks in your day-to-day work, whether in a professional, student, or hobbyist environment.
This course is a multi-part series ideal for those who are interested in understanding machine learning from a 101 perspective, starting from a very basic level and ramping up over time. If you already understand concepts such as how to train a model and run inference with it, you may wish to skip ahead to part two or a more advanced learning path.
It helps if you have a light data engineering or developer background as several parts of this class, particularly the labs, involve hands-on work and manipulating basic data structures and scripts. The labs all have highly detailed notes to help novice users understand them but you will be able to more easily expand at your own pace with a good baseline understanding. As we explain the core concepts, there are some prerequisites for this course.
It is recommended that you have a basic familiarity with one of the cloud providers, especially AWS or GCP. Azure, Oracle, and other providers also have machine learning suites but these two are the focus for this class.
If you have an interest in completing the labs for hands-on work, Python is a helpful language to understand.
To begin going over how to create a model, let's graph the process of creating, training, releasing, and using a model. As we've just discussed, labeled training data is used to train a model. This is so the model knows what responses to give for the sample inputs. However, it's also important to test the model. This is why you should always save some of the labeled data for testing.
Basically, you want to hold between 20% and a third of your data aside, so that after you train the model you're able to test it and compare its output to the expected responses. It is extraordinarily important that you do not use the same data for testing and training; if you forget this, you're not going to have a good time.
As we mentioned previously, there's the concept of over-fitting, which is where the model simply learns a rule to respond the same way to the same phrase. In that case, you're at high risk of not creating a well-designed model. If the model is well designed, it's called well fitted or well developed, and this means the model has a high degree of accuracy and is able to accurately predict results.
Now, there are a lot of ways of measuring the fit, such as R-squared, the false positive and false negative rates, and things such as a confusion matrix. Those will be discussed in level three or maybe in an addendum to this current module, but for now just know that if a model is given a fit score, the higher the score, the better it was able to predict the responses in the test data.
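As a rough illustration of holding data aside, here is a minimal Python sketch using only the standard library. The function name `train_test_split`, the 20% default, and the toy `labeled` records are assumptions made for this example; they are not taken from the course labs.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle labeled records and hold a fraction aside for testing.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test records are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


# Toy labeled data: (input text, label) pairs.
labeled = [(f"sample text {i}", i % 2) for i in range(10)]
train, test = train_test_split(labeled, test_fraction=0.2)

# No record appears in both sets, so testing never reuses training data.
assert not set(train) & set(test)
print(len(train), "training records,", len(test), "test records")
```

Shuffling before slicing matters when the data is stored in some order (for example, all of one label first), because an unshuffled slice could leave the test set unrepresentative.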
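To make the idea of a fit score a little more concrete, the sketch below tallies a confusion matrix for a binary classifier and derives accuracy plus the false positive and false negative rates from it. The function names and the sample `expected` and `predicted` lists are made up for illustration; real platforms report these numbers in their own formats.

```python
def confusion_counts(expected, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must be the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for want, got in zip(expected, predicted):
        if want == 1 and got == 1:
            counts["tp"] += 1
        elif want == 0 and got == 0:
            counts["tn"] += 1
        elif want == 0 and got == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def rates(counts):
    """Derive simple fit measures from the confusion counts."""
    total = sum(counts.values())
    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["fn"] / positives if positives else 0.0,
    }


# Labels from the held-out test data versus what the model answered.
expected = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(expected, predicted)
scores = rates(counts)
print(counts)
print(scores)
```

In this toy run the model gets 6 of 8 test records right, with one false positive and one false negative.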
Once you have a model that's well trained, it's important to discuss how to get this into production or application usage. We're actually at an extremely good point right now, because level two models typically come with a good interface built into the platform, and at this point we only really need a level one user to begin taking advantage of the trained model.
The key here is that a trained model is designed and developed by data scientists before being pipelined or automated into a production or usable environment. The model should actually be considered a versionable, deployable object that can be released into Git or whatever version control your company is currently using. Very important to reiterate this one: a well-designed model should go through your existing code release process if you're a professional; you do not need a new style of code release to promote a model into production.
In fact, it's probably best if you stay within existing norms, so you're not re-engineering your entire release process. As you grow beyond level two, and even in level two, you're able to take advantage of automation suites such as Terraform and Ansible.
In the lab you'll see an example of how to train a model, find where it lives in the cloud, and then import that into an existing code base. And then once it's in production, once you've pipelined it, you're basically back at level one, where somebody simply needs to understand that the model can be used to process data and push results out.
The key here is that model development, meaning training in this iterative, detailed process, is different from inferencing. That's very important, because it means your production application can continue to use the model as your data scientists built it, and you don't necessarily need data scientists on the production team if your developers are capable of taking advantage of the model from a level one style perspective.
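One way to picture a model as a versionable, deployable object is sketched below: the trained parameters are written to a file whose name carries a version, and a checksum is recorded so a release pipeline can confirm it is promoting the exact artifact that was tested. The file naming scheme, the JSON format, and both function names are assumptions for illustration; cloud platforms export models in their own formats.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def save_model_artifact(params, version, directory):
    """Write model parameters to a versioned file and return (path, sha256)."""
    payload = json.dumps(
        {"version": version, "params": params}, sort_keys=True
    ).encode("utf-8")
    path = Path(directory) / f"model-{version}.json"
    path.write_bytes(payload)
    return path, hashlib.sha256(payload).hexdigest()


def load_model_artifact(path, expected_sha256):
    """Load an artifact, refusing it if it differs from what was released."""
    payload = Path(path).read_bytes()
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        raise ValueError("model artifact does not match the recorded checksum")
    return json.loads(payload)


# A temporary directory stands in for a repository or artifact store.
with tempfile.TemporaryDirectory() as tmp:
    path, checksum = save_model_artifact({"threshold": 5.0}, "1.0.0", tmp)
    loaded = load_model_artifact(path, checksum)

print(loaded["version"], loaded["params"])
```

Because the artifact is just a file with a version and a checksum, it can travel through the same review, tagging, and deployment steps as any other release.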
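The split between training and inferencing can be sketched with a deliberately tiny model. Here `train_threshold_model` stands in for the data scientists' side and `predict` for the production side; both names, the threshold approach, and the sample readings are hypothetical and assume that higher values belong to the positive class.

```python
def train_threshold_model(labeled_values):
    """Training side: learn a cut-off halfway between the two class means.

    Assumes label 1 tends to have higher values than label 0.
    """
    positives = [value for value, label in labeled_values if label == 1]
    negatives = [value for value, label in labeled_values if label == 0]
    if not positives or not negatives:
        raise ValueError("training data must contain both labels")
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return {"threshold": (mean_pos + mean_neg) / 2}


def predict(model, value):
    """Inference side: needs only the trained model, not the training data."""
    return 1 if value >= model["threshold"] else 0


# Done once, by whoever develops the model.
model = train_threshold_model([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])

# Done repeatedly, by the production application.
results = [predict(model, reading) for reading in (7.0, 3.0)]
print(model, results)
```

The production code calls only `predict`, so a developer can use the model without knowing how the threshold was learned.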
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.