Welcome to Part Two of an introduction to using Artificial Intelligence and Machine Learning. As we mentioned in part one, this course starts from the ground up and focuses on giving students the tools and materials they need to navigate the topic. There are several labs directly tied to this learning path, which provide hands-on experience to supplement the academic knowledge provided in the lectures.
In part one we looked at how you can use out-of-the-box machine learning models to meet your needs. In this course, we are going to build on that and look at how you can add your own functionality to these pre-canned models. We look at ML training concepts, release processes, and how ML services are used in a commercial setting. Finally, we take a look at a case study so that you get a feel for how these concepts play out in the real world.
For any feedback relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
By the end of this course, you should be ready to take more advanced courses and even use this material as a springboard into handling complex tasks in your day-to-day job, whether in a professional, student, or hobbyist environment.
Intended Audience
This course is part of a multi-part series ideal for those who are interested in understanding machine learning from a 101 perspective, starting from a very basic level and ramping up over time. If you already understand concepts such as how to train a model and run inference with it, you may wish to skip ahead to a more advanced learning path.
Prerequisites
It helps if you have a light data engineering or developer background, as several parts of this class, particularly the labs, involve hands-on work and manipulating basic data structures and scripts. The labs all have highly detailed notes to help novice users, but with a good baseline understanding you will be able to expand on them at your own pace more easily. Although we explain the core concepts, there are some prerequisites for this course.
It is recommended that you have a basic familiarity with one of the cloud providers, especially AWS or GCP. Azure, Oracle, and other providers also have machine learning suites but these two are the focus for this class.
If you have an interest in completing the labs for hands-on work, Python is a helpful language to understand.
Now that you understand how a model is created, let's discuss some of the services that are designed to almost completely automate and assist you in training a model on your data. Assuming you bring your own labeled training data, Google Cloud has some of the best assisted training tools available for a relatively new user.
Now, at levels three and four, tools like DataRobot, Spark, and Databricks start to become available, but at this level, where you have a lot of hands-on help from the cloud provider, Google and Amazon have, in my opinion, some of the best options.
Now, there are also options available on Azure and other commercial products, but let's focus here. The first one that often catches people's eye is AutoML Vision. It's important to understand that AutoML Vision has several modes it's capable of running in. When you're preparing your data and selecting it as the tool to run, know that the easiest mode to get started with is Whole Image. That's whole, as in the entire image is what gets recognized.
That means the image is looked at as a single unit, and the model isn't looking for sub-objects. This is useful in broad settings, such as asking, is this a landscape shot or a city shot? But it's not as useful if you're attempting to identify specific objects within the image.
Another great application is a controlled setting where you are able to frame the subject matter. For example, if you're going into your backyard and collecting leaves, you could always photograph them on a plain floor with nothing else in the frame, at which point a whole-image classifier is acceptable.
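To make this concrete, here is a minimal sketch of sending one image to a trained whole-image classification model using the google-cloud-automl Python client. It assumes you have already trained a model in AutoML Vision and set up authentication; the project ID, model ID, and file name are placeholders, so treat this as the general shape of a prediction call rather than a finished implementation.

```python
# Sketch: classify a single image with a trained AutoML Vision whole-image model.
# Assumes google-cloud-automl is installed and a model already exists;
# the project, region, model ID, and file name below are placeholders.
from google.cloud import automl

project_id = "my-project"      # hypothetical project ID
model_id = "ICN1234567890"     # hypothetical AutoML Vision model ID

prediction_client = automl.PredictionServiceClient()
model_full_id = automl.AutoMlClient.model_path(project_id, "us-central1", model_id)

# Read the image to classify as raw bytes.
with open("backyard_leaf.jpg", "rb") as image_file:
    content = image_file.read()

payload = automl.ExamplePayload(image=automl.Image(image_bytes=content))
response = prediction_client.predict(name=model_full_id, payload=payload)

# Each result is a label for the *whole* image, not for a region within it.
for result in response.payload:
    print(f"{result.display_name}: {result.classification.score:.2f}")
```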
That being said, another form that is easy to get started with, although not quite as easy, is image recognition that does object detection. This is a little harder because the training data needs to subdivide the picture. Instead of simply labeling the whole picture, you need to provide x, y coordinates for a bounding box along with the label of what's in that box.
Another example that gets thrown around a lot in this space is training it to identify the parts of a salad. For example, if you're at a restaurant and you get a garden salad, and you want to take a picture of it and see how many tomatoes are in there, you would need object detection.
So to begin, you take a picture of the salad, draw a bounding box around where the tomato is, label it tomato, then take a few more pictures of the salad and continue to create bounding boxes. Now the model is able to identify where a tomato is. Another very important thing to understand when doing object detection is that the closer your training set is to real, raw data, the better the performance.
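Here is a minimal sketch of what that labeling work produces: a simple import file where each row ties an image to one labeled bounding box, with coordinates normalized to the 0-1 range. The exact column layout AutoML Vision expects for its CSV import may differ from this, and the bucket paths and labels are made up, so treat it as an illustration of the data you need to collect rather than the definitive file format.

```python
# Sketch: build a bounding-box annotation file for object detection training.
# Coordinates are normalized to [0, 1]; paths and labels are placeholders,
# and the exact column layout should be checked against the AutoML docs.
import csv

# (image in Cloud Storage, label, x_min, y_min, x_max, y_max)
annotations = [
    ("gs://my-bucket/salad_01.jpg", "tomato", 0.12, 0.40, 0.28, 0.55),
    ("gs://my-bucket/salad_01.jpg", "tomato", 0.61, 0.22, 0.74, 0.35),
    ("gs://my-bucket/salad_02.jpg", "tomato", 0.05, 0.70, 0.19, 0.83),
]

with open("training_annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for path, label, x_min, y_min, x_max, y_max in annotations:
        # One bounding box per row; the same image can appear on many rows.
        writer.writerow(["TRAIN", path, label, x_min, y_min, x_max, y_max])
```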
Now, we're using a rather silly example of taking pictures of your salad in a restaurant, but maybe you're doing quality control in a factory that makes prepackaged salads. In that case, you would want to take pictures of the salad as it's going through the production line. This way the model normalizes itself for things such as your type of lettuce and your packaging.
If you were to take pictures of the tomatoes in isolation, or in a different type of salad, the model would start to struggle and might not perform as well, because it wasn't trained in that scenario. Even background noise such as the lettuce and the dressing matters: the closer your training images are to production, the more it will help. To tie this back to what people might use in a more real-world setting, just know the rule of thumb: particularly when you're doing things like object detection rather than whole-image classification, the closer your training set is to reality, the better your results will be. And finally, one of the other very popular options is AutoML Translate. This is unlike the other tools; it's part of the NLP, or Natural Language Processing, suite.
Now, all translation services nowadays typically use machine learning at some level under the surface. But AutoML Translate is really important if you're attempting to handle specific phrases or expressions in your industry. To use a broad example, expressions like pulling your leg or fish out of water might not translate very well unless you're familiar with North American idioms. To put this in a more cloud-specific example, you might expect an engineer to say, or you yourself might say, "Hey, I'm gonna go spin up a server on Amazon."
If this was simply translated without that understanding, a non-English speaker would be very confused as to what a server is doing in a spinning state. You don't rotate servers like that, they might think, or they won't know what to think at all; it will not translate cleanly. What this type of service allows you to do is capture these phrases as they are used in your organization or your applications, so that they translate properly.
This is extremely popular in industries such as the music industry, where new expressions and phrases are constantly appearing, maybe new slang or colloquialisms used to describe a song. Or in entertainment, where there's always a new buzzword or acronym to describe two actors dating. This is the service that can begin to make sense of that jumble of words.
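To give a feel for what feeding such a service looks like, here is a minimal sketch of preparing parallel training data: pairs of segments, the phrase as your organization uses it and the translation you want produced. The example phrases and their French renderings are illustrative only, and the tab-separated layout is one common shape for this kind of data rather than a guaranteed import format.

```python
# Sketch: build a tab-separated file of parallel segments for custom
# translation training. The phrases and translations are made-up examples.
segment_pairs = [
    ("spin up a server", "démarrer un serveur"),
    ("tear down the environment", "supprimer l'environnement"),
    ("the build is green", "la compilation a réussi"),
]

# One pair per line: source segment, a tab, then the target segment.
with open("domain_phrases.tsv", "w", encoding="utf-8") as f:
    for source, target in segment_pairs:
        f.write(f"{source}\t{target}\n")
```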
Lectures
Calculated Systems was founded by experts in Hadoop, Google Cloud, and AWS. Calculated Systems enables code-free capture, mapping, and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education on these complex technologies.