Introduction to Machine Learning
Welcome to an introduction to using Artificial Intelligence and Machine Learning with a focus on Amazon Web Services and the Google Cloud Platform. This course is designed to be a gentle introduction, starting from the ground up and focusing on giving students the tools and materials they need to navigate the topic. It will also include the necessary skills around data engineering, cloud management and even some systems engineering. There are several labs directly tied to this learning path, which will provide hands-on experience to supplement the academic knowledge provided in the lectures.
This course begins with an introduction to AI and ML, before moving on to explain the different levels of users in the field. Then we take a look at out-of-the-box solutions for AI and ML, before looking at a case study that puts the topics covered during this course into a real-world example.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
By the end of this course, you'll hopefully be ready to take more advanced courses, and you can even use it as a springboard into handling complex tasks in your day-to-day work, whether that is in a professional, student, or hobbyist environment.
This course is a multi-part series ideal for those who are interested in understanding machine learning from a 101 perspective, and for those wanting to become data engineers. If you already understand concepts such as how to train a model and run inference with it, you may wish to skip ahead to part two or a more advanced learning path.
It helps if you have a light data engineering or developer background, as several parts of this class, particularly the labs, involve hands-on work manipulating basic data structures and scripts. The labs all have highly detailed notes to help novice users understand them, but you will be able to expand more easily at your own pace with a good baseline understanding. Although we explain the core concepts, there are some prerequisites for this course.
It is recommended that you have a basic familiarity with one of the cloud providers, especially AWS or GCP. Azure, Oracle and other providers also have machine learning suites, but AWS and GCP are the focus for this class.
If you are interested in completing the labs for hands-on work, Python is a helpful language to understand. Now, if you're looking into a career in machine learning, you can definitely do it with languages such as Java, C#, even lower-level languages such as C++, or statistical and numerical languages such as R or MATLAB. However, in my experience, Python is the most widely adopted language, specifically if you're looking to go heavy duty into training and developing models.
Although it is not an official system, I find it easy to think of artificial intelligence users, and in turn their applications, in roughly four levels. Level one is where users take advantage of pre-made models. These models can be used to enhance, or be applied to, an existing application, and can be accessed through their APIs or SDKs. They require an understanding of how to leverage the model, but none of the underlying machine learning complexities involved in training or creating the model.
Level one users can expect to add advanced functionality to their applications with no more difficulty than would be required to implement a new API or library. Typically these SDKs are available for common languages such as Python or Java, and the services usually offer a REST API as well.
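As a rough illustration of what level one can look like in practice, the sketch below wraps a pre-made sentiment model behind a small function. It assumes Amazon Comprehend accessed through the boto3 SDK, with AWS credentials already configured. The helper names `summarize_sentiment` and `analyze_post` are hypothetical and chosen only for this example, and the response fields should be checked against the current Comprehend documentation.

```python
def summarize_sentiment(response):
    """Pick the predicted label and its confidence out of a Comprehend-style response."""
    label = response["Sentiment"]        # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"
    scores = response["SentimentScore"]  # keys: Positive, Negative, Neutral, Mixed
    return label, scores[label.capitalize()]


def analyze_post(text, client=None):
    """Send text to a pre-made sentiment model and return (label, confidence).

    Assumption: when no client is passed in, a boto3 Comprehend client is
    created, which requires boto3 to be installed and credentials configured.
    """
    if client is None:
        import boto3  # imported lazily so the parsing helper works without the SDK
        client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return summarize_sentiment(response)
```

Nothing here involves training: the application only sends text and reads back a label, which is the point of level one. Passing the client in as an argument also makes it easy to swap in a stub while developing.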
Level two typically begins after users have worked with pre-made models in level one and start to want more custom functionality and identifiers. Perhaps the workplace has custom key phrases, keywords, or even brands that they wish to identify, or maybe they work at a manufacturing plant, and the user's application needs to be able to identify specific parts and components relevant to that business.
Personally, I have worked with music publishers who wished to understand how their songs and new releases were doing on social media, where homing in on specific key phrases and song titles was of utmost importance. This is where training your own model comes in. In recent years, many online services have appeared that allow you to simply upload training sets of data, choose custom identifiers within that data, and then train on it.
Compared to later levels such as level three and four, this is a relatively limited feature set. But what it does allow you to do is create applications that have a higher degree of complexity and sophistication than in level one, and to start to customize them to your specific needs.
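To make the idea of a training set with custom identifiers a little more concrete, here is a minimal sketch of preparing labeled examples for upload. The two-column `text,label` CSV layout, the label names, and the song title "Midnight Drive" are all hypothetical; each real service defines its own upload format, so treat this only as an illustration of the shape of the data.

```python
import csv
import io

# Hypothetical examples: social media posts paired with a custom label.
LABELED_POSTS = [
    ("Can't stop listening to Midnight Drive", "mentions_title"),
    ("Traffic was terrible on my midnight drive home", "no_mention"),
    ("Midnight Drive is the song of the summer", "mentions_title"),
]


def build_training_csv(rows):
    """Serialize (text, label) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


def label_counts(rows):
    """Count examples per label, a quick check that the classes are not badly skewed."""
    counts = {}
    for _, label in rows:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Note that the second post contains the same words as the song title but is labeled differently. Examples like that are what let a custom model learn the distinction a pre-made model would miss.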
Level three is what begins to separate data scientists from developers and data engineers. When people talk about taking up a career in machine learning, this is usually where you start to become a professional data scientist. Data engineers and developers are typically able to get through levels one and two efficiently, but level three begins to require knowledge about how models work at a highly detailed level, and about which specific parameters go into tuning and creating a model.
There's been an industry trend recently to build systems that can assist in the development of machine learning models. These tools take some of the cumbersome, confusing iteration and tuning processes away from the user, but even then you need to understand concepts such as hyperparameter tuning, and detailed knowledge is still required to use them well.
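For readers new to the term, hyperparameter tuning means searching over settings that are chosen before training rather than learned from the data. The sketch below is a toy illustration only, not how any particular tool does it: it fits a one-weight model `y = w * x` by gradient descent and tries several learning rates, keeping the one with the lowest validation error. All function names and data values are made up for the example.

```python
def train(xs, ys, learning_rate, steps=50):
    """Fit y = w * x by gradient descent on mean squared error; return w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train_set, val_set, learning_rates):
    """Train once per learning rate and score each run on held-out data."""
    results = {}
    for lr in learning_rates:
        w = train(*train_set, learning_rate=lr)
        results[lr] = mse(w, *val_set)
    best = min(results, key=results.get)
    return best, results


# Toy data following y = 2x, split into training and validation sets.
train_set = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
val_set = ([4.0, 5.0], [8.0, 10.0])
best_lr, scores = grid_search(train_set, val_set, [0.001, 0.01, 0.1])
```

The weight `w` is learned by training, while the learning rate is the hyperparameter being tuned. Automated tools run this kind of search at much larger scale, but interpreting their results still relies on understanding this distinction.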
Databricks is interesting to mention here too because, as I said, there's been an industry trend recently to introduce more tools. These tools include things such as SageMaker notebooks from Amazon, DataRobot, which has recently received a ton of venture capital to help them build this type of application, and Databricks, which provides a nice, clean interface to Spark.
Oftentimes these companies will actually have detailed documentation and examples of how to use the applications, which can greatly enhance your ability to get to level three, and even start to bridge the gap between level two and three.
Level four is really the pinnacle of an AI user's journey. At this level, you completely understand how models work, and you're even starting to study how to make your own models from very low levels.
At this level, you might start writing detailed models accelerated by graphics processing units, also known as GPUs, or start writing some custom TensorFlow code. Now, you might have some exposure to things such as TensorFlow at lower levels, but at level four, you're able to start incorporating truly unique functionality into your models and applications.
At this level, you can start to expect to compete on sites such as Kaggle, which is an awesome website that allows people to compete with each other by building models, and start to create some cool functionality that could mimic Google's deep brain, although maybe you don't have access to the same number of servers, so you're simply mimicking that functionality at a smaller scale.
On this learning path, though, we'll be focusing on teaching users such as yourself how to get started in level one and level two, with potentially more modules being added later on.
The later levels need more comprehensive, longer learning paths, which are beyond the scope of a gentle introduction, but by the end of this learning path, you'll be able to leverage out-of-the-box models and begin to build your own custom functionality with level two applications.
About the Author
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and in education on these complex technologies.