The course is part of this learning path
Move on from what you learned studying the principles of recurrent neural networks, and how they can solve problems involving sequences, with this cohesive course on Improving Performance. Learn to improve the performance of your neural networks, starting with learning curves that allow you to ask the right questions: do you need more data, or do you need a better model?
Further into the course, you will explore the fundamentals of batch normalization, dropout, and regularization.
This course also touches on data augmentation, which lets you build new data from your existing training data, and culminates in hyper-parameter optimization: a tool that helps you decide how to tune the external parameters of your network.
This course is made up of 13 lectures and three accompanying exercises. This Cloud Academy course is in collaboration with Catalit.
- Learn how to improve the performance of your neural networks.
- Learn the skills necessary to make executive decisions when working with neural networks.
- It is recommended to complete the Introduction to Data and Machine Learning course before starting.
Hello, and welcome to this video on hyper-parameters. In this video we will talk about hyper-parameters, and we will introduce hyper-parameter optimization. A deep learning model contains many hyper-parameters at various levels. Let's look at them in detail. When we define a network, we can choose the number of layers, the number of nodes in each layer, the layer type, and the activation function for each layer. These alone are many hyper-parameters. We can also add regularization techniques and initializations to the network, and we have many choices for each of these. If we use a dropout layer, we have additional hyper-parameters, like the probability of dropping a node. If we perform data augmentation, we can choose which transformations to use, and those choices can also be treated as hyper-parameters. Finally, we have to choose the optimizer, and for the optimizer we have to choose the learning rate and possibly other parameters specific to that optimizer.
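As a rough illustration of how quickly these choices multiply, we can collect the hyper-parameters just listed into a single search space. The names and values below are hypothetical, chosen only for this sketch, not taken from the course exercises:

```python
# Hypothetical hyper-parameter space for a small network.
# Every key is one of the knobs discussed above; the value lists
# are illustrative choices, not recommendations.
search_space = {
    "n_layers": [1, 2, 3],
    "units_per_layer": [32, 64, 128],
    "activation": ["relu", "tanh"],
    "dropout_rate": [0.0, 0.2, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 128],
}

# Each full combination of values is one candidate model.
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)

print(n_combinations)  # 3 * 3 * 2 * 3 * 3 * 2 = 324
```

Even with only two or three options per knob, there are already hundreds of candidate models, which is why we need a systematic search strategy.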
We also need to decide which batch size to pass to our training algorithm. Given all these possibilities, the question is: how do we find the best combination of hyper-parameters? It turns out that choosing the correct hyper-parameters is very important when training large networks. So we conduct what's called an experiment. An experiment sets the hyper-parameters, trains for a certain number of epochs, and then checks the scores on the training and test sets. What we normally do is conduct many experiments in parallel, with a master process coordinating the choice of parameters and worker processes that conduct the experiments and report their results back to the master. The master can follow several strategies to decide which hyper-parameter combination to try next. Here are the three most common ones: grid search, random search, and Bayesian optimization. Let's start with grid search. Grid search works by assigning a range to each parameter and then sampling regularly within that range.
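A minimal grid-search sketch of this experiment loop follows. The `run_experiment` function here is a stand-in for a real experiment (build the model, train for some epochs, return a validation score); its formula is invented purely so the sketch runs end to end:

```python
from itertools import product

# Grid search: regularly spaced candidate values for each hyper-parameter.
grid = {
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "batch_size": [32, 64, 128],
}

def run_experiment(params):
    # Stand-in for: build a network with these hyper-parameters,
    # train it, and return a validation score (higher is better).
    return -abs(params["learning_rate"] - 1e-2) - abs(params["batch_size"] - 64) / 1000

best_params, best_score = None, float("-inf")
for combo in product(*grid.values()):        # every grid point
    params = dict(zip(grid.keys(), combo))
    score = run_experiment(params)           # one "experiment"
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'learning_rate': 0.01, 'batch_size': 64}
```

In a real setting the master process would hand each grid point to a worker, but the loop structure is the same.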
A better way to sample within the given ranges is to draw random samples, because this gives a better description of how the score depends on the relevant parameters relative to the irrelevant ones. Finally, a more intelligent way to search is Bayesian optimization. Bayesian optimization works by assuming a prior probability distribution for the score over the hyper-parameter space and then updating that distribution each time a new score is obtained from a combination of hyper-parameters. This sounds complicated, but luckily for us there are several packages implementing this search strategy, and we will use one of them in the exercise. In conclusion, in this video we've introduced the hyper-parameters you can tune in a network, and we've talked about three different strategies for choosing them: grid search, random search, and Bayesian optimization. Thank you for watching, and see you in the next video.
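A random-search version of the same loop is sketched below. Sampling the learning rate log-uniformly is a common choice when a parameter spans several orders of magnitude; the helper names and the stand-in score function are illustrative, not from the course:

```python
import math
import random

random.seed(0)  # fixed seed so this sketch is reproducible

def sample_params():
    # Log-uniform sample for the learning rate (spans orders of magnitude),
    # uniform choice for the batch size.
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),
        "batch_size": random.choice([32, 64, 128]),
    }

def run_experiment(params):
    # Stand-in for training and scoring a real network;
    # here the best score is near learning_rate = 1e-2.
    return -abs(math.log10(params["learning_rate"]) + 2)

best_params, best_score = None, float("-inf")
for _ in range(20):  # 20 independent random experiments
    params = sample_params()
    score = run_experiment(params)
    if score > best_score:
        best_params, best_score = params, score

# Unlike a grid, each trial tests a fresh value of every parameter,
# so the important dimensions get explored more densely.
print(best_params)
```

Bayesian optimization replaces `sample_params` with a model-guided choice of the next point to try, which is what the packages used in the exercise do for us.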
About the Author
I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and the Université de Paris VI, and I graduated from Singularity University's summer program in 2011.