1h 2m

Building on what you learned about the principles of recurrent neural networks and how they solve problems involving sequences, this course focuses on Improving Performance. Learn to improve the performance of your neural networks, starting with learning curves, which help you ask the right questions: for example, whether you need more data or a better model.

Further into the course, you will explore the fundamentals of batch normalization, dropout, and regularization.

This course also covers data augmentation, which lets you build new data from your starting training data, and culminates in hyperparameter optimization, a tool that helps you decide how to tune the external parameters of your network.

This course is made up of 13 lectures and three accompanying exercises. This Cloud Academy course is in collaboration with Catalit.

Learning Objectives

  • Learn how to improve the performance of your neural networks.
  • Learn the skills necessary to make executive decisions when working with neural networks.

Intended Audience


Hello, and welcome to this video on embeddings. In this video, we will discuss how to extract features from text and we will introduce embedding layers. We've previously mentioned that neural networks can work with text data, but we have not really explained how to go from a sequence of words to numerical features. The simplest way is to compile a dictionary of words and assign a numerical index to each word. In this example, word number four is the word 'of', while word number two is the word 'be'. Now that we have a sequence of numbers, we can feed it, for example, to a recurrent neural net. However, since we have tens of thousands of words in our dictionary, the numbers appearing in the sequence can be very large, which is not a good start for a neural network.
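The dictionary step described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the toy corpus and the helper names `build_vocab` and `encode` are assumptions, and the indices it produces will not match the ones shown in the video.

```python
# Hypothetical toy corpus; the dictionary used in the video is not reproduced here.
def build_vocab(sentences):
    """Assign each distinct word an integer index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into the list of dictionary indices of its words."""
    return [vocab[word] for word in sentence.lower().split()]


corpus = ["to be or not to be", "the state of the art"]
vocab = build_vocab(corpus)
indices = encode("to be or not to be", vocab)
print(indices)  # [0, 1, 2, 3, 0, 1]
```

With a realistic vocabulary of tens of thousands of words, the same procedure yields indices in the tens of thousands, which is the scaling problem mentioned above.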

Besides, since the index in the dictionary can be arbitrarily chosen, there is no relation between two words with nearby indices. We could binarise the index into a huge vector with as many entries as there are words in the dictionary, setting zero everywhere and a single one at the index corresponding to that word (a one-hot encoding). This approach also presents some problems, because of the huge size and the sparsity of the input vector.

The solution to all these problems is what's called an embedding layer. An embedding layer takes the huge sparse vector that we generated with the vocabulary and fully connects it to a much smaller set of nodes, for example ten or a hundred, or even two. This has the effect of mapping each word onto a vector space with few coordinates, making the space of words pretty dense.
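To make the link between the one-hot vector and the embedding layer concrete, here is a small pure-Python sketch. It assumes a vocabulary of 8 words, an embedding size of 2, and random untrained weights, and the helpers `one_hot` and `dense` are hypothetical names. It shows that fully connecting a one-hot vector to a small set of nodes gives the same result as reading one row of the weight matrix.

```python
import random


def one_hot(index, vocab_size):
    """Vector of zeros with a single one at the word's dictionary index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def dense(vec, weights):
    """Fully connected layer without bias: vec (size V) times weights (V x D)."""
    dim = len(weights[0])
    return [sum(vec[i] * weights[i][j] for i in range(len(vec))) for j in range(dim)]


random.seed(0)
vocab_size, embedding_dim = 8, 2  # assumed toy sizes
# Untrained weights; in a real network these are learned during training.
weights = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
           for _ in range(vocab_size)]

word_index = 6
via_matmul = dense(one_hot(word_index, vocab_size), weights)
via_lookup = weights[word_index]

# Both routes give the same 2-dimensional vector for the word.
same = all(abs(a - b) < 1e-12 for a, b in zip(via_matmul, via_lookup))
print(same)  # True
```

Because the two routes agree, an embedding layer is often implemented as a table lookup rather than by actually building the large sparse vector.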

Also, this method has the additional benefit that it encapsulates semantic meaning in the dense space. Since we are training the embedding layer at the same time as the classification layers, the embedding will try to map words with close meaning to vectors that are close in the embedding space. This will produce semantic clouds of vectors and allow for very interesting consequences. One such consequence is that we will be able to perform mathematical operations with words, because the distance between related concepts will be preserved. For example, the vector going from the word 'man' to the word 'woman' will encode a change of gender, and so if we translate the word 'king' by that vector, we can expect to end up near the vector representing the word 'queen'.

In this video, we've introduced the embedding layer and shown how it reduces the huge vocabulary space to a more manageable dense space with semantic meaning. We've also shown that the vectors in this space can be added and subtracted in an interesting way. Thank you for watching and see you in the next video.
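The 'king' and 'queen' example can be sketched with a handful of two-dimensional vectors. The coordinates below are hand-picked for illustration and are not the output of a trained embedding layer; the helper names `add`, `sub`, and `nearest` are likewise hypothetical.

```python
import math

# Hand-picked 2-D vectors (an assumption for illustration only):
# axis 0 loosely stands for "royalty", axis 1 for "gender".
embeddings = {
    "man": [0.0, 0.0],
    "woman": [0.0, 1.0],
    "king": [1.0, 0.0],
    "queen": [1.0, 1.0],
    "apple": [-1.0, 0.5],
}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def nearest(vec, table, exclude=()):
    """Word whose vector has the smallest Euclidean distance to vec."""
    candidates = [w for w in table if w not in exclude]
    return min(candidates, key=lambda w: math.dist(vec, table[w]))


# The vector going from 'man' to 'woman' encodes the change of gender.
gender_shift = sub(embeddings["woman"], embeddings["man"])
# Translate 'king' by that vector and look for the closest word.
target = add(embeddings["king"], gender_shift)
result = nearest(target, embeddings, exclude={"king"})
print(result)  # queen
```

With real trained embeddings the translated vector rarely lands exactly on another word, which is why the lookup searches for the nearest neighbour instead of testing for equality.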

About the Author

I am a Data Science consultant and trainer. With Catalit, I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends, I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.