1h 2m

Move on from what you learned studying the principles of recurrent neural networks, and how they can solve problems involving sequences, with this cohesive course on Improving Performance. Learn to improve the performance of your neural networks, starting with learning curves that help you ask the right questions: do you need more data, or do you need to build a better model?

Further into the course, you will explore the fundamentals of batch normalization, dropout, and regularization.

This course also touches on data augmentation and how it lets you build new data from your starting training data, and culminates in hyperparameter optimization, a tool that helps you decide how to tune the external parameters of your network.

This course is made up of 13 lectures and three accompanying exercises. This Cloud Academy course is in collaboration with Catalit.

 Learning Objectives

  • Learn how to improve the performance of your neural networks.
  • Learn the skills necessary to make executive decisions when working with neural networks.

Intended Audience


Hey guys, welcome back. In this video, we will apply what we've learned about embedding layers to a real-world problem. Specifically, we want to predict the sentiment of movie reviews. There is a very famous dataset of movie reviews, the IMDB dataset, and the folks at Keras have already included it in their datasets module. We will download it and train a classifier that takes a review as input and predicts whether the review was positive or negative. The first thing we're going to look at is the function we use to load the data. This is imdb.load_data; it stores the data in a temporary location, and we also pass it a few parameters. The two you want to pay attention to are num_words, which we've set to None, meaning load all the words in the reviews, and maxlen, the maximum review length, which we've also set to None. Right now we've loaded all the reviews with their full vocabulary, but you can also restrict the data to a certain number of the most common words, for example the 10,000 or 20,000 most common words in the reviews, and you can also pass a maximum length for the reviews.
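To make the num_words option concrete, here is a minimal pure-Python sketch of vocabulary trimming, assuming it works the way we understand load_data to do it: every index at or above num_words (indices already shifted by three) is replaced by the out-of-vocabulary index 2. The helper name limit_vocabulary and the sample review are hypothetical; in practice you just pass num_words to imdb.load_data.

```python
def limit_vocabulary(sequences, num_words, oov_char=2, skip_top=0):
    """Keep indices in [skip_top, num_words); replace all others with oov_char.

    Hypothetical helper mimicking the num_words trimming; with Keras you get
    this by calling imdb.load_data(num_words=...).
    """
    return [[w if skip_top <= w < num_words else oov_char for w in seq]
            for seq in sequences]

# A made-up encoded review: start index 1, then word indices (already shifted by 3).
review = [1, 14, 22, 16, 43, 530, 34704]

print(limit_vocabulary([review], num_words=10000))
# → [[1, 14, 22, 16, 43, 530, 2]]
```

The rare word at index 34704 collapses to 2, so every word outside the most common ones shares a single "unknown" index and the embedding table can be much smaller.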

Let's have a look at the shape of the data. We have 25,000 reviews in the training set. If we look at one, it is a long list of numbers. The reason it's a list of numbers is that the words have already been converted into integers using a dictionary, so the list is the review. Index number one is the start character, so each list always starts with the number one. Then there are two other special indices: index number two is used for words that are not in the dictionary, and three is the starting offset, which means the word numbers in the dictionary provided with the IMDB dataset need to be shifted by three. Let's have a look at the index. We can get it from get_word_index, and it's a dictionary that maps each word to a number. As you can see, the numbers go as high as 60,000 or more, so we can take the values of the index and then take the max to know how many words there are.
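The encoding convention just described (start index 1, index 2 for unknown words, dictionary ranks shifted by 3) can be sketched in the forward direction, which is also what you would need to score a brand-new review. This is a sketch with a tiny hand-made index: only the "fawn" rank comes from the video, the other ranks are illustrative, and encode_review is a hypothetical helper.

```python
# Toy index with the same shape as get_word_index(): word -> rank.
# Only "fawn": 34701 comes from the video; the other ranks are illustrative.
word_index = {"this": 11, "film": 19, "was": 13, "just": 40,
              "brilliant": 527, "fawn": 34701}

START_CHAR = 1   # every encoded review begins with this index
OOV_CHAR = 2     # used for words missing from the dictionary
INDEX_FROM = 3   # offset added to every dictionary rank

def encode_review(words, word_index, start_char=START_CHAR,
                  oov_char=OOV_CHAR, index_from=INDEX_FROM):
    """Turn a list of lowercase words into the integer encoding of the IMDB data."""
    encoded = [start_char]
    for w in words:
        rank = word_index.get(w)
        encoded.append(oov_char if rank is None else rank + index_from)
    return encoded

# Largest rank in the dictionary: the "how many words" check from the video.
vocab_size = max(word_index.values())

print(encode_review("this film was just brilliant".split(), word_index))
# → [1, 14, 22, 16, 43, 530]
print(encode_review(["this", "film", "rocks"], word_index))
# → [1, 14, 22, 2]
print(vocab_size)  # → 34701
```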

There are 88,584 words in the index. Let's create the reverse index, where we swap the keys and the values of the index items: for each key and value in the index, we create a new dictionary entry whose key is the number plus three, so that we retrieve the correct words, and whose value is the word. In other words, the reverse index maps each number back to its word. If you check, before, fawn corresponded to 34,701, and now the same word is the value for 34,704. We also add to our reverse index the zero, which is the padding character, the one, which is the start character, and the two and the three, which are the other special characters. Once we've done this, we can decode a review as it's given, for example X_train[0], the first review: for each number in X_train[0], we look up the word in the reverse index and then join all these words with spaces. So we get back a readable review. start_char comes at the beginning, and then: this film was just brilliant casting location scenery story direction everyone's really suited the part they played.
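The reverse index and the decoding step can be sketched in a few lines. This assumes the same toy index as before (only "fawn": 34701 is from the video); the strings used for the four special indices are our own choice, not necessarily the ones used in the course notebook.

```python
# Toy index with the same structure as imdb.get_word_index(): word -> rank.
# Only "fawn": 34701 comes from the video; the other ranks are illustrative.
word_index = {"this": 11, "film": 19, "was": 13, "just": 40,
              "brilliant": 527, "fawn": 34701}

INDEX_FROM = 3

# Swap keys and values, shifting every rank by 3 to match the encoded reviews.
reverse_index = {rank + INDEX_FROM: word for word, rank in word_index.items()}

# Special indices (token strings chosen for this sketch).
reverse_index[0] = "<pad>"
reverse_index[1] = "<start>"
reverse_index[2] = "<oov>"
reverse_index[3] = "<unused>"

def decode_review(encoded):
    """Look up each index and join the words with spaces; unknown indices become '?'."""
    return " ".join(reverse_index.get(i, "?") for i in encoded)

print(reverse_index[34704])                     # → fawn
print(decode_review([1, 14, 22, 16, 43, 530]))  # → <start> this film was just brilliant
```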

As you can see, punctuation has been removed and everything has been rendered in lower case, but it's still an understandable review: Robert Redford's an amazing actor and now the same being director Norman's father came to the same Scottish island and myself so I love the fact, blah blah blah. We have 218 words in this review. Let's check a few others: the first review has 218 words, the second review in the training set has 189, the third has 141, and the last one has 550. So it's very clear that our reviews have different lengths, and we will need to pad them to equal length. Luckily, Keras gives us a preprocessing function, pad_sequences, to pad our sequences with zeros. We're going to set a maximum length of 100 words, and then pad both the sequences in X_train and those in X_test to that maximum length.
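Before picking a maximum length, it helps to see how many reviews it will cut and how many it will pad. Here is a small NumPy sketch; length_summary is a hypothetical helper, and the toy reviews only reuse the lengths quoted in the video (218, 189, 141, 550) plus one short made-up review.

```python
import numpy as np

def length_summary(sequences, maxlen):
    """Report the length range and how many sequences a given maxlen truncates or pads."""
    lengths = np.array([len(s) for s in sequences])
    return {
        "min": int(lengths.min()),
        "max": int(lengths.max()),
        "truncated": int((lengths > maxlen).sum()),  # longer than maxlen: will be cut
        "padded": int((lengths < maxlen).sum()),     # shorter than maxlen: gets zeros
    }

# Toy reviews: start index 1 followed by filler word indices.
reviews = [[1] + [5] * (n - 1) for n in (218, 189, 141, 550, 60)]

print(length_summary(reviews, maxlen=100))
# → {'min': 60, 'max': 550, 'truncated': 4, 'padded': 1}
```

Running the same summary on the real X_train shows how much text a cutoff of 100 discards, which is the trade-off between training speed and information kept.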

We do that, and now X_train_pad is an array with 25,000 reviews of 100 words each. To see the effect of padding, we can print out the first padded review, which is now an array of 100 numbers, and we can also print out the original first review, which was a list. The beginning of the list is 1, 14, 22, which is not the same as the beginning of the padded review, whereas the end of the original review, 19, 178, 32, corresponds to the end of our array. What pad_sequences did was truncate reviews longer than 100 numbers by cutting away the beginning of the review. This is called pre-truncation; similarly, with pre-padding, if a review were shorter than 100 numbers, zeros would be added at the beginning. In pad_sequences, both padding and truncating can be set to 'pre' or 'post'; the default is 'pre' for both, and the default padding value is zero. Perfect. Now that we've truncated and padded our reviews, we are also going to check the maximum number of features.
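The 'pre' and 'post' options are easiest to see on short toy sequences. The function pad below is a hypothetical NumPy re-implementation of the behaviour described in the video (truncate, then pad with a fill value), written for illustration only; in the course you would call Keras's pad_sequences.

```python
import numpy as np

def pad(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Sketch of pad_sequences-style padding/truncation into an int32 array."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        seq = list(seq)
        # 'pre' truncation keeps the last maxlen items; 'post' keeps the first maxlen.
        kept = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        if not kept:
            continue  # empty review: the row stays all padding
        # 'pre' padding right-aligns the review; 'post' left-aligns it.
        if padding == "pre":
            out[i, -len(kept):] = kept
        else:
            out[i, :len(kept)] = kept
    return out

long_review = [1, 2, 3, 4, 5, 6, 7, 8]
short_review = [1, 14, 22]

print(pad([long_review, short_review], maxlen=5).tolist())
# → [[4, 5, 6, 7, 8], [0, 0, 1, 14, 22]]
print(pad([long_review, short_review], maxlen=5,
          padding="post", truncating="post").tolist())
# → [[1, 2, 3, 4, 5], [1, 14, 22, 0, 0]]
```

With the defaults the end of each review is kept and zeros go in front, so the last words an LSTM reads are real words rather than padding.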

We take the maximum value in both X_train_pad and X_test_pad and add one; that's the number of features, i.e. the size of the vocabulary we have available in the dictionary. The next step is to build the model. We've loaded a bunch of reviews from the IMDB dataset, and our goal is to predict the sentiment of those reviews. We will embed our roughly 88,000 words into a 128-dimensional feature space and then pass these vectors to an LSTM. The LSTM will take the sequence of word vectors and encode it, with a little dropout and a little recurrent dropout, and the output goes through a sigmoid to return a binary label, which tells us whether the sentiment is positive or negative.
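The "+1" matters because an embedding is a lookup table with one row per index, from 0 (padding) up to the largest index, so it needs largest index + 1 rows; this is the value you would pass as the vocabulary size of the Embedding layer. A NumPy sketch with tiny stand-in arrays (the real padded arrays give a much larger number, roughly the size of the vocabulary):

```python
import numpy as np

# Tiny stand-ins for X_train_pad and X_test_pad (the real ones are 25,000 x 100).
X_train_pad = np.array([[0, 0, 1, 14, 22], [1, 530, 16, 43, 2]])
X_test_pad = np.array([[0, 0, 0, 1, 97]])

# Indices run from 0 up to the largest value seen, so we need max + 1 rows.
max_features = int(max(X_train_pad.max(), X_test_pad.max())) + 1

embedding_dim = 128  # size of the embedding space used in the video
embedding_params = max_features * embedding_dim  # one 128-vector per index

print(max_features)      # → 531
print(embedding_params)  # → 67968
```

The parameter count shows why the full model is slow on a laptop: with a vocabulary near 88,000, the embedding table alone holds over 11 million weights.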

In fact, we should have looked at this before: y_train is a long sequence of ones and zeros, where one is positive sentiment and zero is negative sentiment. We build our model, compile it, and then train it for two epochs with a batch size of 32. Notice that this model is quite complex, so it will take a long time to train on a laptop. Feel free to run it on a GPU if you prefer, and I'll see you when the model is trained. Okay, our training finished running, and we reached a validation accuracy of 83%. Let's check on the test set. This will also take some time. Our accuracy on the test set is 83.6%, which is actually not bad for this dataset after just two epochs. I hope this inspires you to go out and do more text classification on interesting datasets. This is the end of section nine. See you in the exercises, and thank you for watching.
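The sigmoid output is a probability between 0 and 1; an accuracy such as the 83.6% above comes from turning each probability into a 1 (positive) or 0 (negative) label and comparing it with the true labels. Here is a NumPy sketch, assuming the usual 0.5 threshold; the logits and labels are made up.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_accuracy(y_true, probs, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the 0/1 label."""
    preds = (probs > threshold).astype(int)
    return float((preds == y_true).mean())

# Made-up raw scores for five reviews and their true labels (1 = positive).
logits = np.array([2.0, -1.0, 0.3, -3.0, 1.5])
probs = sigmoid(logits)
y_true = np.array([1, 0, 0, 0, 1])

print(binary_accuracy(y_true, probs))  # → 0.8
```

Here the third review scores just above 0.5, so it is labeled positive by mistake; that single error gives 4 correct out of 5.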

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.