Andrea Cappelli

Roberto Turrin

March 5, 2020

How We Use Natural Language Processing to Scale & Automate Quiz Creation

What is Natural Language Processing?

Natural Language Processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence. NLP is focused on the interactions between computers and natural human languages, particularly how to program computers to process and analyze large amounts of natural language data.

How does Cloud Academy use NLP?

Cloud Academy’s mission is to help people improve and track their skills. With this objective in mind, training people with online-based learning content is just a part of the whole picture aimed at delivering quality content and providing useful insights both to learners and to their managers.

The Cloud Academy training platform comprises many key features, including:

Courses to build tech skills on industry-leading technologies with video-based lectures
Hands-on labs to learn in live cloud environments using step-by-step procedures
Lab challenges to demonstrate problem-solving skills using our sandboxed accounts on AWS, Azure, and Google Cloud Platform
Exams and quizzes (skill assessments) where knowledge progress is tracked to provide useful insights both to learners and to their managers

These products build on cutting-edge technologies that we continuously experiment with to drive new features. Some of these features are prototyped and eventually integrated into the platform, an activity led by a dedicated team named Inspire.

One of the goals of the technology developed by Inspire is to automate and scale the management of content with the use of Artificial Intelligence (AI). Managing content is known to be a demanding and time-consuming activity, typically manual and resting entirely on a team of Subject Matter Experts (SMEs). The technology is meant to simplify and automate content management so that SMEs can focus on the most interesting activities, where their expertise cannot be replaced by a machine learning algorithm.

In this article, we present an example of technology developed internally and applied to our platform to scale the configuration of so-called knowledge-check quizzes, which help users verify whether they acquired the skills taught in a course they just completed. When you’re ready to try out a knowledge-check quiz, check out some of Cloud Academy’s most popular courses on AWS, Azure, or Google Cloud Platform.

Knowledge-check quizzes

Knowledge-check quizzes are short assessments composed of three to five multiple-choice questions (MCQs). The quizzes are placed at the end of a training session, after you complete a course, allowing you to verify whether you learned the basic concepts taught in the session. Based on the answers you submit, our Skill Assessment technology adapts your score on the tested skills to reflect your actual level of knowledge and visualizes your results.

Configuring such quizzes requires an SME to browse the catalog of questions, select the most appropriate ones to test the knowledge acquired in the course, and finally assemble a quiz and attach it to the course. To avoid proposing out-of-scope questions, the SME must be aware of the exact content of the course. This task is reasonably straightforward if the SME is the author of the course, but much more demanding if the course was prepared by another SME, such as an external contractor. Selecting the right questions is also a demanding task: relatively simple if the questions were created by the same author as the course, much harder if they were authored by someone else.

Automating knowledge-check quizzes

To simplify such a process, we have implemented an AI pipeline based on Natural Language Processing (NLP) that:

1. Parses the content of the courses, in particular the transcripts of our video lectures, which are obtained using an external transcription service.
2. Identifies the core concepts in the text.
3. Selects the questions that are most relevant to these concepts.
4. Assembles a knowledge-check quiz from a selection of these questions.
5. Allows the quiz to be reviewed by an SME, who can approve or reject any single MCQ proposed by the NLP-based algorithm. This is the only phase where an action by the SME is required.
6. If approved by the SME, attaches the quiz to the course and publishes it at the end of the course.
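To make the concept-identification step more concrete, here is a minimal sketch of one common statistical approach: scoring adjacent word pairs by pointwise mutual information (PMI) and keeping the pairs that co-occur more often than chance. This is an illustration, not our production code. The function names, the thresholds, and the sample sentences are all assumptions, and it uses only the Python standard library (Gensim's Phrases model is built around a similar idea).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def learn_phrases(sentences, min_count=2, min_pmi=1.0):
    """Return the bigrams whose words co-occur more often than chance."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    phrases = set()
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # too rare for the statistic to be trusted
        # PMI = log2(P(first, second) / (P(first) * P(second))), using the
        # number of tokens as an approximation of the number of bigrams.
        pmi = math.log2(count * total / (unigrams[first] * unigrams[second]))
        if pmi >= min_pmi:
            phrases.add(f"{first} {second}")
    return phrases


def find_phrases(text, phrases):
    """List the learned phrases that occur in a new text, in order."""
    tokens = tokenize(text)
    candidates = (f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return [c for c in candidates if c in phrases]


# Made-up transcript sentences, for illustration only.
transcript = [
    "An S3 bucket stores objects.",
    "Each S3 bucket has a unique name.",
    "A neural network learns from data.",
    "The neural network has many layers.",
]
phrases = learn_phrases(transcript)
found = find_phrases("Create a neural network and save it to an S3 bucket.", phrases)
```

On a real corpus, a stop-word filter and a higher `min_count` would likely be needed to keep frequent but uninformative pairs out of the phrase set.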

Natural Language Processing (NLP) pipeline

The NLP components have been implemented using standard Python libraries such as NLTK, spaCy, and Gensim. To give you some additional details:

In phases 1 and 2, key concepts are identified using statistical analysis of the terms appearing in lecture transcripts, focusing on words and phrases that seem more tightly related to the topic at hand so that they can later be recognized inside new text. Concretely, we look for phrases like “S3 bucket” or “neural network” and learn how to detect them inside new text samples.
In phase 3, each lecture relevant to the knowledge-check we want to create is split into text chunks, each centered on a specific sentence and surrounded by meaningful context, for example:
“Convolutional neural networks are one kind of machine learning algorithm. They’re often used for computer vision tasks. For instance, they can be used for object detection applications.”
Then, information retrieval techniques are used to compare these chunks to the questions available in our content base, ranking the best matches higher. The context helps to find matches for facts expressed across several sentences (quite common in normal language). Consider, for example, a question like: “Which among these machine learning algorithms are often used for computer vision applications?” whose correct answer is “Convolutional neural networks.”
Checks are made on the key phrases mentioned in the question to make sure it doesn’t deal with concepts the lecture doesn’t explain. Finally, the questions that score best in this stage are assigned as viable review questions for the lecture.
In phase 4, a knowledge-check for a course is assembled from the questions that are best suited to the content of its lectures.
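The matching described for phase 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: it uses TF-IDF with cosine similarity as a stand-in for "information retrieval techniques", a naive sentence splitter, and a hypothetical `known_phrases` vocabulary of key phrases learned beforehand. All function names are ours.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def context_chunks(transcript, window=1):
    """One chunk per sentence, padded with `window` neighbours on each side."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", transcript.strip()) if s]
    return [" ".join(sentences[max(0, i - window):i + window + 1])
            for i in range(len(sentences))]


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) with a smoothed inverse document frequency."""
    token_lists = [tokenize(t) for t in texts]
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    n = len(texts)
    return [{tok: tf * (math.log((1 + n) / (1 + doc_freq[tok])) + 1)
             for tok, tf in Counter(toks).items()} for toks in token_lists]


def cosine(u, v):
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def rank_questions(transcript, questions, known_phrases, window=1):
    """Score each in-scope question by its best-matching context chunk."""
    chunks = context_chunks(transcript, window)
    vectors = tfidf_vectors(chunks + questions)
    chunk_vecs, question_vecs = vectors[:len(chunks)], vectors[len(chunks):]
    lecture_text = " ".join(tokenize(transcript))
    ranked = []
    for question, q_vec in zip(questions, question_vecs):
        q_text = " ".join(tokenize(question))
        # Skip questions that mention a key phrase the lecture never uses.
        out_of_scope = [p for p in known_phrases
                        if p in q_text and p not in lecture_text]
        if out_of_scope:
            continue
        best = max((cosine(q_vec, c) for c in chunk_vecs), default=0.0)
        ranked.append((best, question))
    return sorted(ranked, reverse=True)


transcript = ("Convolutional neural networks are one kind of machine learning "
              "algorithm. They're often used for computer vision tasks. For "
              "instance, they can be used for object detection applications.")
questions = [
    "Which among these machine learning algorithms are often used for "
    "computer vision applications?",
    "Which storage class is cheapest for an S3 bucket?",  # made-up question
]
known_phrases = {"s3 bucket", "neural network", "computer vision"}
ranked = rank_questions(transcript, questions, known_phrases)
```

In this toy run, the second question is dropped because it mentions “S3 bucket”, a phrase the lecture never uses, while the first is kept and scored against the chunk that matches it best.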

Deploy

The NLP pipeline was deployed in production on AWS. It spins up an EC2 spot instance devoted to retrieving the input content data, processing it to assemble the quizzes, and saving the results on the Cloud Academy platform, making them available first to reviewers, who accept or reject them, and finally to users. The EC2 instance lifecycle was regulated by a Lambda function, which could be used to either run the pipeline on demand or schedule it.
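As a rough sketch of this setup, a Lambda handler could launch the spot instance through boto3's `run_instances` call. Everything specific here is an assumption made for illustration: the event fields, the placeholder AMI ID, the default instance type, and the bootstrap script path are hypothetical, and the instance is assumed to terminate itself once the pipeline finishes.

```python
# Hypothetical bootstrap script executed by the instance at first boot.
DEFAULT_USER_DATA = "#!/bin/bash\npython3 /opt/quiz_pipeline/run.py\n"


def build_spot_request(ami_id, instance_type, user_data=DEFAULT_USER_DATA):
    """Build the keyword arguments for a single one-time spot instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    }


def handler(event, context, ec2_client=None):
    """Lambda entry point: invoked on demand or by a scheduled rule."""
    if ec2_client is None:
        import boto3  # imported lazily so the module loads without AWS access
        ec2_client = boto3.client("ec2")
    params = build_spot_request(
        ami_id=event.get("ami_id", "ami-00000000000000000"),  # placeholder
        instance_type=event.get("instance_type", "m5.large"),
    )
    response = ec2_client.run_instances(**params)
    return {"instance_id": response["Instances"][0]["InstanceId"]}
```

Keeping the request builder separate from the handler makes the launch parameters easy to inspect or unit-test without touching AWS. The same handler can serve both a manual invocation and a schedule, since both simply deliver an event.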

User interface

We gradually rolled out knowledge-check quizzes, starting with an initial prototype on August 1, 2019, to validate their effectiveness. During an experimental phase, we processed the three most popular courses and generated their related knowledge-check quizzes, which were reviewed and approved by one of our SMEs.

We did not make major changes to the user interface; we simply placed the quiz within the list of recommended content that was already shown at the end of the course, replacing the suggested courses with a quiz. You can see our UI evolution below.

After-course content stripe UI evolution

During the preliminary validation (prototype), we measured that about 1 in 3 users who were presented with the quiz at the end of a course actually started and completed it. Although not spectacular, this rate was already an improvement over the original recommendation stripe it replaced (a success rate of 1 in 20), and it encouraged us to move forward on the same path.

Once the prototype was validated, we involved our design team to improve the UI and UX, with the main goal of making knowledge-check quizzes better integrated with the product and perceived as the natural follow-up to the course. For this reason, when users complete the course, the UI immediately shows one question and, if users answer it, they continue the quiz as usual.

To understand the success of knowledge-check quizzes over time, the chart below plots, as an example, the percentage of completed courses (among those already provided with a knowledge-check) that were followed by a completed knowledge-check.

Success rate of Knowledge-Check quizzes over time. Note: Absolute values are hidden for non-disclosure reasons.

This simple flow change, live since Nov. 19, 2019, seems to have helped significantly: the success rate went from 33% to around 50%. Our confidence in this figure is also higher, since more knowledge-check quizzes have been deployed and correspondingly more data is available.
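The success rate discussed here is a plain ratio: completed knowledge-checks divided by completed courses that offer one. A minimal sketch of how it could be computed per week is shown below. The event format and the function name are assumptions, and the sample data is made up purely to reproduce the 1-in-3 and 50% figures.

```python
from collections import defaultdict
from datetime import date


def weekly_success_rate(events):
    """Map each ISO (year, week) to completed quizzes / completed courses.

    `events` holds one (completion_date, quiz_completed) pair for every
    completion of a course that has a knowledge-check attached.
    """
    totals = defaultdict(lambda: [0, 0])  # (year, week) -> [courses, quizzes]
    for day, quiz_completed in events:
        year, week, _ = day.isocalendar()
        totals[(year, week)][0] += 1
        totals[(year, week)][1] += int(quiz_completed)
    return {week: quizzes / courses
            for week, (courses, quizzes) in sorted(totals.items())}


# Made-up events: three course completions in one week, two in the next.
events = [
    (date(2019, 11, 18), True),
    (date(2019, 11, 19), False),
    (date(2019, 11, 20), False),
    (date(2019, 11, 25), True),
    (date(2019, 11, 26), False),
]
rates = weekly_success_rate(events)
```

With this sample, the first week yields a rate of 1 in 3 and the second a rate of 1 in 2.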

At the same time, we extended the knowledge-check quizzes to more courses, currently covering 12 of the most popular ones. From the first day the knowledge-check quizzes were released, they have been constantly consumed by users who completed the related course, with no particular differences between courses. The increases in consumption in January and February were due to new courses being equipped with a knowledge-check quiz.

For full transparency, the peak in the first week of February, when we released the latest batch of courses with a knowledge-check quiz, coincided with a weekend during which we gave everybody free, unlimited access to our content for 72 hours. However, the two following weeks confirm that the measured consumption remained higher than before.

The image below shows the total weekly activity on quizzes, highlighting the percentage of knowledge-check quizzes with respect to all quizzes. Again, the increases in January and February are mainly related to the introduction of new courses provided with a knowledge-check quiz.

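We can observe that as the percentage of knowledge-check quizzes increased over time (from 20% to about 40%), the total number of consumed quizzes increased as well (more than doubling). Since the share of other quizzes fell only from about 80% to about 60% of a total that more than doubled, activity on other quizzes grew too, by roughly 20% or more. Hence, we can conclude that knowledge-check quizzes have not absorbed the activity on other quizzes; instead, they have increased the total number of consumed quizzes.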

In addition, in a separate analysis focused on active users (i.e., users with at least one hour of activity in a month), we observed an important increase in the percentage of such users who took at least one quiz (a 3x increase from July to February). This suggests that knowledge-check quizzes appeal to users who would not have taken any quizzes before and therefore reach a wider user base.

Conclusions

In this article, we briefly described one automated content construction approach we employed to scale our content, specifically to accelerate the association of on-topic knowledge-check quizzes with courses on our platform, so that users can sanity-check their understanding. The analysis of consumption metrics confirmed a good reception of the feature, with success rates as high as 50%, while engaging users who previously did not take quizzes.

The scaling advantages come from a more efficient quiz creation process. Previously, when creating a similar knowledge-check quiz, our SMEs needed to either cross-reference our entire question knowledge base with the target course or, if this proved too cumbersome, create new questions from scratch (which is still quite time-consuming and wastes potentially valid existing questions). The automated approach now performs the cross-referencing part and allows the SME to just quickly review the proposed quizzes.

To estimate time savings, we compared the old, fully manual approach (which required at least one hour) with the new approach, which required only 15-30 minutes. One interesting note on the objective we optimized against and the related tradeoff: while it’s easy for SMEs to quickly verify that a proposed question is appropriate (just one lecture passage mentioning the answer is enough), the most time-consuming operation is verifying that a question is not appropriate for a course, because that means ruling it out against the whole course. As a consequence, we aimed at a high-precision algorithm (i.e., one that proposes a question only when quite confident) instead of a high-recall one, so as not to lose the time-efficiency advantage.
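The precision/recall tradeoff can be illustrated with a short sketch. It assumes each candidate question carries a relevance score from the matching stage and a yes/no label from an SME. The scores, labels, and thresholds below are made up for illustration and are not measurements from our platform.

```python
def precision_recall(scored, threshold):
    """Precision and recall when proposing only questions scoring >= threshold.

    `scored` is a list of (score, is_appropriate) pairs, where the label
    comes from an SME review.
    """
    proposed = [ok for score, ok in scored if score >= threshold]
    relevant = sum(ok for _, ok in scored)
    true_positives = sum(proposed)
    precision = true_positives / len(proposed) if proposed else 1.0
    recall = true_positives / relevant if relevant else 1.0
    return precision, recall


# Made-up scores and SME labels.
scored = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.4, False), (0.3, True)]

strict = precision_recall(scored, threshold=0.75)   # propose only when confident
lenient = precision_recall(scored, threshold=0.35)  # propose more candidates
```

In this toy data, the strict threshold proposes two questions, both appropriate (precision 1.0, recall 0.5), while the lenient one recovers more appropriate questions (recall 0.75) at the cost of two inappropriate proposals that an SME would have to rule out (precision 0.6).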