AWS re:Invent 2015: real-world smart applications with Amazon Machine Learning

How to apply Machine Learning to social media to make your customers happy

In his AWS re:Invent presentation, Alex Ingerman, technical product manager at AWS, walked through the design and implementation of a real-world, end-to-end application that transforms a high-volume social stream into actionable requests for your customer support team.
The idea is that social networks like Twitter generate a huge number of conversations, but only a small percentage of them are actually useful and require follow-up action. With smart automation, we can efficiently filter out the noise (about 80% of the stream), with the ultimate goal of making our customers happy.
[Figure: Amazon Machine Learning app pipeline]
The proposed software pipeline makes use of five AWS services: Amazon Machine Learning, Kinesis, Lambda, SNS, and Mechanical Turk.
Amazon Machine Learning will be used to classify tweets into either actionable or not actionable requests by means of a binary classifier, while Mechanical Turk will be used to label our tweets quickly and cheaply. The remaining services will provide the whole system’s logic, which can be abstracted into a more general framework to handle many more use cases and scenarios.

How do you train your Machine Learning model?

To build a useful classification model, we need a labelled dataset of tweets. We can easily retrieve huge numbers of tweets using the official Twitter APIs, but we would still need to label them manually. This is where Amazon Mechanical Turk helps: by creating a custom MT task, we can hand the labelling over to a large pool of workers. A few thousand tweets can be classified within hours, and cheaply.
Once we have a real dataset, we can split it into a training set and a testing set (i.e. two DataSource objects on Amazon ML). These objects provide useful statistics about our dataset, so we can dig a bit deeper before creating our model. For example, we may discover that removing duplicate tweets and retweets from our dataset would result in a cheaper labelling phase and faster training.
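As a rough illustration of that clean-up step, here is a minimal sketch that drops retweets and repeated texts before the tweets are sent for labelling. It is not the code from the presentation. It assumes each tweet is a dictionary with a "text" field, and that a retweet can be recognised either by a "retweeted_status" field or by the conventional "RT @" prefix.

```python
def drop_duplicates_and_retweets(tweets):
    """Return tweets without retweets or repeated texts, keeping first occurrences."""
    seen = set()
    unique = []
    for tweet in tweets:
        text = tweet["text"].strip()
        # Assumption: retweets carry a "retweeted_status" field
        # or start with the conventional "RT @" prefix.
        if "retweeted_status" in tweet or text.startswith("RT @"):
            continue
        # Normalise case and whitespace so trivial variations count as duplicates.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        unique.append(tweet)
    return unique
```

Every tweet removed here is one fewer item to pay for in the labelling task.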
As part of the presentation, Alex showed us how to create DataSource objects and how to create a new model using Python (with the boto3 library).
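The snippet below is a hedged sketch of what those calls can look like, not a copy of the code shown on stage. It assumes the labelled tweets live in a single CSV file on S3 with a schema file next to it, and it uses a 70/30 split; the identifiers, names, and split ratio are all hypothetical. The `ml` argument is expected to be a boto3 Amazon Machine Learning client, for example `boto3.client("machinelearning")`.

```python
import json


def datasource_request(datasource_id, name, data_url, schema_url, begin, end):
    """Build the arguments for create_data_source_from_s3 for one slice of the file."""
    # The DataRearrangement string selects a percentage range of the input data.
    rearrangement = {"splitting": {"percentBegin": begin, "percentEnd": end}}
    return {
        "DataSourceId": datasource_id,
        "DataSourceName": name,
        "DataSpec": {
            "DataLocationS3": data_url,
            "DataSchemaLocationS3": schema_url,
            "DataRearrangement": json.dumps(rearrangement),
        },
        # Statistics are needed for a DataSource that will be used for training.
        "ComputeStatistics": True,
    }


def create_training_pipeline(ml, data_url, schema_url, prefix="tweets"):
    """Create training/testing DataSources, a binary model, and an evaluation."""
    train_id = prefix + "-train"
    test_id = prefix + "-test"
    model_id = prefix + "-model"
    ml.create_data_source_from_s3(
        **datasource_request(train_id, "Training tweets", data_url, schema_url, 0, 70)
    )
    ml.create_data_source_from_s3(
        **datasource_request(test_id, "Testing tweets", data_url, schema_url, 70, 100)
    )
    ml.create_ml_model(
        MLModelId=model_id,
        MLModelName="Actionable tweets",
        MLModelType="BINARY",
        TrainingDataSourceId=train_id,
    )
    ml.create_evaluation(
        EvaluationId=prefix + "-eval",
        EvaluationName="Actionable tweets evaluation",
        MLModelId=model_id,
        EvaluationDataSourceId=test_id,
    )
    return model_id
```

Passing the client in as an argument keeps the function easy to try out with a stub before pointing it at a real account.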
As soon as the model is trained, we can enable it for real-time predictions and, based on the business impact of potential errors, incrementally adjust its behaviour.
The basic assumption is that every predictive model makes mistakes and therefore we always have to find the best trade-off between false positives and false negatives, based on our business goals and the cost of a mistake. In this specific case, false negatives are much worse than false positives, since ignoring a real problem is much more costly than manually handling a non-actionable tweet.
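One way to make this trade-off concrete is to assign a cost to each kind of mistake and pick the score threshold that minimises the total cost on labelled data. The sketch below assumes a list of (score, is_actionable) pairs taken from an evaluation set; the cost values (a false negative ten times as expensive as a false positive) are purely illustrative.

```python
def total_cost(scored, threshold, fn_cost, fp_cost):
    """Sum the cost of the mistakes made when predicting 'actionable' at score >= threshold."""
    cost = 0.0
    for score, actionable in scored:
        predicted = score >= threshold
        if actionable and not predicted:
            cost += fn_cost  # a real problem was ignored
        elif predicted and not actionable:
            cost += fp_cost  # an agent handled a non-actionable tweet
    return cost


def best_threshold(scored, fn_cost=10.0, fp_cost=1.0, steps=100):
    """Return the lowest threshold in [0, 1] with the minimum total cost."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: total_cost(scored, t, fn_cost, fp_cost))
```

With expensive false negatives the chosen threshold moves down, so more tweets reach a human. The resulting value could then be applied to the model with the boto3 call `update_ml_model(MLModelId=..., ScoreThreshold=...)`.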

AWS Lambda and Kinesis for streaming data processing

We can send our stream of social activities to Amazon Kinesis and connect it to AWS Lambda, which will take care of the core application logic. Basically, Lambda will query our Machine Learning model to find out whether a tweet is actionable, and then either discard it or forward it to a customer support agent through SNS (by email, for example).
Our model will also return a confidence measure, allowing us to take more advanced actions based on its value. We could also add more complexity by labelling our dataset with more granular data, for example the request type (e.g. “technical question”, “support request”, “problem report”, “feature request”), and training a multi-class model to send each tweet to the right support team.
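The core of such a Lambda function might look like the sketch below. It is an approximation under stated assumptions, not the source code of the application: each Kinesis record is assumed to hold a JSON tweet with a "text" field, the model schema is assumed to have a single "text" attribute, and the label "1" is assumed to mean "actionable". `ml` and `sns` are boto3 clients (`boto3.client("machinelearning")` and `boto3.client("sns")`), passed in so the logic can be exercised without AWS access.

```python
import base64
import json


def process_records(event, ml, sns, model_id, endpoint_url, topic_arn):
    """Classify each tweet in a Kinesis event and forward the actionable ones to SNS."""
    forwarded = 0
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        tweet = json.loads(base64.b64decode(record["kinesis"]["data"]))
        response = ml.predict(
            MLModelId=model_id,
            Record={"text": tweet["text"]},  # assumed schema: one "text" attribute
            PredictEndpoint=endpoint_url,
        )
        # Assumption: label "1" marks an actionable tweet.
        if response["Prediction"]["predictedLabel"] == "1":
            sns.publish(
                TopicArn=topic_arn,
                Subject="Actionable tweet",
                Message=tweet["text"],
            )
            forwarded += 1
    return forwarded
```

In a deployed function, a `handler(event, context)` entry point would create the two clients once at module level and delegate to `process_records`.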
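A small routing function can illustrate how the confidence measure and a multi-class label might be combined. This is a sketch only: the topic ARNs are hypothetical placeholders, the 0.6 cut-off is arbitrary, and the input is assumed to be a prediction dictionary whose "predictedScores" entry maps each class name to a score.

```python
# Hypothetical SNS topics, one per support team.
TEAM_TOPICS = {
    "technical question": "arn:aws:sns:us-east-1:123456789012:tech-questions",
    "support request": "arn:aws:sns:us-east-1:123456789012:support",
    "problem report": "arn:aws:sns:us-east-1:123456789012:problems",
    "feature request": "arn:aws:sns:us-east-1:123456789012:features",
}
# Hypothetical catch-all topic for tweets the model is unsure about.
FALLBACK_TOPIC = "arn:aws:sns:us-east-1:123456789012:triage"


def route(prediction, min_confidence=0.6):
    """Pick the SNS topic for a tweet from its per-class scores."""
    scores = prediction["predictedScores"]
    label = max(scores, key=scores.get)
    # Low-confidence or unknown labels go to a human triage queue.
    if scores[label] < min_confidence or label not in TEAM_TOPICS:
        return FALLBACK_TOPIC
    return TEAM_TOPICS[label]
```

Sending uncertain predictions to a triage queue keeps the cost of a wrong class low, since a person still looks at the tweet.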

Conclusions

This application can be seen as a general pattern for tackling many different problems: it applies whenever you have a streaming data source, a per-instance processing task, a classification model, and a notification output.
[Figure: Amazon Machine Learning general approach]
The full source code of this application is available here, while you can view the slides from the presentation here.

Written by

Alex is a Software Engineer with a great passion for music and web technologies. He's experienced in web development and software design, with a particular focus on frontend and UX.
