Cloud Academy Team

July 12, 2019

Microservices: Using Distributed Tracing for Monitoring & Troubleshooting

Modern applications can be found everywhere today. Distributed microservices, cloud-native, managed resources, and serverless are parts of this complex whole. But how can we keep track of so many elements in our production environments?

In these distributed environments, microservices communicate with each other both synchronously and asynchronously. Distributed tracing has become a crucial component of observability, both for performance monitoring and for troubleshooting.

In this article, I’m going to discuss some key topics in instrumentation, distributed tracing, and modern distributed applications. To better understand these topics, watch our webinar on Distributed Tracing in Modern Applications.

What is distributed tracing?

Tracing is a way of profiling and monitoring events in applications. With the right information, a trace can reveal the performance of critical operations, such as how long a customer waits for an order to be completed. It can also break down an operation into the calls it makes to our database, APIs, or other microservices.

Distributed tracing is a newer form of tracing that is better suited to microservice-based applications. It allows engineers to see traces from end to end, locate failures, and improve overall performance. Instead of tracking the path within a single application domain, distributed tracing follows a request from start to end across every service it touches.

For example, a customer makes a request on our website and then we update the item suggestion list. As the request spans across multiple resources, distributed tracing takes into account the services, APIs, and resources it interacts with.

Applications become more and more distributed

Automated microservices instrumentation

Exploring distributed traces might sound simple, but collecting the right traces with the right context requires considerable time and effort. Let’s follow an example in which an e-commerce website updates our database with purchases.

Even in this example, which is not distributed, a useful trace needs to collect the following information:

HTTP request details:
1. URL
2. Headers
3. The ID of the user
4. Status code
Spring Web:
1. Matched route and function
2. Request params
3. Process duration
RDS database:
1. Table name
2. Operation (SELECT, INSERT, …)
3. Duration
4. Result
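
The fields listed above could be gathered into one structured record per request. The following is a minimal sketch using only the Python standard library; the class names, field names, and sample values are hypothetical and chosen only to mirror the list, not taken from any tracing library.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class HttpDetails:
    url: str
    headers: Dict[str, str]
    user_id: str
    status_code: int


@dataclass
class RouteDetails:
    matched_route: str
    handler: str
    request_params: Dict[str, str]
    duration_ms: float


@dataclass
class DbOperation:
    table: str
    operation: str  # e.g. "SELECT" or "INSERT"
    duration_ms: float
    result: str


@dataclass
class PurchaseTrace:
    """One request, described by the three groups of fields in the list."""
    http: HttpDetails
    route: RouteDetails
    db_operations: List[DbOperation] = field(default_factory=list)

    def total_db_ms(self) -> float:
        # Time spent in the database across all recorded operations.
        return sum(op.duration_ms for op in self.db_operations)


# Sample values are made up for illustration.
example_trace = PurchaseTrace(
    http=HttpDetails("/purchase", {"Content-Type": "application/json"}, "user-42", 200),
    route=RouteDetails("/purchase", "create_purchase", {"item": "book"}, 35.0),
    db_operations=[DbOperation("purchases", "INSERT", 12.5, "1 row")],
)
```

Calling `asdict(example_trace)` turns the record into nested dictionaries, which is one way such a record could be serialized and later filtered by field.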

To capture this information, we can either record it manually before and after every operation in our code, or instrument the common libraries we use so that it is recorded automatically.

By “automated instrumentation,” we mean “hooking” into a module. For example, every time we make a GET request with “Apache HttpClient,” there will be a listener. It will extract and store this information as part of the “trace.”
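
To make the idea of hooking concrete, here is a minimal sketch in Python using only the standard library. It wraps a method of a library class so that every call is recorded without changing the calling code. `FakeHttpClient`, `instrument`, and `collected_events` are hypothetical names for illustration; the fake client makes no network calls and is not Apache HttpClient.

```python
import functools
import time
from typing import Any, Dict, List

# Hypothetical in-memory sink for captured events.
collected_events: List[Dict[str, Any]] = []


class FakeHttpClient:
    """Hypothetical stand-in for a real HTTP library; makes no network calls."""

    def get(self, url: str) -> int:
        return 200  # pretend every request succeeds


def instrument(cls: type, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, url, *args, **kwargs):
        start = time.perf_counter()
        event = {"operation": f"{cls.__name__}.{method_name}", "url": url}
        try:
            result = original(self, url, *args, **kwargs)
            event["result"] = result
            return result
        except Exception as exc:
            event["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call is recorded.
            event["duration_ms"] = (time.perf_counter() - start) * 1000
            collected_events.append(event)

    setattr(cls, method_name, wrapper)


# The application code below is unchanged; only the library class is patched.
instrument(FakeHttpClient, "get")
status = FakeHttpClient().get("https://example.com/items")
```

The trade-off of this approach is that the wrapper has to track the wrapped library's method signatures, which is part of the maintenance cost of automated instrumentation.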

Collecting this information manually using logging is not recommended, since log lines are not well structured. Using a more standard approach, like OpenTracing, will allow us to filter for the relevant traces. We will also have the option to present them nicely in many tools. For example, in Python it might look like this:

Capturing HTTP request in Python with OpenTracing
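
The original code figure is not available here, so the following is a stand-in sketch rather than the author's code. It uses only the Python standard library and imitates the span pattern that OpenTracing follows: a named, timed operation carrying tags. The `Span` class, `traced_get`, `fake_send`, and `finished_spans` are hypothetical; the tag names `http.method`, `http.url`, and `http.status_code` follow the OpenTracing semantic conventions. With the real `opentracing` package, the comparable pattern would be a `with tracer.start_active_span(...)` block that sets tags on `scope.span`.

```python
import time
from typing import Any, Callable, Dict, List

finished_spans: List["Span"] = []  # hypothetical in-memory reporter


class Span:
    """Tiny stand-in for a tracing span: a named, timed operation with tags."""

    def __init__(self, operation_name: str) -> None:
        self.operation_name = operation_name
        self.tags: Dict[str, Any] = {}
        self.start = 0.0
        self.duration = 0.0

    def set_tag(self, key: str, value: Any) -> "Span":
        self.tags[key] = value
        return self

    def __enter__(self) -> "Span":
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.duration = time.perf_counter() - self.start
        if exc_type is not None:
            self.set_tag("error", True)
        finished_spans.append(self)
        return False  # never swallow the exception


def traced_get(url: str, send: Callable[[str, str], int]) -> int:
    """Call send("GET", url) and record the request as a span."""
    with Span("http.request") as span:
        span.set_tag("http.method", "GET")
        span.set_tag("http.url", url)
        status = send("GET", url)
        span.set_tag("http.status_code", status)
        return status


def fake_send(method: str, url: str) -> int:
    """Hypothetical transport that always answers 200 without any network I/O."""
    return 200


status = traced_get("https://example.com/orders/1", fake_send)
```

Every call site that should be traced needs this kind of wrapping, which is why manual instrumentation grows quickly with the size of the codebase.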

As you can see, this kind of instrumentation requires heavy lifting. It involves integrating with our libraries, as well as constant maintenance to support our dynamic environments.

Standards and tools

OpenTracing

Luckily for us, there are already standards and tools for microservices that can help us get started with our first distributed traces. One of the pioneers was OpenTracing, an open distributed tracing standard for applications and OSS packages.

Using OpenTracing, developers can break a trace into spans and attach extra context (data) to each one of them.

Spans can be related to one another as `child of` or `follows from`. These relations help us get a better understanding of performance implications.
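
The two relations can be sketched with a small data structure. This is an illustrative model in standard-library Python, not the OpenTracing API: `Span`, `start_child`, `start_follower`, and `related` are hypothetical names. In the OpenTracing specification, a `child of` reference means the parent depends on the child's result, while `follows from` means the earlier span does not wait for the later one.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

_next_id = itertools.count(1)


@dataclass
class Span:
    name: str
    # (relation, span_id) pairs; relation is "child_of" or "follows_from".
    references: List[Tuple[str, int]] = field(default_factory=list)
    span_id: int = field(default_factory=lambda: next(_next_id))


def start_child(parent: Span, name: str) -> Span:
    """The parent waits for this span, e.g. a synchronous call."""
    return Span(name, references=[("child_of", parent.span_id)])


def start_follower(predecessor: Span, name: str) -> Span:
    """The predecessor does not wait for this span, e.g. a queued job."""
    return Span(name, references=[("follows_from", predecessor.span_id)])


def related(spans: List[Span], target: Span, relation: str) -> List[str]:
    """Names of spans that reference `target` with the given relation."""
    return [s.name for s in spans if (relation, target.span_id) in s.references]


checkout = Span("POST /checkout")
charge = start_child(checkout, "charge card")              # blocks the checkout
receipt = start_follower(checkout, "send receipt email")   # happens afterwards
all_spans = [checkout, charge, receipt]
```

In this made-up checkout flow, a slow `charge card` span delays the customer directly, while a slow `send receipt email` span does not, which is the kind of performance implication the relations expose.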

To trace a request across distributed microservices, we must implement the inject/extract mechanism: the sending service injects a unique “transaction ID” into the outgoing request, and the receiving service extracts it and attaches it to its own spans. Note that a request can travel between microservices in HTTP requests, message queues, notifications, sockets, and more.
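
A minimal sketch of inject/extract over HTTP headers, using only the Python standard library. The header name `x-trace-id` and the function names are assumptions made for illustration; real tracers define their own header formats, and the `opentracing` package exposes the same idea through `tracer.inject` and `tracer.extract` on a carrier such as a headers dictionary.

```python
import uuid
from typing import Dict, Optional, Tuple

TRACE_HEADER = "x-trace-id"  # hypothetical header name


def inject(trace_id: str, headers: Dict[str, str]) -> None:
    """Sender side: write the trace ID into the outgoing headers."""
    headers[TRACE_HEADER] = trace_id


def extract(headers: Dict[str, str]) -> Optional[str]:
    """Receiver side: read the trace ID, ignoring header-name case."""
    for key, value in headers.items():
        if key.lower() == TRACE_HEADER:
            return value
    return None


def service_a_build_request() -> Tuple[str, Dict[str, str]]:
    """Start a new trace and prepare headers for a call to service B."""
    trace_id = uuid.uuid4().hex
    headers = {"Content-Type": "application/json"}
    inject(trace_id, headers)
    return trace_id, headers


def service_b_handle(headers: Dict[str, str]) -> str:
    """Continue the caller's trace, or start a new one if none was sent."""
    return extract(headers) or uuid.uuid4().hex


sent_id, outgoing_headers = service_a_build_request()
received_id = service_b_handle(outgoing_headers)
```

For message queues or notifications, the same two functions would operate on message attributes instead of HTTP headers; only the carrier changes.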

Another common standard is OpenCensus, which collects application metrics and distributed traces. OpenCensus and OpenTracing recently merged into a unified standard called OpenTelemetry.

Jaeger

After the exhausting task of collecting distributed traces comes the part of visualizing them. One of the most popular open source tools is Jaeger, which is also compatible with the OpenTracing format. Jaeger displays our traces in a timeline view, which helps us understand the flow of the request. It can also assist in detecting performance bottlenecks.
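
To illustrate what a timeline view conveys, the sketch below renders a few spans as text bars and picks the slowest one. It is not how Jaeger works internally; `SpanRecord`, `render_timeline`, `slowest`, and the sample timings are all hypothetical, and only the Python standard library is used.

```python
from typing import List, NamedTuple


class SpanRecord(NamedTuple):
    name: str
    start_ms: int      # offset from the start of the request
    duration_ms: int


def render_timeline(spans: List[SpanRecord], ms_per_char: int = 10) -> List[str]:
    """Return one text line per span, ordered by start time."""
    width = max(len(s.name) for s in spans)
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = " " * (s.start_ms // ms_per_char)
        bar = "#" * max(1, s.duration_ms // ms_per_char)
        lines.append(f"{s.name.ljust(width)} |{offset}{bar} {s.duration_ms}ms")
    return lines


def slowest(spans: List[SpanRecord]) -> SpanRecord:
    """The span with the longest duration, a first bottleneck candidate."""
    return max(spans, key=lambda s: s.duration_ms)


# Made-up timings for a single purchase request.
spans = [
    SpanRecord("GET /purchase", 0, 100),
    SpanRecord("validate cart", 10, 20),
    SpanRecord("INSERT purchases", 30, 60),
]
timeline = render_timeline(spans)
# Skip the root span, which always covers the whole request.
bottleneck = slowest(spans[1:])
```

Printing the lines of `timeline` shows each operation as a bar positioned by its start time, so the operation that dominates the request stands out at a glance.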

Managed solution

Ultimately, you might want to consider an automated distributed tracing solution. Epsagon, for example, uses automated instrumentation to provide performance monitoring for microservices and to visualize requests and errors with less effort.

A managed solution for distributed tracing provides the following benefits:

Traces are collected automatically, without code changes.
Traces and service maps are visualized together with metrics and data.
Data and logs can be queried across all traces.

Summary

Distributed tracing is crucial for understanding complex microservices applications. Without it, teams can be blind to what is happening in their production environment when a performance issue or other error occurs.

Although there are standards for implementing, collecting, and presenting distributed traces, it is not that simple to do manually. It involves a lot of effort to get up and running. Leveraging automated tools or managed solutions can cut down the level of effort and maintenance, bringing much more value to your business.

To dive deeper into microservices, check out Cloud Academy’s Microservices Applications Learning Paths, Courses, and Hands-on Labs.
