Logging, Monitoring and Tracing
Do you have a requirement to identify the right frameworks and tools to build your own Microservices Architecture? If so, this course is for you!
In this course we'll teach you how to combine different frameworks and tools into a microservices architecture that fits your organizational needs.
You’ve no doubt heard about the microservices architecture, but understanding and executing it can be a bit of a challenge. Through a series of videos, this course will introduce microservices, review multiple microservices frameworks and runtimes, and show you techniques to deploy them through a hassle-free DevOps pipeline. We’ll discuss containers, Docker, Spring Boot, NodeJS, .NET, OpenShift, Jenkins, Vert.x, Kubernetes, and much more.
Okay. We are going to put a bunch of things together right now. We are going to quickly walk you through logging, monitoring, and tracing, to give you a sense of what it means to understand your working environment, your production environment, and to get a feel for how you make things production ready. We are not going to spend a lot of time on this, but you will have the opportunity to dive in on your own.
Again, the whole working environment that we have for you will walk you through all these things. So, picture all those different distributed microservices, let's say 10, 20, or 50 of them, running against a clustered environment. They are spread across all these nodes, and on your own you have no idea whether they are even running.
Having a form of aggregated logging is super important. One nice thing about Kubernetes in general, and OpenShift certainly takes advantage of it, is that it has the ability to grab anything written to standard out. So, if you just log to standard out, System.out.println for you Java guys out there, it will grab that content and aggregate it for you, using something called EFK. I know it sounds a bit odd, and people ask what EFK is, as opposed to ELK. Instead of using Logstash, the L in the middle of ELK, we are using Fluentd, the F. So, EFK is Elasticsearch, Fluentd, and Kibana, and that is how we gather the different logs coming from whatever process you want to log from, in this case a Java application, and it will pull them in.
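As a rough illustration of the "just log to standard out" idea, here is a minimal Python sketch of a service that writes one JSON object per log line to stdout, which is the stream a collector such as Fluentd would pick up from the container. The JSON layout, the field names, and the helper names `JsonFormatter` and `build_logger` are assumptions made for this example; the talk only says that whatever goes to standard out is captured. The service name "ola" is borrowed from the demo.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (assumed layout)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(service_name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to standard out only."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # attach the stdout handler only once
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("ola")
    log.info("handled request in %d ms", 12)
```

Writing to stdout rather than to a file keeps the service unaware of where the logs end up; the platform decides how to collect and index them.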
And then Kibana is a way to interact with that data, as a good example. So, on the logging side, I don't have it installed on the second environment, but you can certainly explore that on your own. Monitoring, by default, is based on Heapster, Hawkular, and Cassandra. Cassandra is the datastore for that. I could provide you the URL so you can try that yourself. You can also run it locally with minishift when you do your minishift start. If you go back to my Getting Started material, you will see where we did minishift start.
In this case you can do minishift start --metrics, which kicks on the Heapster and Hawkular capabilities, so you can see the monitoring data being gathered. You also get some nice graphs right there on your main overview page, where you can see how much CPU, memory, and things like that you are using. Again, relatively simple, so we are not going to focus on that too much.
Now, tracing is where this gets even more interesting, and here is why tracing is super important. If you have a specific user transaction flowing through 10, 20, or 50 different microservices, how would you know which microservice is the bad guy when things fail? You have the circuit breaker to protect you from cascading failure, but how do you know which one actually failed, and figure out what to do about it? That is why tracing is super important.
So, this is actually a great quote, specifically from the proposal for OpenTracing at the CNCF. Developers with experience building microservices at scale understand the role and importance of distributed tracing: per-process logging and metric monitoring, what we talked about earlier, have their place, and they are very important, right? But they cannot reconstruct the elaborate journey a specific transaction may take through a series of microservices, through a distributed system. The distributed trace is that journey. So, I love this particular quote.
It was part of the proposal that brought OpenTracing to the CNCF. So, the one that is very notable out there is called Zipkin. Zipkin was the first one into the space, if you will. It gave us the ability to see spans across the different microservice applications, and then gave us timing across all of those. So, Zipkin was the early mover in the space. There is also another one that is happening right now.
It comes from the Uber team and is called Jaeger. It provides the same kind of capabilities, it is more OpenTracing-native, if you will, and it is what we are going to show you in a second. It is super simple. You are going to hear about Zipkin a lot in this space, and you are going to hear a lot about opentracing.io and Jaeger. All of them. If you look at opentracing.io, it lists a lot of different supported tracers that are available to you, so keep that in mind and monitor this space. Opentracing.io is where things are coming together, and you can expect more and more as it moves forward. In this specific case, though, I just want to show you a demo of Jaeger, because I think it is pretty cool and we have it all integrated.
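To make the idea of spans and timing a little more concrete, here is a toy Python sketch of what a tracer records: every span carries the trace ID of the overall transaction, its own ID, the ID of its parent, and a start and end time. This is not the Zipkin, Jaeger, or OpenTracing API. The `Span` and `Tracer` classes and all of their fields are hypothetical names invented for this illustration.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Span:
    """One timed unit of work inside a trace (hypothetical toy model)."""
    trace_id: str              # shared by every span of one transaction
    span_id: str               # unique to this span
    parent_id: Optional[str]   # None for the root span
    operation: str
    start: float
    end: Optional[float] = None

    @property
    def duration_ms(self) -> float:
        if self.end is None:
            raise ValueError("span has not been finished yet")
        return (self.end - self.start) * 1000.0


class Tracer:
    """Collects finished spans in memory; a real tracer would report them."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.finished: List[Span] = []

    def start_span(self, operation: str, parent: Optional[Span] = None) -> Span:
        # A child span reuses the parent's trace ID; a root span starts a new trace.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        parent_id = parent.span_id if parent else None
        return Span(trace_id, uuid.uuid4().hex[:16], parent_id, operation, self._clock())

    def finish(self, span: Span) -> None:
        span.end = self._clock()
        self.finished.append(span)


if __name__ == "__main__":
    tracer = Tracer()
    root = tracer.start_span("ola")
    child = tracer.start_span("hola", parent=root)
    tracer.finish(child)
    tracer.finish(root)
    for span in tracer.finished:
        print(span.operation, span.trace_id, span.parent_id, span.duration_ms)
```

In a real deployment the trace ID and parent span ID travel between services, typically in request headers, so that spans recorded by different processes can be stitched back into one trace.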
Again, if you go to our main demo, we talk about how to deploy Jaeger and how to get that set up. If I go into my web console here, let's look at our helloworld-msa. We have applied tracing to these different microservices, and you will see that Jaeger is also running here. So, we have the Jaeger component running, and then we have our front-end. Let's go look at this real quick. So, here is a good example, here is the service chaining, or maybe let's just look at the API Gateway, all right.
So, the API Gateway is responsible for invoking all these other components, as an example, and you want to know which of them is maybe failing, or which of them might be performing poorly. I have the Jaeger dashboard set up in a separate tab here, okay, and I can look for a specific service. Over here we are looking at ola. It is grabbing these transactions as they go through, sampling them, collecting them, and then I can drill down on them. And I can see here that ola, actually, here is a good example, all right.
Ola calls Hola, which calls Aloha, which calls Bonjour. You can see that chain of events there, and you can see who is taking the longest, right? Bonjour took one millisecond, and aloha next took two-and-a-half milliseconds, but then we can see it is actually getting longer here, right. Now it is 14 seconds, or 21 seconds, so you can see how long everything takes and then drill down on that a little bit.
You can go in there and work with that piece of code to understand why it might be taking so long, or it might be that this is the performance you need. But this goes beyond monitoring and beyond logging, which are fairly basic and very straightforward when it comes to an OpenShift backbone. Tracing gives you an actual understanding of how different transactions flow throughout your environment.
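The drill-down on the Ola, Hola, Aloha, Bonjour chain can be sketched in a few lines of Python. The idea is that when each service calls the next one synchronously, a span's total time includes the time spent waiting on its child, so the interesting number is the "self time": the span's duration minus the durations of its direct children. The `SpanRecord` type and both functions are hypothetical helpers, and the sketch assumes one span per service. The Bonjour and Aloha durations echo the demo, while the Hola and Ola values are made-up numbers for illustration only.

```python
from typing import Dict, List, NamedTuple, Optional


class SpanRecord(NamedTuple):
    """A flattened view of one span in a trace (hypothetical structure)."""
    service: str
    parent: Optional[str]   # name of the calling service, None for the root
    duration_ms: float      # total time, including time spent in callees


def self_times(spans: List[SpanRecord]) -> Dict[str, float]:
    """Return each service's own time, excluding its direct children."""
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent is not None:
            child_total[span.parent] = child_total.get(span.parent, 0.0) + span.duration_ms
    return {
        span.service: span.duration_ms - child_total.get(span.service, 0.0)
        for span in spans
    }


def slowest_service(spans: List[SpanRecord]) -> str:
    """Name the service with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


if __name__ == "__main__":
    trace = [
        SpanRecord("ola", None, 21.0),       # assumed value
        SpanRecord("hola", "ola", 14.0),     # assumed value
        SpanRecord("aloha", "hola", 2.5),
        SpanRecord("bonjour", "aloha", 1.0),
    ]
    print(self_times(trace))
    print(slowest_service(trace))
```

With these sample numbers the root span has the longest total duration, but most of the time is actually spent inside hola, which is the kind of detail that per-service metrics alone would not reveal.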
So that is all I have to say about this particular topic. But catch us in the next video, we have got more things to show you.
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, GCP, Azure), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).