Circuit Breakers

Do you need to identify the right frameworks and tools to build your own microservices architecture? If so, this course is for you!

In this course we'll teach you how to combine different frameworks and tools into a microservices architecture that fits your organizational needs.

You’ve no doubt heard about the microservices architecture, but understanding and executing it can be a bit of a challenge. Through a series of videos, this course will introduce microservices, review multiple microservices frameworks and runtimes, and show you techniques to deploy them through a hassle-free DevOps pipeline. We’ll discuss containers, Docker, Spring Boot, NodeJS, .NET, OpenShift, Jenkins, Vert.x, Kubernetes, and much more.


Right, up next are circuit breakers, where we’re going to talk about resilience in your microservices architecture. Circuit breakers are super critical when it comes to ensuring that a cascading failure does not ripple through the entire architecture. Let’s get into this. So, there’s a book called Release It! and it’s written by Michael Nygard. It’s a brilliant book, and I recommend you go and read it.

At least read the parts of it where he tells the stories of how an entire airline was brought down. In this case a misconfigured JDBC driver, without any kind of resiliency built into the system, actually stranded babies and moms, and it ended up on ABC and NBC news. So, it’s a really great story. You ought to read it to see how badly programming can go. But in this book, he documents the concept of a circuit breaker: how do you deal with a cascading failure, and how do you intercept it so that it does not cascade all the way to the point where it actually disturbs end users? Stranding babies and their moms in airports is a good example.

Now, Netflix OSS gave us a specific circuit breaker called Hystrix. Hystrix, the little porcupine right there, is a very cool circuit breaker and bulkhead implementation. We’re going to show you what that means in a specific demonstration. But if you think of it from a household standpoint, it’s very straightforward. Look at the circuit breaker board over in your garage or attic or wherever it might be. Some of you might still be on fuses, but many people are now on circuit breakers.

The whole idea is that when you overdraw the circuit, the circuit heats up and the switch flips off, all right? It basically goes open. It opens the circuit so electricity doesn’t flow anymore, and that’s essentially what you see here. Open the circuit, and you don’t have to worry about overheating the architecture. This happens in our daily life where, let’s say, someone plugs a portable heater into the bathroom, and then you kick on the hairdryer, and by the way their portable radio is also plugged in, and next thing you know everything goes dark in the house.

That’s an example of a circuit breaker ensuring that your house does not burn down. The same principle applies here: we’re ensuring that your microservices architecture does not burn down. Now, this is the Hystrix dashboard. We’ll see it in action when we get into it, but it’s super comprehensive, and what you get out of the Hystrix dashboard is neat. You can see how your different endpoints are performing and whether they’re failing or not.

Here is a good example where you can see the detail, but we’ll drill down into this in the demonstration. Okay, so let’s show you this. A couple of things we’ll show you real quick. As I mentioned earlier, this is the Don’t forget about it. But another demo I’m about to show you now is called the goodbye demo. It’s in the same organization where we saw the other demos. So, I’m showing you goodbye specifically here. What the goodbye demo does differently from hello is that it shows you how to break things.

That’s really the point of it. It specifically shows you how to break things and how to set up the circuit breaker. You can see here my Hystrix setup, specifically for the aloha component, and it shows you how to set up a circuit breaker and how a circuit breaker actually works. So, let’s go into our terminal windows here, and we’re going to look at these two, primarily. Okay? So, what we have, and let me bring up my editor. Yeah, here we go.

What we have is a client. The client in this case is going to invoke a remote endpoint, and we’re going to run this outside of Docker, outside of Kubernetes, outside of OpenShift. We’re going to just run it on bare metal, just so you get a perspective of how it works. We’re going to run a Spring Boot application and bring that up. And what’s going to happen is this client is going to connect to that Spring Boot application.

We’re going to use the ApacheClient here. Let’s look at the ApacheClient. You can see it’s going to connect to localhost:80/api/nap, all right, right there. That’s the one it’s going to connect to. And actually, if I bring up another window here, let’s just go to localhost, and api, and goodbye. So, there are two endpoints on that service, okay? There’s goodbye, and it says goodbye. And you see it actually does the System.out.println there. It shows that in my browser here. And there’s one called nap. And what nap is, actually let’s bring this up and get it right. Spell that correctly.

And let’s actually bring up nap over here. What nap does is calculate pi. All right? Pi as in, you know, the mathematical constant. Let’s actually look at it. So, ServerController. It calculates pi, in this case to the 20,000th decimal place. So, we calculate a lot of pi, and therefore it takes a little time. So, notice what happens if I run the endpoint one more time; come back over here and watch, okay? If I hit that, it takes just a moment, right, just about a second or two. And that’s fine when it’s just a single user operating a single endpoint.

But what happens if you have a bunch of users hit that endpoint? So, if we come over here now, we are going to run my client. Yeah, let’s do this. Let’s see, exec:java. I’m going to run 200 requests. Well, let’s get the client correct. Class not found. mvn clean compile, there we go. Yes, we have to compile the code in order to run the code, so there we go. Now, we’re sending in 200 requests against, in this case, a Spring Boot application. And notice what happens here. It even stops responding. So, literally what’s happening is you have 200 requests going into 100 threads in a thread pool. Those 100 threads are all taking a few seconds each, because it’s calculating pi and doing a lot of aggressive CPU calculations, and it also kind of locks up my browser, as you can see here.

My browser is not even getting its request serviced. So, in this case it’s just 200 requests simultaneously attacking 100 threads in the server-side thread pool. If you look at my Spring Boot application here, you can see where we say 100 threads in a thread pool. We have fixed it so that it doesn’t dynamically allocate threads in some sort of clever way. This is actually what you want to do in a Docker, Kubernetes world. You want to fix the thread pool count, so that it doesn’t try to dynamically allocate resources it doesn’t actually have access to. So, in this case we have the goodbye method, and we have the nap method, okay?
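To make the thread pool arithmetic concrete, here is a minimal Java sketch. It is not the demo's Spring Boot code: it assumes a fixed pool from java.util.concurrent stands in for the server's worker threads, and the hypothetical names FixedPoolDemo, runBatch, and sleepQuietly are made up for illustration, with a sleep taking the place of the pi calculation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FixedPoolDemo {

    // Stand-in for the slow nap endpoint (assumption: a sleep replaces the pi calculation).
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Sends `requests` tasks into a fixed pool of `threads` workers and returns
    // roughly how many milliseconds pass before every request has been served.
    static long runBatch(int threads, int requests, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < requests; i++) {
                pending.add(pool.submit(() -> sleepQuietly(taskMillis)));
            }
            for (Future<?> f : pending) {
                f.get(); // block until this request is done, like a waiting client
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("100 requests, 100 threads: " + runBatch(100, 100, 200) + " ms");
        System.out.println("200 requests, 100 threads: " + runBatch(100, 200, 200) + " ms");
    }
}
```

With as many requests as threads, everything runs in one wave. With twice as many, the second half sits in the queue until a thread frees up, so the batch takes roughly twice as long, and anything else that arrives in the meantime (like a browser request) waits behind it.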

Pretty straightforward, but one takes a little bit of time and does a lot of processing. That’s all it does. And so you can see what happened there. My client hit the server and waited for the server to respond. It all worked correctly, but it took a little time. And just so you don’t think that I’m picking on poor Spring Boot, I’ll bring up my application server here. This is WildFly Swarm, based on the WildFly application server. I can run it also, and it’s going to come up. Let me bring up my browser, while that’s coming up, over here. The same idea applies, except now it’s WildFly Swarm instead of Spring Boot. And again, same concept.

If I hit it with 200 requests and it only has 100 threads in its thread pool, it will run out of resources and you will see it become unresponsive to our end users. So, that concept of the client being able to mount what is essentially a denial-of-service attack on the server is problematic. You don’t want that in many cases, and it’s very problematic in a microservices architecture. You can see there, it’s still trying to finish up and respond. But I want to show you how we mitigate that and deal with that. So, it looks like it’s okay now. It’s all finished now. And just to really make the point, let’s bring up the Vert.x-based system. All right, Vert.x of course is super lightweight, and it starts really fast.

Same idea. The only difference between what we see in Vert.x versus the others is that Vert.x is a non-blocking, async architecture, so it’s always responsive. The end user at least is getting answers back, even though it’s still taking the same amount of time to do the overall job of running those 200 requests. But in either case the first 100 requests are queued. We’re waiting for the next batch to go through, and eventually the client is done. So, one strategy people employ is on the client side. Let me get back over here to my client. Okay?

Their strategy is to say, look, we’re going to do a timeout instead. Okay? Instead of the client blocking and waiting, let’s time it out and see if we can actually get a response back. Let me hop back over to Spring Boot also. So, let me see, we get Spring Boot up. In this case, we’re going to use the TimeoutApacheClient. This is standard stuff, right? This is a standard Apache client, which is what people will use, and here is a timeout version of that.

It’s going to wait only a little bit of time. You can see 1000 milliseconds to see if it can make the connection, and then it’ll fall back, right, it’ll fail. And you can see the fallback right there, but let’s go ahead and give it a run. Let it go there, and watch what happens. So, there it’s making the requests, but we’re also getting some fallbacks. For the requests that could not be queued, right, it’s just falling back. So, the good news is the client is not blocked. The bad news is the server is still doing all that work that nobody is waiting for any longer.
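The problem with a timeout on its own can be sketched in a few lines of plain Java. This is an illustration only, not the TimeoutApacheClient from the demo: TimeoutDemo, slowEndpoint, and callWithTimeout are hypothetical names, an in-process thread pool plays the role of the server, and a sleep stands in for the pi calculation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class TimeoutDemo {

    // Counts requests the "server" finished, whether or not anyone was still waiting.
    static final AtomicInteger completedOnServer = new AtomicInteger();

    // Hypothetical slow endpoint: sleeps instead of calculating pi.
    static String slowEndpoint(long workMillis) {
        try {
            Thread.sleep(workMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        completedOnServer.incrementAndGet();
        return "nap done";
    }

    // Client call that stops waiting after timeoutMillis and uses a fallback instead.
    static String callWithTimeout(ExecutorService server, long workMillis, long timeoutMillis) {
        Future<String> response = server.submit(() -> slowEndpoint(workMillis));
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "fallback"; // the client moves on, but the request keeps running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "fallback";
        } catch (ExecutionException e) {
            return "fallback";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService server = Executors.newFixedThreadPool(2);
        System.out.println(callWithTimeout(server, 500, 100));
        server.shutdown();
        server.awaitTermination(2, TimeUnit.SECONDS);
        System.out.println("requests completed on server: " + completedOnServer.get());
    }
}
```

The client gets its fallback after 100 milliseconds, yet the counter still goes up once the server side finishes: the work was done for a caller that had already walked away.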

So, the server actually has the request. It’s trying to process the request, but there’s no one even listening on the other side. This is where it’s problematic to think that timeouts are good enough for you. What you really want is the circuit breaker concept, in this case combined with the bulkhead, the bulkhead being very critical. What’s going to happen here is just like bulkheads in a ship. If you think of a big old oceangoing vessel, the whole concept is that you have compartmentalized parts of the ship, so that if one compartment floods you can shut that door. You’ve seen this in old movies, right, where the guy runs in there and turns the big wheel and shuts a big iron door so that the rest of the ship doesn’t flood.

The idea of a bulkhead is to ensure that the flooding stays in one compartment instead of spreading from one compartment to the next to the next and taking the whole ship down. This didn’t work in the case of the Titanic, but that’s a different conversation. So, here let’s show you what Hystrix does. Okay? Let’s run this now with the HystrixClient. I have just restarted my Spring Boot application to make sure we’re clean, and let’s run it now. Watch what happens here. It’s going to immediately throw in 10 requests, get back 10 responses, and all the others will fail. So, in this case what Hystrix has done is say, look, we’ve already sent in the original 10, 10 is the default, and we’re not getting answers back just yet.

We’re going to wait, and before we flood the server with even more requests, we’re just going to fail those other requests that the server couldn’t even respond to. So, Hystrix is monitoring the transactions between the client and the server and ensuring that you don’t overwhelm the server. The bulkhead in this case is the 10 threads, and it’s also doing the circuit breaker concept. Now, this is also incredibly powerful if the server runs slow. Hystrix deals with the fact that a server can’t respond, and it also deals with the fact that a server may respond slowly, which is the even more insidious problem that you might have in your microservices architecture. So, imagine applying this to several chained services throughout the architecture.
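The bulkhead idea itself is small enough to sketch in plain Java. This is not Hystrix's implementation: Hystrix by default isolates calls on a thread pool, while this sketch assumes a java.util.concurrent.Semaphore for brevity. The names Bulkhead, execute, slowCall, and simulate are hypothetical, and the limit of 10 simply mirrors the default mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class Bulkhead {
    private final Semaphore slots;

    Bulkhead(int maxConcurrent) {
        this.slots = new Semaphore(maxConcurrent);
    }

    // Runs the call if a slot is free; otherwise fails fast with the fallback
    // instead of piling more work onto a server that is already busy.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!slots.tryAcquire()) {
            return fallback.get();
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get();
        } finally {
            slots.release();
        }
    }

    // Hypothetical slow remote call: sleeps instead of calculating pi.
    static String slowCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ok";
    }

    // Fires `clients` concurrent calls through one bulkhead and returns
    // { calls that reached the remote, calls that got the fallback }.
    static int[] simulate(int clients, int maxConcurrent, long workMillis) {
        Bulkhead bulkhead = new Bulkhead(maxConcurrent);
        AtomicInteger served = new AtomicInteger();
        AtomicInteger failedFast = new AtomicInteger();
        ExecutorService callers = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            callers.submit(() -> {
                String result = bulkhead.execute(() -> slowCall(workMillis), () -> "fallback");
                if ("fallback".equals(result)) {
                    failedFast.incrementAndGet();
                } else {
                    served.incrementAndGet();
                }
            });
        }
        callers.shutdown();
        try {
            callers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { served.get(), failedFast.get() };
    }

    public static void main(String[] args) {
        int[] result = simulate(200, 10, 500);
        System.out.println("served: " + result[0] + ", failed fast: " + result[1]);
    }
}
```

The design choice worth noticing is tryAcquire rather than acquire: a caller that cannot get a slot is turned away immediately, so the server never sees more than the configured number of requests in flight from this client.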

To make that point, let’s bring up the microservices front-end we had earlier. Here we go. And I’m going to bring up this guy over here. This is the Hystrix monitor. We can see it running here. All right, I’m refreshing to make sure it’s all connected nicely, and let’s sort it alphabetically. What I want you to do is look at this service chain. So, here we have Bonjour, right? And so, Hola calls Ola calls Aloha calls Bonjour. If Bonjour blows up, if Bonjour is dead, if Bonjour is too slow, I don’t want that failure to cascade up the chain.

I want to stop the communication and stop the problem right here, just like you saw earlier in that simpler example. If I come over here and hit refresh, you’re going to see we get some good requests. See that green 2 right there? I’ll zoom in on it. I just hit it twice, so we got two good requests. That’s what it’s saying. Let me hit it one, two, three, and we should get a three there. Three good requests is a good example. So, Hystrix is monitoring the transactions through that connection, through that channel, ensuring that it’s all good.

In this case, three good requests, we’re good. What if it sees things going badly? We can simulate that by simply going back to our OpenShift console here. You can see we had the three load balanced earlier. I’m going to just drop this down to zero. We’re going to kill Bonjour. We’re wiping it out, taking it off the system. It’s just my Docker container running my Bonjour. And I’m going to now come back over here to my front-end. Watch what happens when I click refresh. This is kind of subtle. You have to watch it closely.

Notice this little orange right there, and a 100% error rate. Right, only one transaction went through the system, so it’s a 100% error rate, and we had one timed-out request. Notice also, though, that we did get the fallback mechanism. So, the good news is we didn’t blow up; we didn’t have a cascading failure blow all these guys up. We stopped the transaction at the point where the problem was. We gave a default answer to our user, kind of like we showed you earlier with the concept that we can’t tell you how much inventory we have in the store, but we can at least tell you where your store is. The same concept applies here. In this case we programmatically said, here is our fallback.

It’s just a piece of text we respond with when we don’t get the original response. Now watch closely what happens here. Notice the circuit is still closed, however. From Hystrix’s standpoint, monitoring the environment, it thinks things are okay and it’s hoping they recover. But what if things don’t recover? We have built into this a default implementation that says if you get five bad transactions, assume things are really bad. So, watch what happens here. Watch this circuit, which is closed. We hit refresh, refresh, four, five, right? And watch what happens. Okay? I think I hit it enough.

The circuit is now open. What this means is, I’m going to bounce off quickly now because Hystrix is not even letting transactions go through any longer. And this is super important in a distributed architecture, in a microservices architecture, where if something is down, think about this from a user standpoint. Think about this when you’re actually interacting with someone else’s website, or web application. What do you do when the site is slow, or what do you do when you think the site is down? You hit refresh, refresh, refresh. And what are you doing when you do that? 

You think you’re helping the poor program on the other side, right? No. You’re kicking the dead guy while he’s down. If he’s already dead, let him be dead and don’t kick him while he’s down. Let him get back up, get him back to where he’s healthy and strong, and then go kick him around again. So, in this case the circuit being open says stop hitting the dead guy, and I’m bouncing off now. I’m not hitting the dead guy any longer. I’m bouncing off. That’s what circuit equals open means.

Now, Hystrix is sending one transaction through about every fifth time, say. You’ll notice this is a little sticky. Hey, are you alive? No. He’s still dead. Bounce off, bounce off, bounce off. Hey, are you alive? Nope. Still dead. Bounce off. That’s the concept of what’s happening in the circuit breaker. And now I’ll come over here and bring my Bonjour back to life on my web console here.

All right, bring that guy back up and I’ll just run it up to two, just to kind of make a point. It doesn’t really matter as long as we have one of them alive and well. Okay, it’s not quite ready yet. You can see it in its light blue state. Okay, the first pod is ready, the first container is ready. Now let me come over here. Notice the circuit says open, it says fallback, now watch what happens. Closed. We’re healed, right? We now are getting valid responses from our endpoint and the circuit is closed. Hystrix said we’re all healed. We’re good again. 
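The closed, open, and healed cycle we just watched can be written down as a small state machine. The following is a minimal Java sketch and not Hystrix's implementation: Hystrix decides based on statistics over a rolling window, which is not modeled here. SimpleCircuitBreaker, failureThreshold, and retryAfterMillis are hypothetical names, the threshold of five consecutive failures mirrors the demo, and the whole call is synchronized to keep the sketch short, which serializes the remote calls.

```java
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures before the circuit opens
    private final long retryAfterMillis; // how long to bounce calls before probing again
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    synchronized State state() {
        return state;
    }

    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryAfterMillis) {
                return fallback.get(); // bounce off without touching the remote
            }
            state = State.HALF_OPEN; // "hey, are you alive?": let one probe through
        }
        try {
            T result = remote.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // healed: transactions are free to go through
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // still dead, or too many failures in a row
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(5, 1000);
        Supplier<String> deadBonjour = () -> {
            throw new IllegalStateException("bonjour is down");
        };
        for (int i = 1; i <= 6; i++) {
            String reply = breaker.call(deadBonjour, () -> "fallback");
            System.out.println(i + ": " + reply + " (circuit " + breaker.state() + ")");
        }
    }
}
```

In main, every call gets the fallback, but only the first five actually reach the dead service. The sixth is bounced off by the open circuit, which is the "stop hitting the dead guy" behavior.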

So, the dead guy is no longer dead. Transactions are free to go through. So, that is the concept of the circuit breaker and why it’s so important in a distributed architecture. But stay tuned, we’ve got more things to show you.

About the Author

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).