What's High Availability? | NEL4 A3.1 |



A core engineering concept, regardless of whether you’re looking at network engineering, software engineering, or even mechanical engineering, is that of High Availability. So, what exactly is High Availability? This video will walk you through the basic concept of High Availability. 


- A really cool engineering concept, regardless of whether you're looking at network engineering, software engineering or even mechanical engineering, is that of high availability. So what exactly is high availability? Well, availability is a term used to describe, essentially, whether a system or service remains available or not. High availability, or HA, describes a system that should be more available than a regular system, or always on.
If you were to think of this on a large scale, you could take a hospital and a high-street shop, for instance. A shop can be available, so it can open at eight in the morning and close at nine in the evening, giving everyone who wants to access its services plenty of time to do so. But it doesn't need to be open all the time, as most shops, like a clothes shop, aren't critical. A hospital, on the other hand, needs high availability, because accidents and healthcare emergencies can happen at any time, whether in the middle of the day or the middle of the night.
So how do you define if something should have high availability? Well, that's really down to the organization to decide what its most critical parts are. This might have been decided by a system architect or by a group within the organization. But you should always know which systems are supposed to have high availability, as they're likely to be business or organization-critical.
Finally, how is something made to have high availability? Well, there are three principles that can help. One, it should eliminate single points of failure. This means building the system so that if one part of it, like a component, fails, the rest of the system doesn't fail with it. Two, it should have reliable crossover. This means that if a component fails, the system can switch over to a second one without any interruption.
Think of having two network interface cards, or NICs. This concept is called NIC teaming: if one fails, the other one is available to cover it. Load balancing works in a similar way. If something is overloaded, the load balancer helps spread the work. And three, finally, failures should be detected as they occur. If any issues happen, they must be detectable as they occur, whether through maintenance or another form of notification system. However, the user shouldn't notice them. This is one of the core purposes of RAID drives. For example, in a RAID setup the information is spread across multiple drives instead of just one, so if there's an issue with one of the drives, the setup can detect it. How many drives can fail depends on the RAID's fault tolerance. This gives the array some room for when one of the drives inevitably fails.
So all of this together should allow you to identify which systems in your organization are designed with high availability in mind. If you were designing a highly available system, what would you have it do? And that's it for this video.
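The three principles from the video can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not something shown in the video: `Component`, `RedundantGroup`, and the names `nic0` and `nic1` are hypothetical, and a real NIC team or load balancer is set up in the operating system or on network hardware rather than written like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A hypothetical redundant part, for example one NIC in a team."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


@dataclass
class RedundantGroup:
    """Several components doing the same job, so no single one is a
    single point of failure (principle one)."""
    components: List[Component]
    detected_failures: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        for component in self.components:
            try:
                return component.handle(request)
            except ConnectionError:
                # Principle three: record the failure as it occurs.
                if component.name not in self.detected_failures:
                    self.detected_failures.append(component.name)
                # Principle two: cross over to the next component.
        raise RuntimeError("all components failed")


team = RedundantGroup([Component("nic0"), Component("nic1")])
team.components[0].healthy = False   # simulate one card failing
print(team.handle("packet-1"))       # the caller still gets a result
print(team.detected_failures)        # the failure was still noticed
```

The caller only ever talks to the group, which is why the user does not notice the failure, while `detected_failures` gives whoever maintains the system something to act on.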
