Transient Faults: Manage operations that continually fail

Develop your skills for autoscaling on Azure with this course from Cloud Academy. Learn how to improve your team's development skills and understand how they relate to scalable solutions. This course also covers how to analyze and handle transient faults.

This course is made up of 19 lectures that will guide you through the process from beginning to end.

To discover more Azure courses, visit our content training library.

Learning Objectives

  • Learn how to develop applications for autoscale
  • Prepare for the Azure AZ-303 certification
  • Design and implement code that addresses singleton application instances


Intended Audience

This course is recommended for:

  • IT professionals preparing for Azure certification
  • IT professionals who need to develop applications that can autoscale


There are no prerequisites for this training course, although an understanding of Microsoft Azure will prove helpful.



There will invariably be times when an operation continues to fail on every attempt, so it's important to consider how to handle these situations. A retry strategy defines the maximum number of times that an operation should be retried, but it doesn't prevent an application from repeating the same failing operation over and over again. To prevent constant retries of operations that continually fail, you should consider implementing the circuit breaker pattern. With this pattern, if the number of failures within a specified window exceeds a defined threshold, further requests are immediately returned to the caller as errors, and the application stops attempting to access the failed resource or service altogether. The application can then periodically test the service to detect when it is available again.
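The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold`, `window_seconds`, and `probe_interval` are all assumptions made for this example, and the sketch opens the circuit as soon as the failure count reaches the threshold.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the operation."""


class CircuitBreaker:
    """Hypothetical circuit breaker; all names and defaults are illustrative."""

    def __init__(self, failure_threshold=3, window_seconds=60.0,
                 probe_interval=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures needed to open
        self.window_seconds = window_seconds        # how long a failure counts
        self.probe_interval = probe_interval        # wait before testing again
        self.clock = clock                          # injectable for testing
        self.failures = deque()                     # timestamps of recent failures
        self.opened_at = None                       # None means the circuit is closed

    def call(self, operation):
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.probe_interval:
            # Circuit is open: fail fast instead of touching the service.
            raise CircuitOpenError("circuit open; call rejected")
        # Either the circuit is closed, or the probe interval has elapsed
        # and this call acts as the availability test.
        try:
            result = operation()
        except Exception:
            self._record_failure(now)
            raise
        if self.opened_at is not None:
            # The probe succeeded: close the circuit and resume normal operation.
            self.opened_at = None
            self.failures.clear()
        return result

    def _record_failure(self, now):
        if self.opened_at is not None:
            # The probe failed: keep the circuit open and restart the wait.
            self.opened_at = now
            return
        self.failures.append(now)
        # Drop failures that have fallen outside the window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self.opened_at = now
```

A caller would wrap each request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to stop retrying for now. Injecting the clock is only a convenience so the timing logic can be exercised without sleeping.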

The interval at which the application tests for availability depends on the criticality of the operation and the nature of the service itself. Such an interval could be anything between a few minutes and several hours. When the test succeeds, the application can resume normal operations and again begin passing requests to the recovered service. While waiting on a failed service to recover, it might be possible for the app to fail over to another instance of the service or to use a similar service that offers compatible functionality. Alternatively, you may be able to redirect users to an alternate instance of the application if necessary. You might also be able to run the application in a degraded mode that still offers minimal functionality. At worst, you could return a message to the user that indicates the application is currently unavailable.
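The fallback options above form an ordered chain: try the preferred service, then an alternative, and finally return an "unavailable" message. A rough sketch of that chain follows; the function `call_with_fallbacks` and the provider names in the usage example are hypothetical and exist only for illustration.

```python
def call_with_fallbacks(providers, unavailable_message):
    """Try each (name, callable) pair in order and return the first success.

    If every provider fails, return the unavailable message so the user
    still gets a response instead of an unhandled error.
    """
    for name, provider in providers:
        try:
            return name, provider()
        except Exception:
            # This provider is failing; move on to the next option.
            continue
    return "unavailable", unavailable_message


def primary_service():
    # Stand-in for the failed service.
    raise ConnectionError("primary instance is down")


def secondary_service():
    # Stand-in for another instance or a compatible service.
    return {"items": [1, 2, 3]}


source, payload = call_with_fallbacks(
    [("primary", primary_service), ("secondary", secondary_service)],
    "The application is currently unavailable.",
)
```

In a real application each provider would typically sit behind its own circuit breaker, so a failing option is skipped quickly rather than retried on every request.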

About the Author

Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.

In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.

In his spare time, Tom enjoys camping, fishing, and playing poker.