Transient Faults: Logging and tracking transient and non-transient faults

Develop your skills for autoscaling on Azure with this course from Cloud Academy. Learn how to improve your team's development skills and understand how they relate to scalable solutions. You will also learn how to analyze and deal with transient faults.

This course is made up of 19 lectures that will guide you through the process from beginning to end.

To discover more Azure Courses visit our content training library.

Learning Objectives

  • Learn how to develop applications for autoscale
  • Prepare for the Azure AZ-303 certification
  • Design and implement code that addresses singleton application instances


Intended Audience

This course is recommended for:

  • IT Professionals preparing for Azure certification
  • IT Professionals who need to develop applications that can autoscale


There are no prerequisites for this training course, although an understanding of Microsoft Azure will prove helpful.



Any retry strategy should include exception handling and other processes that log retry attempts. Although occasional transient failures and the associated retries should be expected from time to time, a regular or increasing number of retries is often an indication of an issue that may be impacting application performance or availability. When logging transient faults, you should log them as Warning entries instead of Error entries. By logging transient faults as Warnings, you can ensure that monitoring systems do not detect them as application errors that might trigger false error notifications or alerts. It's also best to store a value in the log entries that indicates whether a retry was caused by throttling in the service or by some other type of fault, such as a connection failure. Doing so allows you to differentiate retry causes during analysis of the data. If you notice an increase in the number of throttling errors, it could be, and often is, an indication of a design flaw in the application itself.
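As a rough illustration of the logging advice above, here is a minimal Python sketch that logs each retried fault as a Warning and tags the entry with the cause of the retry. The exception classes, the `retry_cause` field, and the `call_with_retries` helper are all hypothetical names chosen for this example; they are not part of any Azure SDK, and a real application would map its own client library's exceptions onto these categories.

```python
import logging
import time

logger = logging.getLogger("retry_demo")


class TransientError(Exception):
    """Hypothetical base class for faults that are worth retrying."""


class ThrottlingError(TransientError):
    """Hypothetical: the service rejected the call because of rate limits."""


class ConnectionFault(TransientError):
    """Hypothetical: the connection dropped or timed out."""


def classify(exc):
    """Return a short label describing why a retry is happening."""
    if isinstance(exc, ThrottlingError):
        return "throttling"
    if isinstance(exc, ConnectionFault):
        return "connection"
    return "other"


def call_with_retries(operation, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Run operation, retrying transient faults and logging each retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            cause = {"retry_cause": classify(exc)}
            if attempt == max_attempts:
                # Retries are exhausted, so this is now a genuine failure.
                logger.error("operation failed after %d attempts", attempt, extra=cause)
                raise
            # A fault that will be retried is logged as a Warning, not an Error.
            logger.warning("transient fault on attempt %d, retrying", attempt, extra=cause)
            sleep(delay_seconds)
```

Because the cause is attached through the `extra` argument, it becomes an attribute of each log record, so a handler or log query can count throttling retries separately from connection retries. Exceptions that are not `TransientError` subclasses are not caught here and propagate immediately.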

A rising number of throttling errors could also indicate a need to switch to a premium service that offers dedicated hardware. When developing, measuring and logging the overall time that an operation takes, including any retries, is a good way to monitor the overall effect of transient faults on user response times and on process latencies. Logging the number of retries that occur will allow you to better understand the factors that contribute to the response time of an application. You should also consider implementing a monitoring system that can generate alerts any time the number or rate of failures increases abnormally. You should also monitor the average number of retries as well as the overall time required for operations to succeed. If these values are increasing, an alert can be raised.
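The measurements described above could be collected along these lines. This is only a sketch under stated assumptions: `timed_call` and `RetryMonitor` are hypothetical names, the built-in `ConnectionError` stands in for whatever transient fault your client raises, and the fixed thresholds are a simplification of "values that are increasing". A production system would more likely send these numbers to a dedicated monitoring service.

```python
import time
from collections import deque
from statistics import mean


def timed_call(operation, max_attempts=3, clock=time.perf_counter):
    """Return (result, retries, elapsed_seconds) for one logical operation."""
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
        except ConnectionError:  # stand-in for a transient fault
            if attempt == max_attempts:
                raise
            continue
        # Elapsed time covers the failed attempts as well as the successful one.
        return result, attempt - 1, clock() - start


class RetryMonitor:
    """Keep a sliding window of samples and flag averages above a threshold."""

    def __init__(self, window=100, max_avg_retries=1.0, max_avg_seconds=2.0):
        self.samples = deque(maxlen=window)
        self.max_avg_retries = max_avg_retries
        self.max_avg_seconds = max_avg_seconds

    def record(self, retries, seconds):
        self.samples.append((retries, seconds))

    def alerts(self):
        """Return a list of alert messages for the current window."""
        if not self.samples:
            return []
        avg_retries = mean([r for r, _ in self.samples])
        avg_seconds = mean([s for _, s in self.samples])
        found = []
        if avg_retries > self.max_avg_retries:
            found.append(f"average retries {avg_retries:.2f} exceeds threshold")
        if avg_seconds > self.max_avg_seconds:
            found.append(f"average duration {avg_seconds:.2f}s exceeds threshold")
        return found
```

A caller would pass the retry count and elapsed time from each `timed_call` into `monitor.record(...)` and check `monitor.alerts()` periodically. Timing the whole operation, rather than each attempt, is what captures the effect of retries on the response time the user actually sees.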

About the Author

Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.

In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.

In his spare time, Tom enjoys camping, fishing, and playing poker.