When developing an application, it is easy to forget to include good logging. By good logging I mean logging that is useful when the application does not behave correctly. As always, looking into the future is hard, and failures are very difficult to predict. Otherwise we would have written code to handle them. But applications also deteriorate under routine maintenance, and logging that was once good turns out to be garbage just when it is needed.
I was in a situation with a client where an old, but very critical, application became unstable. There was some logging, but it was not as useful as it could be. Adding to and rewriting the logging was not feasible in the short term, so we began to look at other methods. Finally, we settled on Application Insights on Azure.
The main reasons to use Application Insights were the ease of implementation and the comprehensiveness of the logs. Application Insights is an Azure service and runs in the cloud. At first I had dismissed this possibility, as the application was hosted on-premises. A lift and shift of this application was not possible, so how could we use an Azure cloud service for logging?
Well, we could, and it was so simple that we added it to all the other legacy applications the client had. Within two days we had it implemented, tested and deployed for five different applications. It was just a matter of adding a NuGet package and configuring the required Application Insights endpoint. Despite the extra IO the logging created, we could not notice any reduction in performance.
The only negative point I can think of is the effort it takes to learn and understand the user interface of Application Insights in the Azure Portal. The overload of detail in the logging and the not always very intuitive navigation take some getting used to. But within a few days we had identified the biggest problems and fixed them. The applications became stable again.
It was all too easy, I thought. And yes, we had good logging that was useful in solving the problems we had. But with it came a huge load of logged errors we had never seen or anticipated: small issues that customers had never reported. So instead of a shorter backlog, thanks to a higher solving velocity, we got a long list of small issues on our backlog. We could have ignored those issues, but because there were so many of them, they hid the real problems.
The other lesson we learnt was that Application Insights uses a sampling algorithm. It does not log every event, but samples them (by default every fifth event). It still reports a correct number of events, but it does not show a trace for every one of them. The sampling is designed so that you see enough variation in the data to get a good sense of the issue.
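To make it clearer how a sampled log can still report a correct event count, here is a toy sketch in Python. It is not the algorithm Application Insights actually uses; it only illustrates the idea that each retained record can carry a count of how many original events it stands for. The names `SampledEvent`, `sample_every_nth` and `estimated_total` are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampledEvent:
    """A retained event plus the number of original events it represents."""
    name: str
    item_count: int  # hypothetical field, loosely modelled on a per-item count


def sample_every_nth(events: List[str], n: int = 5) -> List[SampledEvent]:
    """Keep the first event of every group of n and remember the group size.

    Toy model only: a real sampler may choose items differently.
    """
    kept = []
    for start in range(0, len(events), n):
        group = events[start:start + n]
        kept.append(SampledEvent(name=group[0], item_count=len(group)))
    return kept


def estimated_total(sampled: List[SampledEvent]) -> int:
    """Recover the original event count by summing the per-record counts."""
    return sum(event.item_count for event in sampled)


if __name__ == "__main__":
    events = [f"request-{i}" for i in range(23)]
    sampled = sample_every_nth(events, n=5)
    print(len(sampled), "traces kept,", estimated_total(sampled), "events counted")
```

In this sketch only a handful of traces are kept, yet summing the counts gives back the full number of events. That matches what we saw in practice: the totals were right, but not every individual event had a trace we could inspect.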