The Challenge with Logs
If you think back over your time working with technology, reviewing log files must rank among the most daunting tasks. It certainly does for me: line after line of output messages, which may or may not be readable to a human being, searched for the few words that will hopefully bring clarity to a deployment failure or a production outage.
To troubleshoot successfully, you rely on a few characteristics being present in those logs:
- Consistent timestamps
- Log Level identification
- Meaningful message body
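To make those three characteristics concrete, here is a minimal sketch in Python that splits a log line into a timestamp, a level and a message. The line format (`<ISO 8601 timestamp> <LEVEL> <message>`), the level names and the `parse_line` helper are assumptions made for illustration; real systems each use their own layout.

```python
import re
from datetime import datetime

# Assumed (hypothetical) line format: "<ISO 8601 timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+"
    r"(?P<message>.+)$"
)


def parse_line(line):
    """Return a dict with timestamp, level and message, or None if the line does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    try:
        # A consistent timestamp format is what makes this step reliable.
        timestamp = datetime.fromisoformat(match.group("timestamp"))
    except ValueError:
        return None
    return {
        "timestamp": timestamp,
        "level": match.group("level"),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    print(parse_line("2021-03-04T10:15:30+00:00 ERROR Disk usage above threshold"))
```

A line that lacks any one of the three parts comes back as `None`, which is exactly the kind of entry that slows down troubleshooting.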
There are other pieces to the puzzle as well. So far I have skipped the most important one: the log must exist for the system or device you are interested in. The settings must be configured correctly, ensuring that logging is enabled, that the output persists across reboots, and that the log files are rotated.
Once you have all this in place, you are just waiting for the time when you need to review the log. Or rather, you are hoping you never will, which would be a sign that the systems are running smoothly.
When the inevitable happens, you connect to your system, search for the log file output, and begin that long scroll of the mouse wheel, looking for the key piece of information. If your outage is widespread, you might have to do this over and over across systems, piecing together snippets of errors and clues to find the root cause. This is a manual effort, and the clock ticks as your time to resolution increases.
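To give a feel for what that manual piecing-together involves, the sketch below merges error entries from several systems into one timeline. It is an illustration only: the host names, the line format (`<ISO 8601 timestamp> <LEVEL> <message>`) and the `merge_errors` helper are all assumptions, and it expects every timestamp to carry the same kind of timezone information.

```python
from datetime import datetime


def merge_errors(logs_by_host, levels=("ERROR", "CRITICAL")):
    """Collect entries at the given levels from every host, ordered by timestamp."""
    events = []
    for host, lines in logs_by_host.items():
        for line in lines:
            # Assumed layout: timestamp, level, then the rest of the line as the message.
            parts = line.strip().split(maxsplit=2)
            if len(parts) < 3 or parts[1] not in levels:
                continue
            try:
                timestamp = datetime.fromisoformat(parts[0])
            except ValueError:
                continue  # skip lines whose timestamp cannot be parsed
            events.append((timestamp, host, parts[1], parts[2]))
    # Sort by time first, then host, so entries from different systems interleave.
    return sorted(events, key=lambda event: (event[0], event[1]))


if __name__ == "__main__":
    # Hypothetical sample data standing in for log files gathered from two systems.
    sample = {
        "web01": [
            "2021-03-04T10:15:00+00:00 INFO service started",
            "2021-03-04T10:15:31+00:00 ERROR upstream timed out",
        ],
        "db01": [
            "2021-03-04T10:15:29+00:00 CRITICAL disk full",
        ],
    }
    for timestamp, host, level, message in merge_errors(sample):
        print(timestamp.isoformat(), host, level, message)
```

Even this small script depends on every system writing consistent timestamps and levels, and it still leaves you collecting the files by hand.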
Quite simply, the scale and volume of machine-generated data are increasing rapidly, and making sense of it all is an overwhelming task.
But as you will have guessed from the title of this blog post, there are options: a smarter way to work. Deploy a log analysis tool that provides a highly scalable log management platform, centralising all those individual device and system outputs into one place where the logs can be decoded and analysed. Choose the right tool, and it will offer intuitive features such as actionable insights, the ability to respond based on the logs received, deep operational visibility and faster troubleshooting, and maybe even anomaly identification for logs that fall outside the observed baseline.
Introducing vRealize Log Insight
The first part in this series introduced the concept of leveraging an AIOps model to improve monitoring, alerting and problem resolution across your platforms and clouds. We described the three pillars essential to every AIOps implementation (Observe, Engage and Act), which we will continue to use as we delve into the capabilities of vRealize Log Insight.