Getting everyone on the same page when it comes to using event correlation and monitoring tools is no easy task, especially in large organizations. Different teams tend to purchase their own tools and use them in their own silos—and spend money needlessly, instead of centralizing IT Ops.
Jay Batson, engineering manager at systems integration and consulting firm Greenlight Group, said that can wreak havoc on a budget.
“It’s unbelievable the amount of money that gets wasted.”
Batson encountered this when working with a client a few years back. His client was going to spend around $500 per network device, and the company needed about 20 devices just to monitor one port. The kicker: Another internal group already had a tool that could do the job for about $75 per device, he recalled.
Batson told the client it didn’t need to buy another tool, since the lower-priced software had the same features. The client then completely missed the point, he said: “They asked me to set up their own instance of that same tool so they could have the savings and not have to deal with anyone outside their department.”
Here are six IT Ops-related factors you need to know to ensure a smoother digital transformation.
1. Digital demands monitoring
Organizations can be overzealous in their rush to digitize, with different groups deploying their own monitoring tools to handle the multiple clouds they use. But industry observers say this siloed approach creates operational complexity and hinders digital transformation.
Mark Hughes, senior vice president of offerings and strategic partners at IT service firm DXC, said many of his customers are still struggling with the whole concept of digital transformation.
“Often, there’s a rush to public cloud without recognizing the fact that hybrid cloud is something that will be with us for many years to come.”
Getting to your target state of hybrid infrastructure can take some time, Hughes said. Plus, not everything needs to be in the cloud, which brings variability in cost and scalability, so it requires careful planning, he added.
Further, to yield cost benefits, operational changes need to be made, including different testing, development models, and security. “So it requires a fundamental re-engineering,” making it important to take the time to figure out what your strategy is, he said.
Where digitalization goes, monitoring must follow, said Valerie O’Connell, a research director at Enterprise Management Associates (EMA).
“So the proliferation of monitoring tools is no surprise, and it’s no mistake.”
Use of monitoring tools is invaluable; they are designed to get the right people the right alerts in a timely fashion, said Greenlight Group’s Batson. That “does more than just keep our head above water. It keeps completely preventable outages from ever happening.”
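The "right alerts to the right people" idea often comes down to a routing table: given an alert's source and severity, decide which team hears about it, and how. The sketch below is purely illustrative; the team names, severities, and field names are invented, not drawn from any particular product.

```python
# Hypothetical sketch of severity-based alert routing. Team names,
# severity levels, and the alert dict shape are all illustrative.

ROUTES = {
    "network": "netops",
    "database": "dba-oncall",
    "application": "app-team",
}

def route(alert):
    """Return (team, channel) for an alert dict with 'source' and 'severity'."""
    # Unknown sources fall through to a catch-all triage queue.
    team = ROUTES.get(alert["source"], "itops-triage")
    # Page a human only on critical alerts; otherwise just open a ticket.
    channel = "page" if alert["severity"] == "critical" else "ticket"
    return team, channel

print(route({"source": "database", "severity": "critical"}))  # ('dba-oncall', 'page')
print(route({"source": "storage", "severity": "warning"}))    # ('itops-triage', 'ticket')
```

Even a table this simple captures the two decisions Batson describes: who owns the alert, and whether it is urgent enough to wake someone up.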
Monitoring tools help infrastructure and operations leaders gain insight into the availability and performance of their systems, networks, and applications, while also helping to home in on the root cause of performance degradations, said a report by the analysis firm Gartner.
These tools frequently have predictive capabilities, such as the ability to forecast bandwidth requirements or when disk availability will reach capacity.
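The disk-capacity forecast mentioned above is, at its simplest, a trend line: fit recent usage samples and extrapolate to 100%. The sketch below shows that idea with a plain least-squares fit; the data and function name are made up for illustration, and real tools use far more sophisticated models.

```python
# Minimal sketch of capacity forecasting: fit a straight line to recent
# "disk used %" samples and extrapolate to full. Sample data and the
# function name are illustrative, not from any specific monitoring tool.

def days_until_full(samples, capacity_pct=100.0):
    """Least-squares linear fit over (day, used_pct) samples.

    Returns the estimated number of days from the last sample until
    usage reaches capacity, or None if usage is flat or shrinking.
    """
    n = len(samples)
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    if slope_den == 0:
        return None
    slope = slope_num / slope_den  # percentage points per day
    if slope <= 0:
        return None                # not growing; no forecast
    intercept = mean_y - slope * mean_x
    full_day = (capacity_pct - intercept) / slope
    return full_day - xs[-1]

# Daily readings growing ~2 points per day from 70% used.
usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
print(round(days_until_full(usage), 1))  # 12.0 days until full
```

Prometheus's `predict_linear()` query function works on exactly this principle, which is why a one-line alert rule can warn days before a volume fills.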
2. Beware the silos
Silos become problematic when you consider that “the use of multi-cloud is driving more heterogeneity, thus, more complexity,” making it more difficult to operate with tools that are siloed, said David Linthicum, chief cloud strategy officer for Deloitte Consulting.
Gartner noted in its report:
“Today’s monitoring systems are bought to solve the needs of individual silos of IT Ops focused around infrastructure, network, and applications. This silo-based procurement of monitoring solutions leads to long resolution times and tool sprawl.”
To help avoid the silo issue, organizations should leverage a minimal number of monitoring technologies that are not native to any given platform, he said. Linthicum added that the biggest mistake he sees is not promoting the standardization of core cross-application features, such as security and governance.
“If all of the dev teams are allowed to pick any technology they feel is right, then ops becomes so complex that an IT Ops tool won’t be able to help.”
That said, he acknowledged that it’s hard to get people to give up tools they are already using in silos and standardize on the same technologies. On top of that is the effort involved in training, deployment, and setting up playbooks, or ways to leverage the tools properly for CloudOps, Linthicum added.
3. Reducing the monitoring melee
EMA’s O’Connell believes most organizations are looking for ways to consolidate tools “and to provide a cohesive way of using the data they generate.” One approach they’re taking is to use AIOps platforms, which provide “an overlay function that makes sense of complexity as a basis for unified action,” she said.
Yet, for all the tools, most organizations still report gaps in their monitoring, O’Connell added.
Greenlight Group’s Batson said IT should be mindful that the target of what needs to be monitored is always changing. Automation, of course, can help, but “defining what is ‘actionable’ also seems to elude customers,” he said.
Monitoring tools lend themselves to automation and creating actionable items, he said, but that causes other issues. When clients want to automate monitoring, it’s usually a sign of problems, Batson said.
“This tends to bring to the surface the lack of organization the IT infrastructure has.”
For example, Batson said, when the monitoring admins start asking the business people, “What is actionable for you and what else do you want to monitor?” the reply is always, “Well, what should I monitor?” “And the circle of ‘What can we?/What should we?’ begins,” Batson said.
And all the monitoring data in the world isn’t a great deal of help in the absence of context—such as what services are being affected and what business outcome is in play, he noted.
4. Don’t be afraid of starting out
The tools themselves are fine, and generally do what they are supposed to do, Batson explained. But the organization must take a step back and build a process and workflow around the tools. “It’s never perfect” when you start, he said, but that shouldn’t prevent you from taking the first step.
DXC’s approach to IT Ops is to move clients beyond alerting and reacting to observing and being proactive, Hughes said. However, like other experts, he said he often finds “a whole swag of heterogeneous tools in many different environments.”
Heterogeneity has always been the core issue in IT Ops, Hughes said. Once teams decide to automate, they need to figure out how to measure one set of tools against another by putting a process in place, he said.
This is especially important as companies move from infrastructure as a service to platform as a service, because there are “more complex tools in hyperscale environments and they’re constantly changing,” Hughes said.
An underlying issue is that complexity often comes from the fact that IT Ops staffers are bringing in more tools as the underlying technology changes, he said.
“That’s where we see trying to get to the single source of truth being tricky.”
5. Successful IT Ops automation
When figuring out how IT Ops should be automated, the key is “understanding how to go about transforming those environments” most effectively, Hughes said.
The most effective approach DXC has seen is when a customer starts by looking at its full tech stack and migrating a vertical slice of a business unit or process, Hughes said. When the full IT stack is top of mind, automation and management processes can be built in from the ground up.
As Greenlight Group’s Batson sees it, IT Ops teams need to understand there is no “easy button.” Just because something is being monitored doesn’t mean it’s set forever.
Teams that assume otherwise tend to get caught by something random, because something changed and they didn’t adjust their monitoring strategy to accommodate it.
It helps when subject-matter experts monitor the areas they are responsible for, Batson said. The organizations that structure observability operations this way are most successful because everything’s monitored the way it should be.
6. Alert fatigue can be managed
All organizations have alert fatigue to some extent, Batson said. AIOps is getting better, and that takes some of the load off admins, he added. But there’s a long way to go.
A majority of the organizations he works with have limited automated monitoring, Batson said. It is still a manual process to get servers and network devices monitored.
“Having AIOps auto-correlate events to quiet the noise is helpful to everyone. But we still need to make sure everything is monitored.”
If an alert is too noisy, get that information to the monitoring admins so they can tune the tool. “We still need to be involved to some extent to make monitoring better for everyone in the organization,” Batson added.
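The auto-correlation Batson describes can be as simple as de-duplication: collapse repeated alerts for the same host and check within a time window into one event with a count, so admins see one actionable item instead of a flood. The sketch below illustrates the idea; the field names, window size, and sample alerts are all invented.

```python
# Minimal sketch of alert de-duplication, one of the basic ways an
# AIOps layer quiets noise. Alert fields and the 300-second window
# are illustrative, not from any specific product.

def dedupe(alerts, window=300):
    """Collapse alerts sharing (host, check) whose timestamps fall
    within `window` seconds of the group's first alert."""
    out = []
    latest = {}  # (host, check) -> index of that key's newest group in `out`
    for a in sorted(alerts, key=lambda a: a["ts"]):
        key = (a["host"], a["check"])
        i = latest.get(key)
        if i is not None and a["ts"] - out[i]["first_ts"] <= window:
            out[i]["count"] += 1          # same incident: bump the count
            out[i]["last_ts"] = a["ts"]
        else:
            latest[key] = len(out)        # new incident for this host/check
            out.append({"host": a["host"], "check": a["check"],
                        "first_ts": a["ts"], "last_ts": a["ts"], "count": 1})
    return out

alerts = [
    {"host": "db1",  "check": "cpu",  "ts": 0},
    {"host": "db1",  "check": "cpu",  "ts": 60},
    {"host": "web1", "check": "disk", "ts": 90},
    {"host": "db1",  "check": "cpu",  "ts": 120},
]
events = dedupe(alerts)
print(len(events), events[0]["count"])  # 2 events; the first collapses 3 alerts
```

Commercial event-management tools layer topology and machine learning on top of this, but the payoff is the same: fewer, richer events for the on-call admin to act on.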
Leadership needs to take monitoring seriously and communicate that, he said.
“Everyone must understand, just like any other project, you have to put in some effort and feedback to make the system better.”
Start with your strategy and work out from there
Reaching a target state of hybrid infrastructure takes time, and organizations often overlook that, Hughes said. His advice? Think through what you’re looking to achieve first. Not everything needs to be in the cloud, which brings variability in cost and compute scalability, so it requires careful planning, he said. Further, to realize the cost benefits, you may need to make operational changes in areas such as testing, development models, and security. “It requires a fundamental re-engineering,” he said, so take the time to figure out what your strategy is before moving forward.