ECommerceTimes.com

PODCAST
Fixing IT Problems Before They Occur



Complexity in today's IT systems makes previous error prevention approaches for operators inefficient and costly. IT staffs are expensive to retain, and are increasingly hard to find. There is also insufficient information about what's going on in the context of an entire systems setup.

Operators are using manual processes -- in reactive firefighting mode -- to maintain critical service levels. It simply takes too long to interpret and resolve IT failures and glitches. We now see 70-plus percent of the IT operations budget spent on labor costs.

IT executives are therefore seeking more automated approaches, not only to remediate problems but also to detect them earlier. These same operators don't want to replace their systems management investments; they want to use them more cohesively, learn more from them, and better extract the information that these systems emit.

To help better understand the new solutions and approaches to detection and remediation of IT operations issues, I recently chatted with Steve Henning, the vice president of products for Integrien, in a sponsored BriefingsDirect podcast.


Listen to the podcast (29:53 minutes).

Responding Before the Problem

Here are some excerpts:

Steve Henning: IT operations is being told to either keep their budgets static or to reduce them. Traditionally, the way that the vice president of IT operations has been able to keep problems from occurring in these environments has been by throwing more people at it.

This is just not scalable. There is no way ... [to] possibly hire the people to support that. Even with the budget, he couldn't find the people today.

If you look at most IT environments today, the IT people will tell you that three or four minutes before a problem occurs, they will start to understand that little pattern of events that leads to the problem.

But most of the people that I speak to tell me that's too late. By the time they identify the pattern that repeats and leads to a particular problem -- for example, a slowdown of a particular critical transaction -- it's too late. Either the system goes down or the slowdown is such that they are losing business.

Complexity Equals Challenge

Service oriented architecture (SOA) and virtualization increase the management problem by at least a factor of three. So you can see that this is a more complex and challenging environment to manage.

So it's a very troubling environment these days. It's really what's pushing people toward looking at different approaches, of taking more of a probabilistic look, measuring variables, and looking at probable outcomes -- rather than trying to do things in a deterministic way, measuring every possible variable, looking at it as quickly as possible, and hoping that problems just don't slip by.

If you look at the applications that are being delivered today, monitoring everything from a silo standpoint and hoping to be able to solve problems in that environment is absolutely impossible. There has to be some way for all of the data to be analyzed in a holistic fashion, understanding the normal behaviors of each of the metrics that are being collected by these monitoring systems. Once you have that normal behavior, you're alerting only to abnormal behaviors that are the real precursors to problems.

One of the alternatives is separating the wheat from the chaff and learning the normal behavior of the system. If you look at Integrien Alive, we use sophisticated, dynamic thresholding algorithms. We have multiple algorithms looking at the data to determine that normal behavior and then alerting only to abnormal precursors of problems.
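The idea of learning a metric's normal behavior and alerting only on departures from it can be illustrated with a small sketch. This is not Integrien Alive's algorithm, which the interview does not describe in detail; it is a minimal stand-in that assumes a rolling mean and standard deviation are an acceptable model of "normal." The class name, window size, and tolerance are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Learns a metric's recent 'normal' range and flags departures from it.

    Illustrative only: a rolling mean/standard-deviation band is one simple
    form of dynamic threshold, not any vendor's actual method.
    """

    def __init__(self, window=30, tolerance=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # recent samples judged normal
        self.tolerance = tolerance           # band width in standard deviations
        self.min_samples = min_samples       # samples needed before alerting

    def observe(self, value):
        """Return True if value is abnormal relative to the learned baseline."""
        abnormal = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # Guard against a perfectly flat history (sigma == 0).
            band = self.tolerance * max(sigma, 1e-9)
            abnormal = abs(value - mu) > band
        # Only normal samples update the baseline, so a spike does not
        # immediately widen the band it is judged against.
        if not abnormal:
            self.history.append(value)
        return abnormal


def find_abnormal(samples, **kwargs):
    """Return the indexes of samples flagged as abnormal."""
    baseline = RollingBaseline(**kwargs)
    return [i for i, v in enumerate(samples) if baseline.observe(v)]


# Hypothetical transaction latencies in milliseconds, with one spike.
latency_ms = [100, 102, 98, 101, 99, 103, 100, 97, 250, 101, 99]
print(find_abnormal(latency_ms))
```

Because the threshold moves with the data, a metric that is normally noisy gets a wide band and a normally steady one gets a narrow band, which is what keeps the alert volume down compared with a single fixed limit.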

Nip It in the Bud

Once you've learned the normal behavior of the system, these abnormal behaviors far downstream of where the problem actually occurs are the earliest precursors to these problems. We can pick up that these problems are going to occur, sometimes an hour before the problem actually happens.

The ability to get predictive alerts ... that's kind of the nirvana of IT operations. Once you've captured models of the recurring problems in the IT environment, a product like Integrien Alive can see the incoming stream of real-time data and compare that against the models in the library.

If it sees a match with a high enough probability, it can let you know up to an hour ahead of time that you are going to have a particular problem that has previously occurred. You can also record exactly what you did to solve the problem, and how you have diagnosed it, so that you can solve it.
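Comparing incoming data against a library of previously captured problem models can be sketched as follows. The interview does not say how the product computes its match probability, so this example substitutes a deliberately simple score: the fraction of a model's known precursor alerts that are currently active. Every name here (`ProblemModel`, `predict`, the metric names, the 0.75 threshold) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemModel:
    """A hypothetical record of one recurring problem."""
    name: str
    precursors: frozenset  # abnormal metrics seen before past occurrences
    resolution: str        # what operators recorded as the fix


def match_score(model, active_alerts):
    """Fraction of the model's precursors currently active (0.0 to 1.0)."""
    if not model.precursors:
        return 0.0
    return len(model.precursors & active_alerts) / len(model.precursors)


def predict(library, active_alerts, threshold=0.75):
    """Return (score, model) pairs at or above threshold, best match first."""
    active = frozenset(active_alerts)
    scored = [(match_score(m, active), m) for m in library]
    hits = [(s, m) for s, m in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# A made-up library of two previously seen problems.
LIBRARY = [
    ProblemModel(
        name="checkout slowdown",
        precursors=frozenset(
            {"db_pool_wait", "queue_depth", "gc_pause", "cache_miss_rate"}
        ),
        resolution="Recycle the connection pool; see recorded diagnosis.",
    ),
    ProblemModel(
        name="login outage",
        precursors=frozenset({"ldap_latency", "auth_errors"}),
        resolution="Fail over to the secondary directory server.",
    ),
]

# Abnormal metrics currently firing (hypothetical).
for score, model in predict(LIBRARY, {"db_pool_wait", "queue_depth", "gc_pause", "disk_io"}):
    print(f"{model.name}: score {score:.2f} -> {model.resolution}")
```

Storing the recorded resolution alongside each model reflects the point made above: a predictive alert is most useful when it arrives with the diagnosis and fix that worked last time.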

We're actually enhancing the expertise of these folks. You're always going to need experts in there. You're always going to need the folks who have the tribal knowledge of the application. What we are doing, though, is enabling them to do their job better, with earlier understanding of where the problems are occurring, by solving this massive data correlation issue when a problem occurs.


Dana Gardner is president and principal analyst at Interarbor Solutions, which tracks trends, delivers forecasts and interprets the competitive landscape of enterprise applications and software infrastructure markets for clients. He also produces BriefingsDirect sponsored podcasts. Disclosure: Integrien sponsored this podcast.

