Investigations into the contribution of AIOps to IT Operations Management - Part I

Most people, when they hear the term "artificial intelligence" or its abbreviation "AI", will probably think of their favourite science fiction story from a film or book. Often the story revolves around an artificial intelligence that rebels and becomes dangerous to humanity. Fortunately, AI has not reached that stage in reality. Instead, it continues to drive progress in research and technology: cars that drive themselves, language-processing devices that can hold conversations with humans, and algorithms that learn from experience and make important decisions faster than a human could. These are just a few examples of how AI is being used to advance diverse fields.

It is therefore no surprise that the business world is taking a strong interest in AI and the advantages it can bring to companies. In today's digital world, many companies are unthinkable without IT, and many employ their own IT departments to manage all IT processes and operations with the necessary attention and know-how. These operations cover areas such as user management, the management of the entire infrastructure, and the administration of the company's digital business processes so that they run without interruption. Depending on the size of the company, handling these operations successfully can be a very extensive task, and it is a prerequisite for avoiding damage and extra costs. The volume of data handled in the process is also growing continuously, which steadily increases the complexity of IT and the demands placed on it; this is referred to as "Big Data". The field of AIOps has emerged in this context. The term combines "Artificial Intelligence" (AI) and "IT Operations": it is about applying AI to IT operations in order to improve them and make them more efficient.

AIOps is a relatively young field, but its popularity and visibility are growing because the developers and distributors of AIOps solutions promise exactly what makes it so interesting for companies: a way to cope with the rapidly increasing demands and complexity in IT.

AIOps refers not only to an idea or approach in IT, but also to the tools and platforms that provide the corresponding features. The term itself has only existed for a short time, although artificial intelligence has been helping to meet business needs since the 1980s. Machine learning, a subfield of AI, only received more attention as a support for business processes in the 21st century. This led to the development of the field of "IT Operations Analytics", or ITOA for short. The problem with ITOA was that its analyses were based on historical data, which made it very static and caused problems with dynamic infrastructures such as cloud or virtualisation technologies (containers, virtual machines and virtual networks). The need to monitor these modern, agile infrastructures as well, and to deal better with real-time data, led to the development of the AIOps field, in which machine learning came more into focus.

Gartner, one of the most influential research and consulting firms in the IT sector, first used the term "AIOps" in 2017 to describe how related tools can automate and improve IT operations by applying analytics and machine learning to so-called "big data". Gartner's exact definition was: "AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies."

In the Market Guide for AIOps, Gartner also mentions that the goal of AIOps is to improve the quality of ingested data so that infrastructure and operations leaders can run diverse use cases with appropriate measures. Gartner also provides a graphic to support this definition and illustrate the use of AIOps platforms in IT operations management (see Figure 1).

 

Figure 1: Use of AIOps platforms in IT operations management (source: Gartner, Market Guide for AIOps Platforms)

 

At the technical heart of the AIOps platforms are their three so-called pillars:

  • AI
  • Machine Learning
  • Big Data

 

The three main functions of AIOps are in turn built on these three pillars. They run continuously and in parallel:

  • Observe

The aim is to analyse all data flowing into the AIOps tool correctly, to derive the status of the monitored systems from it, and to identify correlations and potential risks. Because this has to happen in real time, and because such huge amounts of data cannot be searched manually for anomalies, correlations or context, machine learning is used. An important goal of this function is to detect these patterns quickly. This helps enormously in determining the root cause of a problem so that it can be dealt with efficiently and avoided in the future.

  • Engage

This function is about embedding AIOps into the company's IT service management (ITSM). An AIOps platform needs to be integrated and synchronised with the company's existing ITSM so that changes, events and dependencies in the company's systems can be answered with appropriate actions.

  • Act

Operations are also automated so that problems can be remedied without requiring human intervention for every issue. For example, the tool can automatically run a specific script in response to a known problem to eliminate it quickly and effectively.
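The interplay of the three functions can be illustrated with a minimal Python sketch. It is not how any particular AIOps platform works: a simple rolling z-score stands in for the machine learning a real platform would use, the "ticket" is a plain dictionary rather than an ITSM record, and the names `detect_anomalies`, `handle_metric`, `RUNBOOK` and `restart_worker` are hypothetical and chosen only for this illustration.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Observe: flag samples that lie far outside the recent rolling baseline.

    A sample is flagged when it deviates from the mean of the preceding
    `window` samples by more than `threshold` standard deviations.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def restart_worker(host):
    """Hypothetical remediation; a real tool would run a script or workflow here."""
    return f"restart requested for worker service on {host}"


# Hypothetical runbook: maps a known problem to its automated remedy.
RUNBOOK = {"cpu_anomaly": restart_worker}


def handle_metric(host, metric, values):
    """Run Observe, Engage and Act for one metric series of one host."""
    tickets = []
    for index in detect_anomalies(values):
        # Engage: record the finding in a form an ITSM system could consume.
        ticket = {
            "host": host,
            "problem": f"{metric}_anomaly",
            "sample": index,
            "status": "open",
        }
        # Act: apply an automated remedy only if the problem is a known one.
        remedy = RUNBOOK.get(ticket["problem"])
        if remedy is not None:
            ticket["action"] = remedy(host)
            ticket["status"] = "remediated"
        tickets.append(ticket)
    return tickets


if __name__ == "__main__":
    cpu_percent = [41, 43, 42, 44, 42, 43, 95, 44, 42]
    for ticket in handle_metric("web-01", "cpu", cpu_percent):
        print(ticket)
```

Problems without a runbook entry stay "open" in this sketch, which mirrors the idea that automation covers known problems while everything else still reaches a human.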

Used correctly, AIOps should above all give IT the flexibility to cope with high complexity and increasing demands. Handling data more efficiently is also expected to result in cost savings (Gartner, 2022). These advantages make AIOps an interesting topic for every company.

 

ITOM - IT Operations Management

AIOps is primarily intended to assist IT Operations Management (ITOM). But what exactly is meant by this term? Companies and their customers are nowadays so dependent on immediate access to IT components that even the shortest failures can have enormous consequences and costs. The goal of ITOM is to repair and prevent these failures through proper management of IT components.

According to Gartner, ITOM software is defined as follows: "IT operations management (ITOM) software is intended to provide all the tools needed to manage the provisioning, capacity, performance and availability of computing, networking and application resources, as well as the overall quality, efficiency and experience of their delivery."

There is no precise definition of the scope of ITOM functions and responsibilities. ITOM software vendors define the scope differently depending on their offerings, but many agree on three general areas. BMC Software, one of the largest companies in the field of IT service tools, has published a white paper by Sudip Sengupta on ITOM and ITSM that goes into more detail about ITOM and its functions. According to BMC, the three functional areas, with examples of each, are:

  • Management of the company's network infrastructure: efficient port/protocol management, management of all internal and external network communication, access management for authorised users
  • Management of the company's general IT infrastructure, i.e. hardware, servers, applications and workstation devices: provisioning, configuring, managing and maintaining servers, virtual machines, laptops and similar devices in the corporate environment, ensuring uptime of servers, applications and devices, managing data storage, email and file servers, patching and upgrading servers
  • Help desk operations: first-level support, provisioning user profiles, requesting backups, disaster recovery management

 

According to this white paper, ITOM promises the following advantages through these functions, among others:

  • Improved availability of services
  • Better customer and user experience
  • Minimal downtime
  • Reduced cost of operations through regular monitoring

 

Gartner describes IT operations in more detail in its Market Guide. According to it, the ingestion of metrics and logs, together with analytics, is the primary requirement for the teams involved. The starting point is event correlation, which is then extended to the analysis of metrics and logs and finally to analyses of the behaviour of systems and users. The goals here are:

  • The discovery of anomalies
  • The analysis of the causes of problems
  • Diagnostic information

 

In addition, there may be use cases that involve automation, such as the execution of scripts or automatic workflows.
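Event correlation, named above as the starting point, can be sketched in a few lines of Python. The example groups events purely by time proximity, which is a deliberately simple assumption; real platforms typically also take topology, event content and learned patterns into account. The function name `correlate_events`, the event fields `ts`, `host` and `msg`, and the 60-second window are all hypothetical.

```python
def correlate_events(events, window_seconds=60):
    """Group events into incidents by time proximity.

    An event joins the current incident if it occurs within `window_seconds`
    of the previous event in that incident; otherwise it starts a new one.
    """
    incidents = []
    for event in sorted(events, key=lambda e: e["ts"]):
        if incidents and event["ts"] - incidents[-1][-1]["ts"] <= window_seconds:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


if __name__ == "__main__":
    # Timestamps are seconds since an arbitrary start; all values are made up.
    events = [
        {"ts": 0, "host": "db-01", "msg": "disk latency high"},
        {"ts": 20, "host": "app-01", "msg": "query timeout"},
        {"ts": 50, "host": "web-01", "msg": "HTTP 500 rate rising"},
        {"ts": 300, "host": "web-02", "msg": "certificate expires soon"},
        {"ts": 310, "host": "web-03", "msg": "certificate expires soon"},
    ]
    for number, incident in enumerate(correlate_events(events), start=1):
        print(f"Incident {number}: {[e['msg'] for e in incident]}")
```

Collapsing five raw events into two incidents shows the purpose of correlation: operators look at a handful of related groups instead of every individual alert.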

Now that this article has covered the theoretical background of AIOps, the next part will look at AIOps platforms in more detail: how they are structured and what goals they are meant to achieve, what requirements they have to fulfil, what prerequisites companies should consider before using them, and what risks arise as a result.