How ATOM ONE makes the job easier for IT Operations
Service disruptions always come at the worst time. Murphy's Law applies in full in IT. We can expect a flood of user requests just before vacations, or right before an important and often strategic task is due. In my opinion, this does not happen only in small and medium-sized enterprises. I would therefore like to share my own experience of how to prevent such situations from the perspective of IT Operations, where I worked for many years.
During those years I encountered many of the problems the IT world can pose, especially at a time when cloud systems in the Czech Republic were used only to run e-mail or company websites. IT has always faced a shortage of trained staff able to oversee what are often large infrastructures. This problem has grown with the arrival of the cloud, and hybrid environments and hyper-converged infrastructures are now an everyday reality. IT staff generally lose track of what is actually in use in the corporate environment. That is why they often regard user complaints, raised when something stops working, as the fastest form of monitoring. However, some systems need attention because they have already reached a critical level. Such care is not a one-time activity carried out when someone decides to update a server. Systems keep evolving, and previously unknown errors that need to be addressed are common. So, let's ask ourselves a few questions. Could some IT equipment or cloud service help us with this? Could we make our lives easier and look at our environment from above? Could we solve problems before they arise, right at the beginning? IT managers often ask themselves these questions, because they are held responsible for unavailability or a lower level of service (a breach of the SLA or OLA). IT Operations also matters when data centre capacity has to be planned and when budgets and new equipment purchases have to be assessed. I will try to answer these questions in the following lines.
The main area we should focus on is solving problems rather than incidents, ideally before they occur. Proactive reports and prophylactic checks, also called health checks, are used for this purpose. Prophylactic activities should cover log checks, the adequacy of computing resources, system settings and configuration, configured and assigned users, permissions, certificates close to expiration, vulnerability identification, and so on. Yes, indeed, there is a lot of it. Once the data has been checked, we should create a record and note the time of the prophylactic check in the service log. Based on the findings, we should evaluate what the system requires, whether it runs correctly, and whether there is a risk of failure or unavailability. Enterprises that have mastered prophylactic inspections have moved on to the change monitoring stage. Their aim is to make sure that the system has not been manipulated by unauthorized persons and to find any weak points, such as forgotten open ports on the firewall or shares open to everyone. Only then can we continue to the next system. All of this takes many hours, and time, as already mentioned, is precious and limited.
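To make one of the prophylactic checks above more concrete, here is a minimal sketch in Python of the "certificates close to expiration" check. It is only an illustration of what such a check involves when done by hand, not a description of how ATOM ONE does it. The host names, the 30-day warning threshold and the function names are assumptions made for this example.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Dict, List


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of the certificate a live server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a notAfter string such as
    'Mar 15 12:00:00 2025 GMT' (the format returned by getpeercert)."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def expiring_soon(
    certs: Dict[str, str], now: datetime, warn_days: int = 30
) -> List[str]:
    """Return the hosts whose certificate expires within `warn_days` days."""
    return sorted(
        host
        for host, not_after in certs.items()
        if days_until_expiry(not_after, now) <= warn_days
    )


# Hypothetical inventory; in practice the values would come from fetch_not_after().
inventory = {
    "intranet.example.com": "Mar 15 12:00:00 2025 GMT",
    "shop.example.com": "Dec 31 23:59:59 2025 GMT",
}
check_time = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
print(expiring_soon(inventory, check_time))
```

Even this small check has to be repeated for every system and recorded in the service log, which is where the hours mentioned above come from.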
Because we are aware of all these needs, we have addressed them in ATOM ONE and built part of the system specifically to help IT Operations. ATOM ONE offers a top-down view from the so-called cockpit, where we handle all parts of the environment both as a whole and as separate disciplines within IT Operations. There is no need to solve one problem ten times because it occurs on ten devices; we can solve it globally. ATOM ONE lets us identify the problem and, for example, add information about the problematic areas. A great advantage of the ATOM ONE service is its ease of use. All you have to do is install an agent; the system takes care of the rest, and you have peace of mind while you focus on important tasks.
The main thing the ATOM ONE service offers is a central view of the state of the environment, where we can see from above which areas contain problems. In this way we can keep an eye on certificates, the status of operating systems, updates, changes in systems and the status of services, or monitor whether websites are reachable from the public Internet. ATOM ONE offers much more than this for monitoring from the perspective of IT Operations.
Each displayed area lets us drill down from the so-called overview dashboard, which shows only the basics, through views dedicated to specific areas, to detailed logs and information sent from the end device. Data is available for 31 to 730 days, depending on the retention period setting (so-called recycling), which is chosen according to requirements that vary from one enterprise to another.
ATOM ONE also monitors system performance over time. It lets us check disk capacity, CPU load, RAM, network load, or the number of events that occur over a given period, and put this information into a broader context. That context can be a view of the number of running applications, which helps prevent the infrastructure from being overloaded, or a combination of this information with data about network traffic. This gives managers the opportunity to make better decisions when sizing the infrastructure and helps identify bottlenecks.
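As an illustration of how performance data collected over time can support sizing decisions, the following Python sketch fits a straight line to daily disk usage samples and estimates how many days remain until the disk is full. The simple least-squares trend, the sample values and the function name are assumptions chosen for this example; they do not describe how ATOM ONE calculates anything.

```python
from typing import List, Optional, Tuple


def days_until_full(
    samples: List[Tuple[float, float]], capacity_gb: float
) -> Optional[float]:
    """Estimate days until a disk is full from (day, used_gb) samples.

    Fits a least-squares line through the samples and extrapolates it to
    `capacity_gb`. Returns None when there are fewer than two samples, when
    all samples share the same day, or when usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    variance = sum((day - mean_day) ** 2 for day, _ in samples)
    if variance == 0:
        return None
    slope = (
        sum((day - mean_day) * (used - mean_used) for day, used in samples)
        / variance
    )  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    day_full = (capacity_gb - intercept) / slope
    last_day = max(day for day, _ in samples)
    return max(0.0, day_full - last_day)


# Hypothetical measurements: usage grows by 2 GB per day on a 200 GB disk.
usage = [(0, 100.0), (1, 102.0), (2, 104.0), (3, 106.0)]
print(days_until_full(usage, capacity_gb=200.0))
```

A straight line is the simplest possible model; real workloads with seasonal peaks would need a longer history and a more careful forecast, but even a rough estimate like this one turns a graph into a figure that can be brought to a budget meeting.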
The system load view can be permanently available to the administrator, and the graphs can be placed on the operations dashboards that IT Operations monitors in the office.
Managers can take advantage of overview workbooks, which are helpful at budget meetings or whenever data needs to be extracted. The reports in ATOM ONE never become outdated and always show the current status, which is a great advantage over regular prophylactic checks that, if they are performed at all, take place once a month at most.
Another thing that is definitely worth mentioning is visibility into the environment, especially when the company has several branches.
Thanks to ATOM ONE and its Service Map component, it is possible to view the infrastructure as a whole and monitor the interactions between the services that run on the devices, regardless of whether the underlying operating system is from Microsoft or is a Linux-based distribution. The location of the system is not a limitation either, so we can have several data centres, a cloud, and additional colocation or hosting in another data centre. ATOM ONE incorporates all of this into one map and indicates whether the devices can interact with each other. It shows not only whether they can, but above all how they interact and whether there are any problems, delays or reduced performance between individual devices.