The Operational Advantages of Unifying ITSM and ITOM
By Jonah Kowall
When determining how to change your culture, one significant area of investigation should be the right tooling to match your organizational goals. There is always a debate between best-of-breed tools and broader platforms that can serve many areas. While you can’t always select best-of-breed tools due to complexity and cost, they should be used where they offer significant advantages, especially advantages that can have a significant impact on the business.
Workflow tools, such as those used by development, operations, and service management teams, are critical when managing work, but analytics tools are also critical to help scale people and deal with today’s complex applications and infrastructures. One common topic of discussion is how and when to bring together the Service Management and IT Operations teams within an organization.
From an operations perspective, the goal is to be immediately notified when there’s a detected or reported issue. Although we have many (if not too many) monitoring tools, they are often controlled by different teams, which makes that goal harder to reach. Many organizations are now trying to consolidate these alerts into a single event correlation tool (now part of AIOps as per Gartner).
While there are many great tools that focus on this one specific problem (e.g. MoogSoft, BigPanda, OpsGenie, and VictorOps), there are other approaches which include leveraging existing toolsets. Many organizations have tooling from legacy vendors, such as BMC, Micro Focus (formerly HP), CA, and IBM, which can do event correlation, but these tools cannot keep pace with today’s cloud-native application architectures, and most users are looking at point solutions.
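To make the idea of event correlation concrete, here is a minimal sketch of consolidating alerts from several monitoring tools into grouped events. It is not how any of the products named above work internally; the `Alert` and `CorrelatedEvent` types, the `correlate` function, and the five-minute window are all assumptions chosen for illustration. The rule is deliberately simple: alerts for the same service that arrive close together in time are treated as one event.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert (hypothetical label)
    service: str      # service the alert refers to
    timestamp: float  # seconds since some reference point
    message: str


@dataclass
class CorrelatedEvent:
    service: str
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: List[Alert], window_seconds: float = 300.0) -> List[CorrelatedEvent]:
    """Group alerts per service; an alert joins the service's latest event
    if it arrives within window_seconds of that event's most recent alert."""
    events: List[CorrelatedEvent] = []
    latest: Dict[str, CorrelatedEvent] = {}  # most recent event per service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest.get(alert.service)
        if current is not None and alert.timestamp - current.alerts[-1].timestamp <= window_seconds:
            current.alerts.append(alert)
        else:
            # Too far apart (or first alert for this service): start a new event.
            current = CorrelatedEvent(service=alert.service, alerts=[alert])
            latest[alert.service] = current
            events.append(current)
    return events
```

Real correlation engines typically also use topology and dependency data rather than time and service name alone, which is part of why the context held in each specialist tool matters.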
The challenge is that when only the alerts (events) are processed, and not the raw data, significant value is lost. Another approach is to leverage your existing implementation and expertise and do the event correlation in ServiceNow. ServiceNow’s IT Operations Management (ITOM) product line is a newer offering, and thus is still maturing, but there are distinct advantages in having these teams on the same platform.
ServiceNow can also ingest metrics, although this is less mature and less scalable than it needs to be to meet a broader need to analyze raw metric data. This will, of course, get better in time as it’s a major area of investment for ServiceNow. Each monitoring tool has specific context and expertise about a portion of the stack. APM tools understand application frameworks, map application topologies, and provide depth into user experience and performance of applications.
Modern network monitoring tools, such as Kentik, can support data center and cloud networks to visualize topologies, understand internet routing, determine changes in utilization or performance, and provide a complete view across multiple network architectures. Once the data leaves these specialist tools, the context is lost, and when issues need troubleshooting, the subject matter expert who understands the application, network, or other infrastructure will most often go back to the tools with the right level of visibility to determine the root cause and fix the problems.
Dealing with incidents and major changes is a team sport where team members and leaders contribute across both the technology and business sides to get things back to a productive business-as-usual level. This means collaboration is key when working together on an incident, problem, or change. Email and war room conference calls can no longer scale or work properly with today’s complexity. These break apart quickly and cause larger issues once teams begin to merge, which is commonplace as organizations evolve through a DevOps transformation.
Many DevOps teams use different tools to build and operate their specific services, such as Jira, GitHub, Slack, and Teams, while their service management teams use ServiceNow or other ITSM tools. The information flow is disjointed without significant project work to integrate systems and ensure all workflows are working at any given time. There are several approaches to solving this issue.
Some of the typical workflows that would allow for the passing of a ticket between two organizations may look like these examples, which are all based on real scenarios.
- An issue is detected by monitoring, which then causes an incident ticket to be opened with the root cause as determined by event correlation. During the incident, the policy requires that internal users of a given system be notified of the outage internally so they can communicate with customers. The users and service owners are contained in ServiceNow in the form of an SLA within the application definition, making it easy to link a user-facing ticket to the operational incident and ensuring all teams can understand the current state and when the problem will be fixed.
- The issue is reported by an internal user and, similar to the issue above, we must then notify the operations team that there is a potential problem in the form of an incident ticket. They would then need to organize the application owners (possibly a small team), specialists, and other members to investigate and determine where the problem is via the ticket. During a postmortem, typically you’d determine why the monitoring was not in place to detect the problem and remediate it going forward.
- The final issue type is the worst: the same as the previous one, but with the issue being reported by an external customer. The workflow would be the same aside from ensuring customers were notified of the problem if it was service impacting. Through a linked change ticket, the remediation would ultimately notify the customer that the issue was resolved. Linking this workflow together so that everyone can communicate effectively is done most efficiently in a consolidated tool.
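The three scenarios above can be sketched as a small model of who gets notified and which tickets are linked. This is an illustration only and does not use the ServiceNow API; the `Reporter` and `Incident` names, the audience labels, and the rule that operations and internal users are informed in every scenario are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Reporter(Enum):
    MONITORING = "monitoring"
    INTERNAL_USER = "internal_user"
    EXTERNAL_CUSTOMER = "external_customer"


@dataclass
class Incident:
    service: str
    reporter: Reporter
    service_impacting: bool = True
    state: str = "open"
    linked_tickets: List[str] = field(default_factory=list)

    def audiences(self) -> List[str]:
        # Assumption: operations and internal users are informed in every scenario.
        result = ["operations", "internal_users"]
        # Customers are only notified for customer-reported, service-impacting issues.
        if self.reporter is Reporter.EXTERNAL_CUSTOMER and self.service_impacting:
            result.append("customers")
        return result

    def needs_monitoring_review(self) -> bool:
        # If a person found the issue before monitoring did, the postmortem
        # should ask why monitoring was not in place to detect it.
        return self.reporter is not Reporter.MONITORING

    def resolve(self, change_ticket_id: str) -> List[str]:
        # Link the change ticket that carried the fix, then tell every audience.
        self.linked_tickets.append(change_ticket_id)
        self.state = "resolved"
        return [
            f"{audience}: {self.service} incident resolved via {change_ticket_id}"
            for audience in self.audiences()
        ]
```

Keeping the incident, its linked tickets, and the audience rules in one record is the point of the sketch: when they live in separate tools, each of these links becomes an integration that has to be built and maintained.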
With these examples, it’s clear there’s a need to manage the handoffs. However, the trend towards smaller, more autonomous teams often causes communication and collaboration challenges at a level we’ve not seen before. This is why we’ve seen such a boom in event correlation and notification management technologies (e.g. PagerDuty, xMatters, OpsGenie, and VictorOps). It’s also why having a single place to work for both ITOM and ITSM makes a lot of sense. As ServiceNow improves the offering, especially as ITOM and Event Management tooling evolves, we will likely see even more compelling ways these worlds can come together to benefit enterprises.
Jonah Kowall has 17 years of security and operations expertise, as well as leadership experience within open source, start-ups, and enterprises. In 2011 he joined Gartner as an ITOM Research VP, leading Magic Quadrants in APM and NPMD. In 2015 Jonah joined AppDynamics, now Cisco, to drive the company’s partner ecosystem and corporate development. He then joined Kentik in 2019 as CTO to set company and product vision and strategy, and to execute it by running the product management and product marketing organizations.