9. Run Functions
Description
The Run functions (formerly Detect to Correct) provide a framework for the operation of Digital Products, supporting the running services and systems while assuring that all running services are operating within stated boundaries and in a secure manner. The Run functions also provide a comprehensive overview of the business of digital operations and the services delivered by an Operations team, including security operations. This viewpoint provides an understanding of the inter-relationships among its many domains, and responsiveness to business requests and requirements.
The Run functions bring operations functions together to enhance services and efficiencies – thus reducing risk.
The Run functions contain the following functional components:
-
Support functions:
-
Incident component
-
Problem component
-
Service Level component
-
Knowledge component
-
Assure functions:
-
Monitoring component
-
Event component
-
Configuration component
-
Diagnostics & Remediation component
The Run functions accommodate the technical inter-relationships and inter-dependencies required to fix operational issues and improve the ability to support business objectives by providing agility, increased uptime, and lower per-service cost.
Related Value Streams
The following value streams use one or more functional components from the Run functions:
-
Evaluate
-
Consume
-
Deploy
Business Benefits
The Run functions enable organizations to increase efficiency, reduce cost, reduce risk, and drive continuous service improvement by defining the data objects and data flow required to integrate the operations of multiple domains.
The key benefits of using the Run functions are:
-
Increase efficiency and reduce cost by:
-
Focusing responses based on causal factor, priority, and business impact
-
Increasing the sharing of information and reducing multiple entries of the same data
-
Creating a prescriptive data flow between Event, Incident, Problem, and Change
-
Centralized Event Management for faster analysis
-
Automation between and across business functions
-
Knowledge management and self-service linkage
-
Driving Service Monitoring configuration and predefined Knowledge linked to the Deliver functions
-
Improving the speed at which issues with an Actual Product Instance are identified
-
Driving operating/service level targets
-
Improving the speed at which issues with an Actual Product Instance are proactively identified before the service impact is severe
-
Reduce risk by:
-
Sharing consistent data and configuration information between operational silos
-
Prescriptive data flow and data objects
-
Defining business impact
-
Reducing the need for best-guess routing and clannish knowledge
-
Implementing network security to minimize intrusions that cause DoS, viruses, and theft or corruption of data, and to minimize risk exposure
-
Identifying attack signatures that can disrupt operations and affect compliance
-
Clearly defined ownership
-
Increased uptime through reduced Mean Time To Repair (MTTR)
-
Creating a consistent way of managing service level (SLM) definitions, measurements, KPI calculations, and reporting back to the proper Product Manager or Consumer
-
Performing TVAs
-
Providing an audit trail
-
Continuous service improvement:
-
Defined data objects to be shared with Problem Management
-
Using this accumulated Knowledge as input into the Plan functions
-
Improved management information and decision-making
The Run functions provide the ability to efficiently manage operations by monitoring key services, correlating and appropriately escalating Events, sharing knowledge, managing (resolving) Incidents and Problems, tracking the Actual Product Instance and its interdependencies, and doing all of that in an automated way. They ensure that the functional components used by different groups can work together efficiently, through well-defined control points and data objects, to govern and run operations.
9.1. Support Function
The Support function centers around Incident, Problem, and Knowledge Management, as well as tracking service objectives.
9.1.1. Service Level Functional Component
Purpose
The Service Level functional component enables the design and creation of Service Contracts. It is also responsible for the management of all Service Contract data objects throughout their lifecycle, including the governance of the Service Contract instances from the moment they are instantiated. It is responsible for collecting the relevant information in order to monitor compliance with the terms specified in the Service Contract and exposing data that reflects actual performance against the defined Service-Level Agreements (SLAs), Service-Level Objectives (SLOs), Operational-Level Agreements (OLAs), and/or Experience-Level Agreements (XLAs).
The actual legal aspects of the Service Contracts are not handled by the Service Level functional component directly; however, these documents (usually created and managed by the legal department and not in IT) are used by the functional components of the Evaluate, Explore, and Integrate value streams as the main input for the demand and requirements definition stages.
The Service Level functional component supports the value streams:
Functional Criteria
The Service Level functional component:
-
Shall be the system of record for the Service Contract
-
Shall manage the Service Contract lifecycle (create, store, and maintain)
-
Shall manage the lifecycle (create, store, and maintain) of KPIs
-
Shall manage the state of the Service Contract
-
Shall manage the relations between the Service Contract and the KPI throughout their lifecycles
-
Shall create reports on the Service Contracts to show the quality of service per SLO
-
Shall create a Service Contract (instance) and start measuring it once a Subscription is instantiated if the Order functional component exists
-
May receive business/IT measurements from the Monitoring functional component if a Monitoring functional component exists
-
May receive measurements such as Incident data as well as other information that may be covered by the Service Contract and used for calculating the KPI measurements
-
May instantiate a Service Contract from a Service Contract (template) originating from the Offer functional component
-
May receive Incident business measurements from the Incident component if an Incident functional component exists
-
May send reporting data on the service level status to the Consumption Experience functional component
9.1.1.1. Service Contract Data Object
Purpose
The Service Contract data object describes the service characteristics and supports service measurement tracking, governance, and audit. Service Contracts can be related to logical services as well as physical services. Service Contracts related to logical services are known as Service Contract templates, while Service Contracts related to physical services are known as Service Contract instances. Each Service Contract data object is comprised of two main parts: the General Contract definitions (aka the header) and the SLOs (the line items), which also enable the nesting of other Service Contracts that define service levels for different aspects of the service. These lines may need to be detailed due to the service being composed of multiple components, because there are multiple providers involved, or to cover different areas of service levels.
Key Attributes
The Service Contract data object shall have the following key data attributes:
-
Id: unique identifier of the Service Contract
-
Name: name of the Service Contract
-
Type: type of the Service Contract (SLA, OLA, or Underpinning Contract (UC))
-
Provider: provider of the service
-
Consumer: consumer of the service
-
Start Date: start date/time of the Service Contract
-
End Date: end date/time of the Service Contract
-
Support Calendar: contracted support hours of the service
-
Adherence Calculation Periodicity: service measurement calculation period
-
Maintenance Window: service maintenance timeframes/blackout periods
Key Data Object Relationships
The Service Contract data object shall maintain the following relationships:
-
Incident to Service Contract (n:m): the service level which applies to an Incident is dependent on the related Service Contract(s)
-
Actual Product Instance to Service Contract (1:n): relationships to Actual Product Instance(s) are maintained and updated to ensure component and data object traceability in the value stream
-
Service Contract to KPI (n:m): KPIs will track the measurements associated with Service Contracts; Service Contracts will have multiple KPIs
-
Subscription to Service Contract (1:1): once a Subscription is instantiated, it also triggers the instantiation of a Service Contract instance from the Service Contract template
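The header-plus-line-items structure and the template/instance distinction described above can be sketched as a small data model. This is an illustrative sketch only, not part of the standard: the class names, the shape of the `SLO` line item, the `is_template` flag, and the `instantiate` helper are assumptions, and only a subset of the key attributes is shown.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime
from typing import List, Optional


@dataclass
class SLO:
    """One line item of a Service Contract (hypothetical shape)."""
    name: str
    target: float          # e.g. 99.9 for an availability percentage
    unit: str = "%"


@dataclass
class ServiceContract:
    id: str
    name: str
    type: str                        # "SLA", "OLA", or "UC"
    provider: str
    consumer: Optional[str] = None   # assumed to be unset on a template
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None
    slos: List[SLO] = field(default_factory=list)
    nested: List["ServiceContract"] = field(default_factory=list)
    is_template: bool = True         # assumption: flag separating template and instance


def instantiate(template: ServiceContract, instance_id: str,
                consumer: str, start: datetime) -> ServiceContract:
    """Create a Service Contract instance from a template, as would
    happen when a Subscription is instantiated."""
    if not template.is_template:
        raise ValueError("can only instantiate from a template")
    return replace(
        template,
        id=instance_id,
        consumer=consumer,
        start_date=start,
        slos=list(template.slos),       # copy so the instance owns its lines
        nested=list(template.nested),
        is_template=False,
    )


template = ServiceContract(
    id="SC-T-1", name="Gold support", type="SLA", provider="IT Operations",
    slos=[SLO("availability", 99.9)],
)
instance = instantiate(template, "SC-I-42", "Finance", datetime(2024, 1, 1))
```

Copying the line items keeps later changes to an instance from leaking back into the template; whether an implementation copies or references them is a design choice the text leaves open.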
9.1.1.2. KPI Data Object
Purpose
The KPI data object is the definition of an objective that is measured, its requested thresholds, and the exact mathematical method in which measurement data items are used in order to calculate the KPI measurements.
Key Attributes
The KPI data object shall have the following key data attributes:
-
Id: unique identifier of a KPI
-
Name: name of the KPI
-
Timestamp: time the KPI was measured
-
Metric: the definition of the data collected as input for the metric calculation
-
Algorithm: the exact algorithm for calculating the KPI
Key Data Object Relationships
The KPI data object shall maintain the following relationships:
-
KPI to Service Contract (1:n): associates the measure with a particular contract
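As an illustration of how the Metric and Algorithm attributes might work together, the following sketch computes an availability KPI from outage durations. The `threshold` field, the `measure` and `is_met` methods, and the availability formula are assumptions chosen for the example; the standard does not prescribe any particular algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class KPI:
    id: str
    name: str
    metric: str                                   # what input data is collected
    algorithm: Callable[[Sequence[float], float], float]
    threshold: float                              # assumed: requested minimum value

    def measure(self, samples: Sequence[float], period: float) -> float:
        return self.algorithm(samples, period)

    def is_met(self, samples: Sequence[float], period: float) -> bool:
        return self.measure(samples, period) >= self.threshold


def availability_pct(outage_minutes: Sequence[float], period_minutes: float) -> float:
    """Percentage of the period during which the service was up."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


availability = KPI(
    id="KPI-1", name="Monthly availability",
    metric="outage minutes per Incident",
    algorithm=availability_pct, threshold=99.9,
)

# A 30-day month has 43,200 minutes; two outages of 30 and 15 minutes.
value = availability.measure([30, 15], 43_200)
```

With 45 minutes of downtime the measured availability falls just under 99.9%, so this hypothetical objective would be reported as missed for the period.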
9.1.2. Incident Functional Component
Purpose
The Incident functional component facilitates normal service operations restoration as quickly as possible and minimizes the impact on business operations, thus optimizing service quality and availability. Service restoration can be facilitated through the following means:
-
In partnership with the Monitoring functional component, filter end-user Interactions and determine which ones should be associated with Incidents
-
Detect Incidents, investigate the impacts across all domains (server, network, security, etc.), and determine the correct action to take
-
Initiate Change and/or remediation activity for some categories of Incidents
An Incident is defined as an unplanned interruption to a service or a reduction in the quality of a service as defined within the Service Contract related to the Actual Product Instance and underlying systems. Failure of a Configuration Item (CI) that has not yet affected a service is also an Incident; for example, failure of one disk from a mirror set.
The Incident functional component supports the value streams:
Functional Criteria
The Incident functional component:
-
Shall be the system of record for all Incidents
-
Shall manage the state, escalation paths, and general lifecycle of the Incident
-
Shall allow an Incident to be initiated from an Event
-
Shall create an Incident when an Interaction cannot be associated with an existing Incident because it requires additional clarification, diagnostics, or support actions
-
Shall create a Problem record when the Incident is severe, requires further deep investigation, or is repeating
-
May trigger the execution of a Runbook (either automated or manual) to provide diagnostics information or remediation steps
-
May trigger the creation of an emergency Change in order to implement a fix to the Incident
-
May provide business measurements of Incident data to the Service Level functional component
-
May receive knowledge from the Knowledge functional component to help diagnose or resolve an Incident
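The criterion for creating a Problem record (the Incident is severe, requires further deep investigation, or is repeating) can be expressed as a simple rule. The sketch below is one possible reading; the severity scale, both cut-off values, and the `IncidentSummary` fields are hypothetical and would be set by local policy.

```python
from dataclasses import dataclass


@dataclass
class IncidentSummary:
    id: str
    severity: int                      # assumption: 1 is the most severe
    needs_investigation: bool = False
    similar_incident_count: int = 0    # earlier Incidents with the same symptoms


SEVERE_AT_OR_BELOW = 2   # hypothetical cut-off for "severe"
REPEAT_THRESHOLD = 3     # hypothetical cut-off for "repeating"


def should_create_problem(incident: IncidentSummary) -> bool:
    """True when any one of the three conditions holds."""
    return (
        incident.severity <= SEVERE_AT_OR_BELOW
        or incident.needs_investigation
        or incident.similar_incident_count >= REPEAT_THRESHOLD
    )
```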
9.1.2.1. Incident Data Object
Purpose
The Incident data object hosts and manages Incident data and the lifecycle of the Incident.
Key Attributes
The Incident data object shall have the following key data attributes:
-
Id: unique identifier of the Incident
-
Title: title of the Incident
-
Category: aids in determining assignment and prioritization
-
Subcategory: second level of categorization, following the Category attribute
-
Status: current stage in the lifecycle of an Incident
-
Status Time: time stamp for the Status attribute
-
Outage Start Time: time stamp for the start of service downtime
-
Outage End Time: time stamp for the end of service downtime
-
Severity: severity of the Incident
-
Priority: priority of fixing the Incident
-
Description: description of the Incident
-
Assigned To: group or person that is assigned to fix the Incident
Key Data Object Relationships
The Incident data object shall maintain the following relationships:
-
Interaction to Incident (1:n): provides the interface for consumers to submit their own Incidents or break-fix requests
-
Incident to Problem (n:m): connection between Incidents that are converted to Problems to permanently address severe/repeating Incidents
-
Incident to Knowledge Item (n:m): connection between Incidents and the Knowledge Items used for their resolution
-
Incident to Runbook (n:m): an Incident is related to one or more Runbooks used to resolve the Incident
-
Incident to Change (1:n): connecting emergency Changes to Incidents for remediation
-
Incident to Actual Product Instance (n:m): Actual Product Instance to which the Incident is associated, and of which it is usually the main subject
-
Incident to Service Contract (n:m): the service level which applies to an Incident is dependent on the related Service Contract(s)
-
Event to Incident (n:m): enables the connection between Incidents and Events and supports the integration lifecycle between them
-
Incident to Incident (n:m): several reported Incidents can be related to the same issue and will then be linked
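The Status, Status Time, and outage attributes lend themselves to a small lifecycle model. The following is a minimal sketch under stated assumptions: the standard does not enumerate Status values, so the states and allowed transitions below are hypothetical, and only some of the key attributes are included.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical lifecycle; the allowed transitions are an assumption.
ALLOWED = {
    "new": {"assigned"},
    "assigned": {"in_progress"},
    "in_progress": {"resolved"},
    "resolved": {"closed", "in_progress"},   # reopening is permitted here
    "closed": set(),
}


@dataclass
class Incident:
    id: str
    title: str
    status: str = "new"
    status_time: datetime = field(default_factory=datetime.now)
    outage_start_time: Optional[datetime] = None
    outage_end_time: Optional[datetime] = None

    def transition(self, new_status: str, at: datetime) -> None:
        """Move to a new Status and record when the change happened."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status} -> {new_status} not allowed")
        self.status = new_status
        self.status_time = at

    def outage_duration(self) -> Optional[timedelta]:
        """Length of the service downtime, if both time stamps are known."""
        if self.outage_start_time is None or self.outage_end_time is None:
            return None
        return self.outage_end_time - self.outage_start_time
```

The outage duration computed here is the kind of business measurement that the Incident functional component may provide to the Service Level functional component.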
9.1.3. Problem Functional Component
Purpose
The Problem functional component is responsible for managing the lifecycle of all Problems. The objectives are to solve severe/repeating Incidents, prevent Incidents from happening, and minimize the impact of Incidents that cannot be prevented. The cause of the Problem is not usually known at the time of the Problem data object instance creation, and the Problem functional component is responsible for investigating. It also serves as the main exit point from the Operate value stream for the feedback information to the Evaluate, Explore, and Integrate value streams in the form of Portfolio Backlog Items.
The Problem functional component supports the value streams:
Functional Criteria
The Problem functional component:
-
Shall be the system of record for all Problem records
-
Shall manage the state and lifecycle of the Problem
-
Shall receive Incident information to create a Problem from the Incident functional component when additional diagnostics and root cause need to be determined
-
Shall send Change information associated with a Problem to the Change functional component in order to implement a fix to the issue that is documented
-
Shall push Problem data requiring emergency/specific development to the Defect functional component
-
May send Problems to the Portfolio Backlog functional component to initiate corrective actions
-
May send known error (Knowledge Item) information to the Knowledge functional component
-
May push Problem data to the Diagnostics & Remediation functional component to trigger the execution of a Runbook data object (either automated or manual) to provide diagnostics information or remediation steps
9.1.3.1. Problem Data Object
Purpose
The Problem data object defines a Problem and manages the Problem lifecycle.
Key Attributes
The Problem data object shall have the following key data attributes:
-
Id: unique identifier of the Problem
-
Title: title of the Problem
-
Category: aids in determining assignment and prioritization
-
Subcategory: second level of categorization, following the Category attribute
-
Description: description of the Problem
-
Status: current stage in the lifecycle of a Problem
-
Status Time: time stamp for the Status attribute
-
Assigned To: group or person that is assigned to fix the Problem
Key Data Object Relationships
The Problem data object shall maintain the following relationships:
-
Problem to Change (1:n): enables the relation of a Change record that is created when Problem resolution requires a Change
-
Problem to Portfolio Backlog Item (1:1): ensures a Portfolio Backlog Item is created for Problems requiring a future fundamental/big fix/enhancement to the Digital Product
-
Problem to Defect (1:n): enables the creation of Defects when emergency/specific fixes require development
-
Problem to Knowledge Item (n:m): enables the creation of known error(s) (Knowledge Items) when the root cause of a Problem is identified
-
Problem to Actual Product Instance (n:m): Problem records are mapped to the affected Actual Product Instance(s)
-
Incident to Problem (n:m): connection between Incidents that are converted to Problems to permanently address severe/repeating Incidents
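The relationships above imply that one Problem can give rise to several kinds of downstream record. As a rough sketch, the routing could look like the following; the `ProblemOutcome` flags and the `downstream_records` helper are hypothetical names introduced for illustration, and a real implementation would create the records rather than return their names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemOutcome:
    problem_id: str
    root_cause_known: bool = False
    needs_change: bool = False
    needs_development: bool = False       # emergency/specific fix
    needs_fundamental_fix: bool = False   # larger enhancement to the Digital Product


def downstream_records(outcome: ProblemOutcome) -> List[str]:
    """Names of the data objects that should be created for this Problem."""
    records = []
    if outcome.root_cause_known:
        records.append("Knowledge Item")          # known error
    if outcome.needs_change:
        records.append("Change")
    if outcome.needs_development:
        records.append("Defect")
    if outcome.needs_fundamental_fix:
        records.append("Portfolio Backlog Item")
    return records
```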
9.1.4. Knowledge Functional Component
Purpose
The Knowledge functional component includes searchable content which can take on multiple formats; for example, general knowledge documents including how-to guides, technical guidance, frequently asked questions, known errors, webinars, videos, training materials, or other information to be managed. The Knowledge functional component contains known errors which describe a known condition/issue which exists for a Product Instance. This functional component increases the contribution to knowledge by providing all users with the ability to generate new content. It improves the accessibility of knowledge in the organization by:
-
Supporting keyword search capabilities
-
Providing filter capabilities based on various attributes of the Knowledge functional component, such as subject category, time range, source types (internal versus external), etc.
-
Supporting natural language queries to reduce the complexity of finding relevant information
-
Providing users with access to third-party knowledge and forums
-
Providing natural language processing analytics so, for example, “trending topics” can be reported from service desk interactions
-
Reducing the number of requests for information/knowledge that arrive at the service desk
Service consumers and staff consume third-party knowledge through the same experience as the formal and informal forms of knowledge the company provides.
The Knowledge functional component supports the value streams:
Functional Criteria
The Knowledge functional component:
-
Shall be the system of record for all Knowledge Item records
-
Shall manage the state and lifecycle of the Knowledge Item
-
Shall receive known error information from the Problem functional component to create a Knowledge Item
-
Shall provide functionality to enable the service consumers and staff to rank Knowledge Items, thus improving future knowledge consumption
-
Shall provide knowledge in the form of content that helps to address the needs of both service consumers and digital operations
-
Shall increase the contribution to knowledge by providing all users with the ability to generate new content
-
Shall provide knowledge to digital operations as part of the diagnostics and remediation activities
-
Shall reduce the number of requests for information/knowledge that arrive at the service desk through self-service
-
May aggregate multiple (internal and external) Knowledge Item sources
-
May include searchable content which can be structured IT/supplier-produced articles
-
May share knowledge with consumers via the Consumption Experience engagement portal
-
May improve accessibility of knowledge in the organization by:
-
Supporting keyword search capabilities
-
Providing filter capabilities based on various attributes of the Knowledge functional component, such as subject category, time range, source types (internal versus external), etc.
-
Supporting natural language queries to reduce the complexity of finding relevant information
-
Providing users with access to third-party knowledge and forums
-
Providing natural language processing analytics so (for example) “trending topics” can be reported from service desk interactions
-
May allow service consumers and staff to consume third-party knowledge through the same experience as the formal and informal forms of knowledge the company provides
9.1.4.1. Knowledge Item Data Object
Purpose
The Knowledge Item data object contains information, managed and maintained as structured, searchable content, for use by both operators and the consuming business.
Key Attributes
The Knowledge Item data object shall have the following key data attributes:
-
Id: unique identifier of the Knowledge Item
-
Title: title of the Knowledge Item
-
Status: for example, draft, revise, published, review, archived
-
Author Id: unique identifier of the responsible author
-
Publish Date: date/time on which the item is available for publication
-
Expiry Date: date/time on which the item is no longer visible
-
Body: body text, video, or any other content that is published
Key Data Object Relationships
The Knowledge Item data object shall maintain the following relationships:
-
Knowledge Item to Problem (n:m): if a link to a Problem exists, the Knowledge Item will need changing when a Problem is (partly) resolved
-
Knowledge Item to Interaction (n:m): one or more Knowledge Items can be related to an Interaction when a consumer searches for knowledge during a self-service experience
-
Knowledge Item to Incident (n:m): one or more Knowledge Items can be related to an Incident as part of the Incident resolution
-
Knowledge Item to Knowledge Item (n:m): knowledge can be linked if related
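The Status, Publish Date, and Expiry Date attributes together determine whether a Knowledge Item should be shown to a reader. The sketch below illustrates one way to combine them with a simple keyword search. The visibility rule (published, on or after the Publish Date, before the Expiry Date) and the `search` helper are assumptions, not requirements of the standard.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class KnowledgeItem:
    id: str
    title: str
    status: str
    body: str
    publish_date: datetime
    expiry_date: Optional[datetime] = None

    def is_visible(self, now: datetime) -> bool:
        """Visible when published and inside the publication window."""
        if self.status != "published" or now < self.publish_date:
            return False
        return self.expiry_date is None or now < self.expiry_date


def search(items: List[KnowledgeItem], keyword: str, now: datetime) -> List[KnowledgeItem]:
    """Case-insensitive keyword match over title and body of visible items."""
    needle = keyword.lower()
    return [
        item for item in items
        if item.is_visible(now)
        and (needle in item.title.lower() or needle in item.body.lower())
    ]
```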
9.2. Assure Function
The Assure function centers around monitoring, configuration tracking, event correlation, and diagnostics.
9.2.1. Configuration Functional Component
Purpose
The Configuration functional component is focused on tracking the inventories of Actual Product Instances and the associated relationships of the underlying systems for the specific Digital Product. The Actual Product Instance represents the entire stack of resources and CIs including integrations and dependencies. In addition, this functional component tracks the relationships of an Actual Product Instance to the other Actual Product Instances it needs in order to function, and the relationships between the underlying systems. Its purpose is to identify, control, record, report, audit, and verify service items, including versions, constituent components, their attributes, and relationships.
The Configuration functional component supports the value streams:
Functional Criteria
The Configuration functional component:
-
Shall be the system of record for all Actual Product Instances and their associated relationships
-
Shall manage the lifecycle of the Actual Product Instance
-
Shall create Actual Product Instance(s) and underlying system(s) based on the Desired Product Instance in the Fulfillment Orchestration functional component
-
Shall serve as the data store for the realization of the service in the production environment
-
Shall calculate and provide the change impact based on the proposed Change and the Actual Product Instance relationships
-
Shall calculate and provide the business impact of the Incident to help in the prioritization process
-
Shall calculate and provide the business impact of the Event to help in the prioritization process
-
May be populated by service discovery
9.2.1.1. Actual Product Instance Data Object
Purpose
The Actual Product Instance data object represents the realized deployment of a specific Desired Product Instance. It includes CIs that represent the implemented Actual Product Instance components.
Key Attributes
The Actual Product Instance data object shall have the following key data attributes:
-
Id: unique identifier of the Actual Product Instance
-
Name: name of the Actual Product Instance
-
Type: type of the Actual Product Instance (e.g., infrastructure service, customer-facing service, enabling service, front office, etc.)
-
Configuration Items: model of the configuration of the Product Instance as a set of interconnected CIs
-
Create Time: date/time the Actual Product Instance was created
-
Last Modified Time: date/time the entry was last modified in a significant way
-
Owner: actor for whom this Actual Product Instance was created
-
Location: location of the Actual Product Instance – this can vary from high-level country to city or low-level, such as a building or a room
Key Data Object Relationships
The Actual Product Instance data object shall maintain the following relationships:
-
Desired Product Instance to Actual Product Instance (1:1): each CI should be traceable to the planned configuration described in the Desired Product Instance
-
Problem to Actual Product Instance (n:m): Problem records are mapped to the affected Actual Product Instance(s)
-
Runbook to Actual Product Instance (n:m): Runbook records are mapped to the associated Actual Product Instance(s)
-
Incident to Actual Product Instance (n:m): Actual Product Instance(s) to which the Incident is associated
-
Actual Product Instance to Event (n:m): Actual Product Instance(s) associated with the Event
-
Actual Product Instance to Service Contract (1:n): connection between the Actual Product Instance and the Service Contract in which it is measured
-
Service Monitor to Actual Product Instance (1:n): Actual Product Instance being monitored
-
Actual Product Instance to Actual Product Instance (n:m): an Actual Product Instance can depend on services delivered by other products
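The Actual Product Instance to Actual Product Instance relationship forms a dependency graph, which is what makes the change and business impact calculations of the Configuration functional component possible. The following minimal sketch finds every instance affected when one instance fails; the instance names and the `depends_on` mapping are invented for illustration, and a real impact calculation would also weigh Service Contracts and business priority.

```python
from collections import deque
from typing import Dict, List, Set

# depends_on[x] lists the Actual Product Instances that x needs in order to function.
depends_on: Dict[str, List[str]] = {
    "web-shop": ["payment-api", "catalog-db"],
    "payment-api": ["auth-service"],
    "reporting": ["catalog-db"],
    "auth-service": [],
    "catalog-db": [],
}


def impacted_by(failed: str, deps: Dict[str, List[str]]) -> Set[str]:
    """All instances that depend, directly or indirectly, on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {name: [] for name in deps}
    for name, needs in deps.items():
        for needed in needs:
            dependents.setdefault(needed, []).append(name)

    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:      # also guards against dependency cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```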
9.2.2. Monitoring Functional Component
Purpose
The Monitoring functional component is in charge of creating, running, and managing monitors which measure all aspects/layers of a service such as infrastructure (system and network), application, and security. It is also used to monitor aspects of service usage and Service Contracts. The results of monitoring are often captured in logs which can be referenced by other components. The collected data may trigger an Event. Monitoring can perform anomaly detection or detect certain patterns to predictively detect issues.
It is in charge of storing all measurement results and calculating compound measurements. ML and AI can be used to collect operational data used to predict certain service conditions or service degradation. The monitor definitions are created earlier in the product lifecycle, in the Integrate value stream, and are delivered to the Monitoring functional component by the Fulfillment Orchestration functional component. The Monitoring functional component also provides feedback on all aspects of the Digital Product and passes that information to the Evaluate, Explore, and Integrate value streams.
The Monitoring functional component supports the value streams:
Functional Criteria
The Monitoring functional component:
-
Shall be the system of record for all Service Monitors
-
Shall monitor all aspects of an Actual Product Instance
-
Shall store all of the results of the measurements being made on the Actual Product Instance
-
Shall calculate the results of compound Service Monitors from one or more simple measurements
-
Shall manage the lifecycle of the Service Monitor
-
Shall create, run, and manage monitors that measure all aspects/layers of an Actual Product Instance, including:
-
Shall monitor infrastructure (system and network)
-
Shall monitor applications
-
Shall monitor security
-
Shall monitor Product Instances
-
May monitor customer experience
-
May monitor service-level attainment or breach for the Service Level functional component
-
May monitor usage for the Usage functional component
-
May monitor service status for the Consumption Experience functional component
-
May receive Service Monitor definitions from the Fulfillment Orchestration functional component
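The requirement to calculate compound Service Monitor results from one or more simple measurements can be illustrated with a worst-of roll-up. This is only one possible aggregation, chosen for simplicity; the three health states and their ordering are assumptions, and weighted or quorum-based roll-ups are equally valid.

```python
from typing import Dict

# Assumed ordering of health states, from best to worst.
ORDER = ["ok", "degraded", "down"]


def compound_status(simple_results: Dict[str, str]) -> str:
    """Worst-of roll-up: the compound result is the worst simple result."""
    if not simple_results:
        raise ValueError("at least one simple measurement is required")
    return max(simple_results.values(), key=ORDER.index)
```

For example, `compound_status({"cpu": "ok", "http": "degraded", "db": "ok"})` yields `"degraded"`, because a single degraded layer is enough to mark the compound monitor as degraded.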
9.2.2.1. Service Monitor Data Object
Purpose
The Service Monitor data object performs the operational measurement aspects of a CI, Actual Product Instance, or its underlying systems in order to understand the current status of the running service. The Service Monitor definition is created in the Integrate value stream as part of the release package, and activated from the Fulfillment Orchestration functional component.
Key Attributes
The Service Monitor data object shall have the following key data attributes:
-
Id: unique identifier for the Service Monitor
-
Name: name of the Service Monitor
-
Description: description of the Service Monitor
-
Type: type of the Service Monitor (system, application, network, security, etc.)
-
Measurement Definitions: definitions of the measurements that the Service Monitor is collecting about the monitored entity (i.e., CI)
-
Last Run Time: date/time that the Service Monitor was last run
-
Last Run Status: the success, or not, of the last run of the Service Monitor
Key Data Object Relationships
The Service Monitor data object shall maintain the following relationships:
-
Service Monitor to Log (1:n): enables the traceability from the log entries that are created to the Service Monitor that defined the collection of data
-
Service Monitor to Actual Product Instance (1:n): the Actual Product Instance is the CI being monitored
9.2.2.2. Log Data Object
Purpose
A Log data object captures information related to CIs in order to better understand performance, health, and/or usage of an Actual Product Instance.
Key Attributes
The Log data object shall have the following key data attributes:
-
Id: unique identifier of a Log record
-
Timestamp: time of the Log event
-
Data: collected Log text
Key Data Object Relationships
The Log data object shall maintain the following relationships:
-
Log to Event (n:m): according to monitoring policies, some Log events are forwarded as service events
-
Log to Service Monitor (n:1): associated Log to the Service Monitor definition that enables the collection of the Log record
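The Log to Event relationship states that, according to monitoring policies, some Log events are forwarded as service events. A monitoring policy of this kind might be sketched as a list of patterns, as below. The `POLICY` table, the patterns, and the category names are hypothetical; only the `Log` fields mirror the key attributes.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Log:
    id: str
    timestamp: datetime
    data: str


# Hypothetical policy: (pattern, Event category) pairs, first match wins.
POLICY = [
    (re.compile(r"\b(ERROR|FATAL)\b"), "error"),
    (re.compile(r"\bWARN(ING)?\b"), "warning"),
]


def event_category(log: Log) -> Optional[str]:
    """Category of the Event to forward, or None if the Log stays a Log."""
    for pattern, category in POLICY:
        if pattern.search(log.data):
            return category
    return None
```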
9.2.3. Event Functional Component
Purpose
The Event functional component manages Events through the Event lifecycle for Events that occur on any digital service. The Event lifecycle includes but is not limited to detecting, categorizing, filtering, analyzing, correlating, logging, prioritizing, and closing Events. During the Event lifecycle, some categories of Events can serve as initiators of alerts and/or Incidents, and for diagnostics and remediation activities.
The Event functional component supports the value streams:
Functional Criteria
The Event functional component:
-
Shall be the system of record for all Events
-
Shall manage the state and lifecycle of Events
-
Shall manage the correlation between Events
-
Shall categorize Event data
-
Shall receive Event information from Service Monitors from the Monitoring functional component
-
Shall forward Events categorized as Incidents to the Incident functional component
-
May initiate a Change based on Event data in the Change functional component (e.g., trigger for capacity increase)
-
May send automated remediation (Runbook) to the Diagnostics & Remediation functional component
9.2.3.1. Event Data Object
Purpose
The Event data object represents an alert/notification signifying a change of state of a monitored CI or Actual Product Instance.
Key Attributes
The Event data object shall have the following key data attributes:
-
Id: unique identifier of an Event
-
Name: name of an Event
-
Category: category of an Event (info, warning, error, etc.); aids in determining assignment and prioritization
-
Type: categorizes the Event as either causal or symptom
-
Status: current stage in the lifecycle of an Event
-
Status Time: time stamp for the Status attribute
-
Severity: severity of the Event
-
Threshold Definitions: definitions of the thresholds by which the Event severity is determined
-
Assigned To: group or person that is assigned to handle the Event
-
Is Correlated: is the Event correlated to other Events?
Key Data Object Relationships
The Event data object shall maintain the following relationships:
-
Event to Incident (n:m): enables the connection between Incidents and Events and supports the integration lifecycle between them
-
Event to Change (1:n): associated Event is available for Change processing
-
Event to Actual Product Instance (n:m): Actual Product Instance(s) associated with the Event(s)
-
Log to Event (n:m): enables the traceability from the Events that are created to the Log object from which they originated
-
Event to Event (n:m): several events can correlate to the same root cause, and thus they relate to a primary event
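The Type and Is Correlated attributes, together with the Event to Event relationship, support linking symptom Events to a primary Event. The sketch below uses a deliberately naive rule: a symptom is correlated to the first causal Event reported for the same Actual Product Instance. The `product_instance` and `primary_event_id` fields and the rule itself are assumptions; real correlation engines typically also use time windows and topology.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    id: str
    name: str
    type: str                        # "causal" or "symptom"
    product_instance: str            # affected Actual Product Instance
    is_correlated: bool = False
    primary_event_id: Optional[str] = None


def correlate(events: List[Event]) -> None:
    """Link each symptom Event to the causal Event on the same instance."""
    causal_by_instance: Dict[str, Event] = {}
    for event in events:
        if event.type == "causal":
            # Keep the first causal Event seen for each instance.
            causal_by_instance.setdefault(event.product_instance, event)
    for event in events:
        primary = causal_by_instance.get(event.product_instance)
        if event.type == "symptom" and primary is not None:
            event.is_correlated = True
            event.primary_event_id = primary.id
            primary.is_correlated = True
```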
9.2.4. Diagnostics & Remediation Functional Component
Purpose
Through the use of manual and automated Runbooks, the Diagnostics & Remediation functional component provides diagnostics information and/or remediation steps to shorten the Mean Time To Repair (MTTR). During the Event lifecycle, some categories of Events can serve as initiators of alerts and/or Incidents and also for diagnostics and remediation activities. ML and AI can be utilized in this function to automate the diagnosis and remediation of Events. Runbooks help to streamline diagnosis and remediation for service functions by applying knowledge solutions to service anomalies.
The Diagnostics & Remediation functional component supports the value streams:
Functional Criteria
The Diagnostics & Remediation functional component:
-
Shall be the system of record for all Runbooks
-
Shall manage the Runbook lifecycle
-
May allow an Event to trigger a Runbook for diagnostics or remediation purposes
-
May allow an Incident to trigger a Runbook for diagnostics or remediation purposes
-
May allow a Problem to trigger a Runbook for diagnostics or remediation purposes
9.2.4.1. Runbook Data Object
Purpose
The Runbook data object is a routine compilation of the procedures and operations which the administrator or operator of the system carries out. A Runbook can be either a manual process or an automated script.
Key Attributes
The Runbook data object shall have the following key data attributes:
-
Id: unique identifier of the Runbook
-
Description: description of the Runbook
-
Category: aids in determining assignment and prioritization
-
Execution Time: date/time the Runbook was last executed
Key Data Object Relationships
The Runbook data object shall maintain the following relationships:
-
Actual Product Instance to Runbook (n:m): track Runbooks and the Actual Product Instance(s) with which they are associated
-
Event to Runbook (n:m): an Event is related to one or more Runbooks used to resolve the Event
-
Incident to Runbook (n:m): an Incident is related to one or more Runbooks used to resolve the Incident
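Because a Runbook can be either a manual process or an automated script, a single data model can cover both cases. The following sketch assumes that an automated Runbook carries a callable `action` and a manual one does not; the `RUNBOOKS` registry, the `trigger` function, and the category names are hypothetical and stand in for whatever mechanism lets an Event, Incident, or Problem start a Runbook.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class Runbook:
    id: str
    description: str
    category: str
    action: Optional[Callable[[], str]] = None   # None means a manual Runbook
    execution_time: Optional[datetime] = None

    def execute(self, at: datetime) -> str:
        """Run the automated action, or point the operator to the manual steps."""
        self.execution_time = at
        if self.action is None:
            return f"manual: follow the documented steps for {self.id}"
        return self.action()


def restart_service() -> str:
    # Placeholder for a real remediation step.
    return "service restarted"


RUNBOOKS: Dict[str, Runbook] = {
    "service-down": Runbook("RB-1", "Restart the service", "service-down", restart_service),
    "disk-full": Runbook("RB-2", "Free disk space", "disk-full"),
}


def trigger(category: str, at: datetime) -> Optional[str]:
    """Execute the Runbook registered for a category, if there is one."""
    runbook = RUNBOOKS.get(category)
    return runbook.execute(at) if runbook else None
```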