Incident Management

Incident management is the process of restoring normal service operation as quickly as possible, thereby minimizing the adverse impact on the service levels the organization has committed to its customers. ‘Normal service levels’ are defined in Service Level Agreements.

Recording of incidents

Every incident should be recorded. Incidents may be raised and recorded manually (e.g. customers calling a helpdesk) or fully automatically (e.g. a monitoring system feeding an incident directly into the incident management/ticketing system). The method chosen should be fit for purpose and adequate for the business.

Each recorded incident should have at least the following information:

  1. Unique incident identifier (a.k.a. incident ticket number).
  2. Date and time reported.
  3. Originator (e.g. user, event management system).
  4. Contact and location details of the originator, if reported by an individual.
  5. Incident owner.
  6. Data center category (e.g. electrical, mechanical) and subcategory (e.g. UPS, air-handler).
  7. Short description, including the estimated impact.
  8. Initial classification based on severity of impact and urgency.
  9. Level of priority.
  10. Status of incident.
  11. Resolution (e.g. workaround, solution).
  12. Closure of incident.
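
As an illustration only, the fields listed above could be modelled as a simple record type. The sketch below is one possible shape, not a prescribed schema: the class name IncidentRecord, the "INC-" ticket format, the status values, and the integer scales for severity, urgency and priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
import itertools


class Status(Enum):
    # Hypothetical status values; use whatever your ticketing system defines.
    OPEN = "open"
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


_ticket_counter = itertools.count(1)


def next_ticket_number() -> str:
    """Return a unique identifier (assumed format: INC-000001)."""
    return f"INC-{next(_ticket_counter):06d}"


@dataclass
class IncidentRecord:
    originator: str                    # 3. e.g. user, event management system
    owner: str                         # 5. incident owner
    category: str                      # 6. e.g. electrical, mechanical
    subcategory: str                   # 6. e.g. UPS, air-handler
    description: str                   # 7. description with estimated impact
    severity: int                      # 8. assumed scale, 1 = most severe
    urgency: int                       # 8. assumed scale, 1 = most urgent
    priority: int                      # 9. assumed scale, 1 = highest
    contact: Optional[str] = None      # 4. only when reported by an individual
    location: Optional[str] = None     # 4. only when reported by an individual
    incident_id: str = field(default_factory=next_ticket_number)   # 1.
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))        # 2.
    status: Status = Status.OPEN       # 10. status of incident
    resolution: Optional[str] = None   # 11. e.g. workaround, solution
    closed_at: Optional[datetime] = None                           # 12.

    def close(self, resolution: str) -> None:
        """Record the resolution and mark the incident as closed."""
        self.resolution = resolution
        self.status = Status.CLOSED
        self.closed_at = datetime.now(timezone.utc)


# Example: an incident raised automatically by a monitoring system.
incident = IncidentRecord(
    originator="event management system",
    owner="facilities on-call",
    category="electrical",
    subcategory="UPS",
    description="UPS running on battery; estimated impact: one rack row at risk",
    severity=2,
    urgency=1,
    priority=1,
)
incident.close("workaround: load transferred to redundant UPS")
```

Contact and location are optional here because they apply only when an individual reports the incident; the identifier and report time are filled in automatically so that a manually entered ticket and an automatically generated one end up with the same minimum set of fields.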

 

Incident categorization