Incident and Problem Manager

  • Full Time
  • Gauteng
  • Applications have closed

Website Teraco

Teraco is the first provider of resilient, vendor-neutral data environments in South Africa. Clients benefit from the cost savings and improved resilience of securely housing their information systems and networking equipment in a colocation facility purpose-built and operated to global best practice by an expert organisation with an absolute focus on data centre technology and infrastructure. Founded in 2008, Teraco Data Environments has brought international best practice in vendor-neutral data centre management to South Africa to give businesses a technically superior, physically safer and lower-cost environment for their information systems. Teraco operates facilities in Rondebosch, Cape Town (CT1), Riverhorse Valley, Durban (DB1) and Isando, Johannesburg (JB1). Teraco data centres are located on the fibre rings of the major licensed carriers in South Africa, including Telkom, Neotel, Broadband Infraco and Dark Fibre Africa, all of which have fibre nodes within the facilities.

Teraco Campus Isando Johannesburg

MAIN FUNCTIONS OF THE JOB

Problem Management:

Analyse incidents to identify recurring patterns.
Conduct root cause analysis to understand the underlying causes of problems.
Develop and implement corrective actions to address root causes and eliminate future incidents.
Work with relevant teams to implement solutions and updates to prevent similar problems.
Ensure response teams are coordinated and effective in investigating and resolving major complex problems. (The responsible team will assume incident management responsibility for a given event.)
Collaborate with subject matter experts to resolve complex problems, and track the problem lifecycle from identification to resolution.
Track tickets for all corrective actions and validate that the corrective actions are implemented as required.  
Maintain a problem knowledge base and documentation to share learnings across the organisation and enable quicker resolution of similar incidents in the future.
Manage problem resolution bridges, provide timely and clear updates to stakeholders, and document critical action items to drive resolutions.
Own and lead a structured Root Cause Analysis (RCA) process to resolve major incidents and problems. 
Facilitate root cause and corrective action plan meetings after the correction has been implemented. Ensure that the responsible managers document incident details and post-incident analysis to learn from events, and that incident reports reflect all root causes, corrections and corrective actions.
Drive teams to document and submit incident reports within OLA and SLA timeframes.
Act as signatory on all incident reports across the business.
In collaboration with the Client Experience Manager, identify improved reporting formats and templates. Drive consistency across Teraco’s operational organisation. 
Review incident response plans and procedures and identify improvement opportunities using data and metrics.

Incident and Problem Management Framework:

Implement a clear and concise Incident and Problem Management framework to ensure incidents are handled in line with established policies and procedures, and to increase the efficiency of incident response.
Establish a range of root cause analysis techniques and, where required, coach leadership in applying them to drive a culture of effective root cause analysis.
Ensure communication plans are in place and ready for activation during major incidents.
Create a communication and escalation framework to ensure stakeholders are kept up to date on incident status and impact. DCO staff will assume incident management responsibility for a given incident; facilitate communication during incidents to ensure a coordinated response.
Collaborate with the Client Experience Manager on client-impacting incidents to ensure that clients' interests are central to Teraco’s response to incidents and that there is effective communication with clients.

QUALIFICATIONS AND EXPERIENCE

Bachelor’s degree in a relevant field (e.g., IT, Engineering, Business Management, or similar) preferred, or equivalent experience
Certifications (highly beneficial):
  • ITIL v3/v4 Foundation or Intermediate Level
  • RCA/Problem Solving training (e.g., Kepner-Tregoe, Six Sigma Yellow/Green Belt)
  • Familiarity with ISO standards (especially ISO 27001, ISO 50001 or ISO 9001)
5+ years' experience in incident and/or problem management roles, ideally within data centre, critical machinery and/or electrical infrastructure, or similar high-availability environments
Experience in managing major incidents and leading post-mortems
Proven track record of implementing effective corrective and preventive action plans
Familiarity with operational workflows in critical facilities (e.g., infrastructure systems, networks)
Experience collaborating with client-facing and technical teams
Background in managing communication during major service disruptions
Experience working within Root Cause and Corrective Action frameworks

Apply via the company website (http://www.teraco.co.za) or teraco.mcidirecthire.com

 
