Job Description
Key accountabilities/functions
-As the authoritative subject matter expert, responsible for daily troubleshooting, support and maintenance of enterprise monitoring systems
-Performs complex systems design, implementation and integration functions
-Manage risk identification and mitigation challenges via the development and delivery of system enhancements, updates and upgrades to the monitoring systems
-Develop rigorous testing procedures for software changes and upgrades
-Regularly review the use of the monitoring system with key stakeholders, ensuring that opportunities to improve service delivery and business outcomes are identified and implemented
-Develop relevant diagrams and document monitoring system solutions and configurations
-Collaborate with technology and business teams to gather enterprise monitoring-related requirements, and design and deliver high-quality, innovative and effective solutions
-Show interest in and willingness to cross-train across the variety of monitoring tools used within the organisation
-Develop the long-term enterprise monitoring road map
-Educate and mentor the IT Organisation/team members on monitoring best practices and trends
Qualifications
- Bachelor's Degree in Computer Science or equivalent educational and/or work experience
- Certifications in Red Hat Linux, AIX, Windows, and/or ESX are desirable
- Certifications in monitoring technologies are desirable
Knowledge & Experience
- Extensive experience with monitoring methodologies (remote and agent-based) and toolsets
- Extensive experience deploying and administering HP monitoring software within large carrier networks (Operations Manager i, Network Node Manager, Network Automation)
- Experience with design, implementation and support of monitoring tools in a complex, multi-platform environment
- Administration experience with Linux and Microsoft Windows server operating systems
- Ability to diagnose and troubleshoot infrastructure and application problems in a complex environment
- Ability to script or code in at least one language (JavaScript/Python/Perl/shell/PowerShell)
- Ability to provide level-of-effort estimates and complete deliverables within allotted time
- Demonstrated ability to perform root cause analysis and implement tools for monitoring
- Experience with formal change management processes; ITIL certification and ITSM experience are preferred
- Good technical understanding of Microsoft architecture, tools and related products supporting applications and systems