Site Reliability Engineering Manager

Job Title: Site Reliability Engineering Manager
Contract Type: Permanent
Location: Melbourne CBD
Salary: $160,000 - $180,000 per annum
Start Date: 2020-02-25
Reference: JO-2002-13749
Contact Name: Ged Wilson
Contact Email: [email protected]
Job Published: February 25, 2020 17:45

Job Description

Site Reliability Engineering Manager - $175k to $190k base + super (+Share Options)

The Company:

This is an exciting time to join a growing company. They are unique in what they do and have a strong track record of growth whilst disrupting the market with their innovative software.

About the Role 
As a Site Reliability Engineering Manager (SRM), you will own the end-to-end availability and performance of my client's services. You'll also lead by example, develop your team and establish credibility through the quality of your team's technical execution.

On the SRE team, you'll build solutions that improve the availability, performance and stability of my client's products, and automate away repetitive work. You'll also respond to pings, pages and alerts, investigating and diving into issues on the platform.

You'll work across non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. The best person for this role is someone who has a collaborative spirit. In this role, it's not about being a hero and having all the answers; it's about sometimes saying "I don't know" and working towards a solution rather than starting from an assumption.

You'll be strategically minded, thinking about best practice, industry standards, continuous improvement and better ways to achieve our goals. The team needs someone who can ask questions, learn from others and turn chaos into order.

What you'll have
  • Experience leading a team of software/systems engineers;
  • Software development experience with C#;
  • Automation experience, ideally in Python or PowerShell;
  • Manage end-to-end availability and performance of mission-critical services and build automation to prevent problems from recurring;
  • Lead by example, care for the team, and establish credibility through the quality of the team's technical execution;
  • Manage on-call rotations across continents using a follow-the-sun model;
  • Design, write and deliver software to improve the availability, scalability, latency and efficiency of RecordPoint's services;
  • An understanding of incident management processes;
  • Experience monitoring distributed systems.

Desirable experience
  • Experience with containers and container orchestration (such as Docker and Kubernetes) and with microservices architectures
  • Metrics, monitoring and logging software such as AppInsights, Grafana, Prometheus, StatsD and Datadog
  • Experience with infrastructure as code, ideally Terraform
  • Familiarity with other programming languages and frameworks, such as Python and JavaScript (Node.js)

You will need full Australian working rights to be eligible for this position.
Contact Ged Wilson for more information. 
