We are looking for a Manager, Site Reliability Engineering to lead the team of cloud platform engineers building and supporting the infrastructure behind critical Lightspeed services. The platform covers the full software delivery cycle, from CI/CD pipelines to highly available, scalable production environments.
Role:
- Highly autonomous role responsible for the team’s overall direction and execution
- Own the full scope of production frameworks, tools and infrastructure for delivering and running services in production environments on multiple clouds (GCP, AWS)
- Define the team’s vision, and rationalize and prioritize projects with an emphasis on improving developer experience, production stability and scalability.
- Build the team’s roadmap, establish the processes to execute on it, and keep the team on track.
- Hands-on and highly technical. Set the technical direction for the team, and guide the selection and evolution of technologies and tools.
- Empower and grow a team of 5-10 members
- Work closely with multiple development teams to understand their pain points and find ways to unlock more value and productivity
- Lead the team to design, build and maintain robust infrastructure on GCP and AWS, leveraging cloud-native technologies such as Terraform, GKE, Cloud SQL and BigQuery.
- Improve, simplify and manage CI/CD pipelines for efficient deployment and release, using technologies such as GitLab, GitHub, Helm, Terraform, CircleCI and Jenkins.
- Participate in the incident management process and conduct post-mortem analysis to prevent future outages.
- Manage infrastructure changes through infrastructure as code (IaC).
- Be part of our on-call rotation.
And a little bit of...
- Take initiative to identify broader opportunities to improve the platform and processes across the company
- Contribute to the team’s objectives hands-on as needed.
What will make you successful:
- The team has a clear vision and a roadmap aligned with it
- The team delivers on the roadmap milestones
- Developer productivity KPI: a productivity KPI is established and tracked, and the team drives improvements against it
- Stability KPI: the number of incidents caused by production outages is tracked and kept low
- Scalability KPI: the platform handles traffic growth and spikes without going down
- Cost KPI: the platform is optimized for cost and high utilization; costs are monitored, and spikes trigger alerts.
Experience:
- A Bachelor’s degree in Computer Science, Engineering or a related field, or equivalent practical experience.
- Demonstrated experience managing production environments effectively.
- Extensive hands-on experience with GCP and AWS
- Proven expertise in managing infrastructure as code to streamline operations and drive automation.
- Experience building and managing CI/CD pipelines that streamline software development.
- Experience leading small to medium-sized teams of platform infrastructure engineers
- Ability to collaborate closely with stakeholders across different teams and disciplines
Skills you will bring to the team:
- Self-starter, able to set the direction for the team
- Comfortable with ambiguity; able to structure a complex problem space and turn it into an executable roadmap
- Strong ownership, taking full responsibility for the team’s scope and successful execution
- Strong execution; able to put the right processes in place to get things done
- Quick learner, able to dig deep and understand new technologies and frameworks
- Ability to get the team excited about the vision, and to empower and coach its members
- Solid collaboration and communication skills
- Deep understanding and hands-on experience building scalable software delivery infrastructure on GCP and AWS
What’s in it for you:
- Join a growing team and help us move to the next level
- Amazing benefits & perks, including equity for all Lightspeeders
- Constant development of both your skill-set and business acumen with limitless growth opportunities
- Lots of autonomy, flexible work culture
- Innovation time to explore and learn at work
- Shaping the company by joining cultural & technical committees
- Opportunity to join a fast-paced, high-growth company
- Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story
...And enjoy a range of benefits that will keep you happy, healthy and (not) hungry:
- Lightspeed equity scheme (we are all owners).
- Flexible paid time off and remote work policies.
- Health insurance.
- Contributions to your pension plan (RRSP).
- Health and wellness benefit of $500 per year.
- Paid leave and assistance for new parents.
- Online mental health platform, plus counselling and coaching services.
- Training opportunities to grow your skills and career.
- Volunteer day.
- Fully stocked kitchen (hot and cold beverages, meals served).
- Happy hours to build relationships with colleagues after work.