Role mission
As a Site Reliability Engineer (SRE) serving clients across multiple industries, including edtech and telecommunications, you will work with cutting-edge technologies such as AWS, ECS/EKS, and event-based systems to ensure the reliability, scalability, and performance of our services. If you are passionate about solving complex challenges and making an impact through innovation and collaboration, we would love to hear from you.
Deliverables
Design, deploy, and maintain infrastructure using CDK in AWS environments
Develop monitoring solutions and implement incident response processes to ensure high availability and reliability
Implement and manage containerized applications using ECS/EKS
Support various databases (RDBMS, NoSQL), ensuring optimal performance and reliability
Serve as an architect and AWS subject-matter expert (SME), advising developers as they design scalable solutions
Work closely with development teams to ensure best practices in reliability and performance are followed
Write scripts to automate processes and improve efficiency
Perform DevOps tasks such as CI/CD pipeline management and configuration management
About you
Proficient in TypeScript
Proven experience in an SRE role or similar, with hands-on expertise in CDK or Terraform, and AWS
Experience managing containers using ECS/EKS in Fargate and EC2 clusters
Knowledge of various databases (RDBMS, NoSQL) and their performance tuning
Experience with event-based systems and event-sourcing methodologies
Experience with CI/CD pipelines and configuration management tools
Strong analytical and troubleshooting skills with a proactive approach to problem-solving
Excellent communication skills with the ability to collaborate effectively across teams
Nice to have
Experience in EdTech or Telco (preferably both)