Senior Site Reliability Engineer

Jobgether

April 24, 2025

11:34 PM

Company size: 11-50 employees
Job Type: Remote
About Company:
The hiring process is broken. Candidates waste time applying for jobs that don’t match, and companies get buried in irrelevant applications. At Jobgether, we’re changing the game. Our AI-powered platform connects top talent with the right opportunities, fast. Instead of receiving hundreds of random applications, companies are introduced to the top 5–10 candidates who truly fit the role. Our technology understands more than just keywords: it interprets skills, experience, and intent to create meaningful connections. We eliminate the noise, automate the manual work, and let people focus on what really matters: hiring smarter and finding the right job, no matter where they are.
About the Role:

As a Senior Site Reliability Engineer (SRE), you will play a key role in scaling, securing, and improving the organization's cloud infrastructure. Your primary focus will be ensuring the reliability and scalability of systems by implementing proactive solutions and automating infrastructure management. You will work closely with engineering and platform teams to improve service reliability, manage Kubernetes clusters, and optimize cloud resources. You will also lead incident response, conduct post-incident reviews, and refine best practices to continuously improve system performance and security.

Accountabilities:

Own initiatives related to system reliability and scalability, identifying potential issues and implementing proactive solutions to prevent them
Participate in on-call rotations, responding to incidents, performing root cause analysis, and driving long-term fixes
Design, deploy, and manage Kubernetes clusters, using tools such as Helm, Cilium, and Karpenter to optimize both performance and cost
Architect and maintain AWS infrastructure, focusing on RDS/Aurora PostgreSQL, networking, and scaling best practices
Automate infrastructure provisioning using tools like Crossplane and Terraform to maintain consistency and scalability
Enhance observability by improving Datadog-based monitoring and driving proactive detection and resolution of system issues
Conduct post-incident reviews and document lessons learned, feeding improvements back into long-term system practices

Requirements:

Minimum of 5 years of experience in SRE, DevOps, or Infrastructure Engineering, demonstrating strong ownership and problem-solving skills
Proficiency in Kubernetes, Helm, and network security practices
In-depth experience with AWS services such as RDS, Aurora, VPC, EKS, EC2, and IAM
Expertise in PostgreSQL administration, including performance tuning and high-availability management on AWS
Familiarity with CI/CD tools like GitHub Actions and ArgoCD, with a focus on automation and security best practices
Strong understanding of, and hands-on experience with, Infrastructure as Code (IaC) using Crossplane and Terraform
Experience in observability and monitoring with Datadog
Proficiency in Python and Bash scripting for system automation and management
Strong communication skills and the ability to collaborate effectively across engineering teams and document processes in Confluence

India Abroad® and New India Abroad®, publications of Indian Star LLC, are primarily meant to keep the Global Indian Diaspora informed about what is happening in India, the world and in their own neighborhoods through digital platforms, and print publications.