Databricks
DevOps, Senior
Sr Platform Monitoring Engineer
SRE, DevOps, AWS, Azure, GCP, Docker, Kubernetes, ELK Stack, Prometheus, Grafana, PagerDuty, Python
About the Position
This role supports data teams by helping build an advanced data and AI infrastructure platform. As a Senior Platform Monitoring Engineer, you'll investigate platform incidents and improve observability and customer experience through alerting pipelines, observability workflows, and automation tooling.
Responsibilities
- Lead platform incident investigation, coordinating cross-functional teams through detection, mitigation, and resolution.
- Conduct thorough post-incident root cause analysis to identify systemic patterns.
- Design and implement customer-focused alerting pipelines and observability workflows.
- Build automation tools and resolve reliability gaps.
Requirements
- Minimum of 5 years of experience as an SRE, DevOps Engineer, or similar role.
- Production-level experience with at least one major cloud provider (AWS, Azure, or GCP).
- Proficiency in Docker and Kubernetes.
- Hands-on experience with ELK, Prometheus, Grafana, and PagerDuty.
- Strong proficiency in Python or similar languages.
- Experience owning critical phases of the incident lifecycle in production environments.
- BS, Master's, or PhD in Computer Science or related field.