About the opportunity
The Dayforce Workforce Management (WFM) Engineering team is seeking a Release Engineer with strong Site Reliability Engineering (SRE) expertise to support and evolve our cloud-native Workforce Management platform. Our systems power mission-critical scheduling, time, attendance, and labor optimization services used by customers globally.
In this hybrid Release Engineering / SRE role, you will drive reliable, scalable, and secure software delivery while also improving system resiliency, observability, performance, and operational excellence. You will help design and maintain CI/CD pipelines, infrastructure-as-code frameworks, and reliability standards that ensure high availability and predictable deployments across distributed microservices.
You will partner closely with WFM developers and the Platform, Cloud, Security, and Operations teams to improve service reliability, define SLOs/SLIs, reduce operational toil, and strengthen production readiness practices.
Responsibilities
Design, implement, and maintain CI/CD pipelines supporting WFM Services (API and Consumers)
Own service deployments and manage multiple service versions across multiple regions
Monitor and maintain all services, ensuring they remain healthy
Automate infrastructure provisioning and configuration using Infrastructure-as-Code (IaC)
Troubleshoot production incidents and participate in root cause analysis (RCA)
Partner with development teams to improve deployment strategies
Qualifications
3+ years of experience in DevOps, Release Engineering, SRE, or Cloud Infrastructure roles
Experience integrating AI-powered services into enterprise development tools to achieve AI-accelerated engineering (e.g., GitHub Copilot, Codeium, Sourcegraph Cody)
Experience integrating AI-driven observability tools for proactive monitoring at scale, improving system stability, accelerating incident response, and automating root cause analysis
Strong experience with Microsoft Azure cloud technologies (Azure, Application Insights, Azure DevOps (ADO))
Hands-on experience implementing Infrastructure-as-Code using Helm charts, Terraform, or similar tools
Experience operating Kubernetes clusters in production (AKS preferred)
Strong scripting skills (PowerShell, Bash, YAML, Python, or similar)
Experience deploying and managing Redis, Kafka topics, and SQL and NoSQL databases
Experience with observability tools such as Application Insights, Prometheus, Grafana, or similar
Experience supporting high-availability distributed systems
Strong understanding of networking fundamentals (DNS, HTTP/S, load balancing, firewall rules)
Experience performing incident response and root cause analysis
Experience working in Agile/Scrum environments