About the Role
As a DevOps / SRE Engineer at CaeliCode, you'll build the delivery and reliability infrastructure that keeps our clients' systems running smoothly. You'll design CI/CD pipelines, implement GitOps workflows, and create observability systems that give teams confidence in every deploy.
You'll work across multiple client environments, which means exposure to a wide variety of architectures, toolchains, and challenges. No two weeks look the same, and that's by design.
What You'll Do
- Design and implement CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins
- Build GitOps delivery workflows with ArgoCD, Flux, or similar tools
- Set up comprehensive monitoring and observability stacks (Prometheus, Grafana, ELK, Datadog)
- Define and track SLOs/SLIs to measure and improve service reliability
- Automate incident response runbooks and on-call processes
- Implement infrastructure-as-code practices and help clients adopt platform engineering
- Conduct reliability reviews and capacity planning for production systems
- Reduce operational toil through automation, self-healing systems, and better tooling
What We're Looking For
- 3+ years of experience in DevOps, SRE, or platform engineering roles
- Deep hands-on experience with CI/CD pipeline design and implementation
- Strong knowledge of containerization (Docker) and orchestration (Kubernetes)
- Experience with monitoring, observability, and alerting tools (Prometheus, Grafana, PagerDuty)
- Proficiency in at least one scripting/programming language (Go, Python, or Bash)
- Understanding of SRE principles: SLOs, error budgets, toil reduction, incident management
- Experience with Infrastructure as Code (Terraform, Ansible, or Pulumi)
Nice to Have
- Experience with GitOps tools (ArgoCD, Flux)
- Knowledge of chaos engineering practices (Gremlin, Litmus)
- Familiarity with platform engineering and internal developer platforms
- Cloud or Kubernetes certifications (AWS Certified DevOps Engineer - Professional, CKA, etc.)
- Experience building custom Kubernetes operators or controllers