
Job Title


Site Reliability Engineer - SC Cleared


Company : Cognizant


Location : London, England


Created : 2025-12-16


Job Type : Full Time


Job Description

Excellent opportunity for a Site Reliability Engineer to be part of our Cloud Infrastructure & Security services practice. Cognizant Infrastructure Services provides IT infrastructure & Cloud services for clients across industry verticals, including both Consulting/Professional and Managed Services, across Enterprise Computing, Cloud services, Security Services, DevOps, Data Centres, End User Computing, Service Desk, Network Services and Environment Management Services.

Candidates should be SC Cleared.

Key responsibilities:

Build CI/CD you can trust: Design, implement, and operate pipelines in GitHub Actions and Jenkins that deliver zero-touch, repeatable releases with quality gates, automated tests, and policy-as-code controls. Containerise services with Docker and standardise build images.

Provision everything as code: Model cloud resources using Terraform (workspaces, modules, registries, drift detection), enabling composable, reviewed changes across environments.

Run scalable compute: Stand up and operate container platforms: Kubernetes (incl. EKS, AKS, GKE), ECS, and Azure Container Instances (ACI), including cluster lifecycle, node pools, autoscaling, ingress, service mesh, secrets, and backup/restore.

Observability: Instrument services and infra with New Relic, Grafana (incl. Loki/Tempo where applicable) and cloud-native telemetry. Define SLIs/SLOs, build actionable dashboards, alerts, and runbooks that drive fast MTTR.

Engineer for reliability & cost: Apply SRE practices (error budgets, change management, resilience testing), right-size resources, and use cloud provider tooling for security/cost posture.

Incident response & on-call: Participate in a fair, documented on-call rota; lead and/or contribute to incident handling, comms, post-incident reviews, and corrective actions.

Security & compliance by design: Embed IAM least-privilege, secrets management, image/provenance scanning, and guardrails into pipelines and Terraform modules.

Key Skills and Experience:

Proven experience operating production systems on a major cloud (AWS/Azure/GCP) with solid cloud fundamentals (networking, IAM, storage, compute, HA/DR).

Hands-on IaC with Terraform (modules, state, CI validation, policy checks).

Strong CI/CD skills in GitHub Actions and/or Jenkins (runners/agents, reusable workflows, secrets, matrix builds, artefact management).

Containers & orchestration: Kubernetes administration knowledge (controllers, scheduling, ingress, autoscaling, troubleshooting) and experience with EKS/AKS/GKE and/or ECS/ACI.

Observability: Practical use of New Relic and Grafana to define metrics/traces/logs, tune alerts, and drive SLOs.

Scripting & automation: Proficiency in Python and Bash; experience with boto3 or equivalent SDKs.

Incident management: Exposure to production incidents, on-call participation, and post-incident review practices.

Clear communication, stakeholder partnership, and a bias to automate, document, and simplify.