We are looking for an SRE experienced in distributed systems, Kubernetes and microservices to join our Applications team. The team builds tooling that enriches the core Hazelcast Platform, making it easier to use and scale while extending its functionality, so that our solutions meet the most demanding customer needs.
Day to day, you’ll be leveraging your solid engineering fundamentals with a focus on performance, consistency, resilience and scale, bringing your passion for solving difficult problems to help realize the product vision.
Your role as an SRE is crucial in ensuring that Hazelcast Platform meets business objectives, is robust and scalable, and can be depended upon by customers for mission-critical implementations.
WHAT YOU’LL DO
Keep Hazelcast cloud-based production systems running smoothly 24/7/365
- Design and Development:
- Design, develop, and maintain our cloud infrastructure to support both our end-user management center and our microservice-based platform
- Implement new solutions using AWS and Terraform, improving scalability, throughput, and reliability
- Support and manage our Keycloak IdP, ensuring it provides appropriate security while meeting the needs of the development team
- Security and Integration:
- Implement security measures to protect data integrity and confidentiality, including encryption, access control, and compliance with relevant regulations.
- Work with our operations team to maintain our SOC 2 and ISO 27001 compliance and keep our environment secure
- Monitoring and Maintenance:
- Monitor the system for performance issues, errors, and potential failures, and implement maintenance procedures such as backups, data recovery, and disaster recovery plans.
- Troubleshoot issues related to data storage, including performance bottlenecks, data corruption, or compatibility issues with other software components.
- Collaboration:
- Collaborate with cross-functional teams, including software developers, architects, and product managers, to ensure the effective integration and operation of the components within the overall software infrastructure.
- Document design decisions, implementation details, and operational procedures to facilitate collaboration among team members and ensure the maintainability of the system.
- Continuous Learning:
- Stay updated with the latest developments in storage technologies, the Java programming language, and software engineering best practices, and apply this knowledge to improve existing storage systems and develop new solutions.
- On-call participation:
- Be part of our on-call rotation to respond to availability incidents and work with support and engineers on customer incidents
WHAT YOU HAVE
- Experience with distributed systems, Kubernetes and microservices
- Infrastructure as Code (Terraform)
- Modern DevOps stack (K8s, Prometheus, Grafana, OpenTelemetry, Argo CD, Helm)
- Experience with at least one programming language, preferably Golang or Python
- Experience with CI and building CD pipelines (Jenkins, GitHub Actions)
- A passion for automation and keeping our software delivery fast and efficient
- Knowledge of the following is desirable:
- Multi-cloud (AWS, GCP and/or Azure)
- Experience working with software engineers in designing cloud-native applications or troubleshooting them
- Experience as part of an on-call rota
- Bachelor's degree in a relevant field of study (Computer Science or a related discipline), or equivalent experience
BENEFITS
- 25 days' annual leave + bank holidays
- Group Company Pension Plan
- Private Medical Insurance
- Private Dental Insurance
- Life Insurance
- EAP (Employee Assistance Program)