361 SRE Professional jobs in Egypt
Site Reliability Engineer
Posted today
Job Description
Egypt • Bulgaria • Greece
Information Technology
Hybrid
Experienced Professionals
Department: Digital Factory, Digital & Technology Platform Services.
We are seeking a Site Reliability Engineer (SRE) to join our Integration Factory team. This role is pivotal in ensuring the reliability, scalability, and performance of our integration platforms and services. You will work at the intersection of software engineering and operations, focusing on performance and availability, automation, observability, and continuous improvement of our integration services (e.g. problem management, reducing user-created incidents, and reducing MTTR to 48 hours or less).
YOUR KEY RESPONSIBILITIES:
- Maintain and enhance the reliability and availability of integration platforms (e.g., API gateways, message brokers, ETL pipelines).
- Design and implement monitoring, logging, alerting, and observability to ensure system health and performance.
- Contribute to integration design by defining monitoring and end-to-end observability requirements.
- Automate deployment, scaling, and recovery processes using Infrastructure as Code (IaC) and CI/CD pipelines.
- Collaborate with API and event consumers, integration product managers, integration developers, and integration architects to ensure best practices and continuous improvement in system design and deployment (e.g. feature prioritization).
- Troubleshoot and resolve incidents in production environments, performing root cause analysis and postmortems.
- Define and track Performance & Availability, Service Level & Operating Level Agreements (SLA, OLA), Mean-Time-To-Resolve (MTTR) and customer and peer satisfaction (NPS, P4G).
- Continuously improve system resilience, fault tolerance, and recovery strategies.
- Work closely with the integration support team to ensure accurate reporting and effective incident handling.
- Work alongside the automated testing and observability teams to validate that monitoring points effectively detect and report issues.
- Decide which dashboards to create in the observability platform.
ARE THESE YOUR SECRET INGREDIENTS?
- Required:
o A passion for creating robust, scalable platforms that accelerate innovation.
o Bachelor's degree in computer science, engineering, a related technical field, or equivalent practical experience.
o 5+ years of experience in Site Reliability Engineering, DevOps, and/or similar roles (e.g. Level 2 and 3 Operations Engineer, Integration Development).
o Strong understanding of core integration design principles and patterns (REST, GraphQL), authentication methods (OAuth, API Keys), and data formats (JSON, XML).
o Proficiency in scripting and automation (e.g., Python, Bash, Terraform, and Ansible).
o Experience with cloud platforms (e.g. Azure).
o Familiarity with monitoring and observability tools (e.g. Dynatrace).
o Solid understanding of CI/CD pipelines (e.g. Azure DevOps), containerization (Docker), and orchestration (Kubernetes).
o Exceptional communication and influencing skills, with a demonstrated ability to lead by consensus and drive standardization across multiple teams.
o Strong analytical and problem-solving skills, with a data-driven approach to decision-making.
o Good proficiency in English as a day-to-day business language is a must.
- Preferred:
o Previous experience as a software developer, solutions architect, or in a similar technical role.
o Hands-on experience with enterprise integration platforms and technologies.
o Specific experience with SAP integration tools such as SAP BTP Integration Suite, SAP BTP API-M, OData services, APIs, IDocs, and RFCs.
o Familiarity with event-driven architecture and streaming platforms like Apache Kafka.
o Experience with cloud-based API management services, particularly Azure API Management.
o Experience with agile development methodologies (e.g., SAFe, Scrum, Kanban).
o Familiarity with DevOps practices and tools.
o Experience with Azure, D365, SAP S/4HANA and SAP MDG. SAP and MS Azure Certifications are a plus.
o Excellent leadership and communication skills.
o Strong problem-solving and decision-making abilities.
o Strong organizational and time-management skills.
o Ability to work in a fast-paced and dynamic environment.
o FMCG background or experience working with FMCG companies.
ABOUT YOUR NEW TEAM:
We are Coca-Cola Hellenic, a growth-focused consumer goods business and strategic bottling partner of the Coca-Cola Company. We bottle, distribute and sell an unrivalled range of products in 29 markets in Europe, Africa and Eurasia. As we do, we create value for all stakeholders, support socio-economic growth and build a more positive environmental impact.
We bring together more than 30,000 people from over 70 nationalities, coming from five continents. The diversity of our markets, from mature to emerging economies, provides a wide range of attractive opportunities for growth.
We nurture our talents. We give opportunities to people across all functions and levels, as well as different geographies, backgrounds and education. We are willing to take a risk on the people we believe in, even if they don't have the perfect experience. We have faith in what every person can be.
And although we have so much to be proud of, we always stay humble. We believe the real magic happens – for us and for you – when we OPEN UP.
AT COCA-COLA HBC, DIVERSITY HELPS US THRIVE
At Coca-Cola HBC, we are an inclusive employer that thrives on diversity. This means our environment provides equal opportunities for all, regardless of race, color, religion, age, disability, sexual orientation, or gender identity. Join us in nurturing a culture where everyone belongs and contributes to our collective success.
Site Reliability Engineer
Posted today
Job Description
Role Overview
A Senior DevOps / Site Reliability Engineer (SRE) to design, build, and maintain scalable infrastructure systems. The ideal candidate has deep expertise in Linux administration, container orchestration, CI/CD, and modern DevOps practices, with the ability to mentor junior team members and drive automation across environments.
Key Responsibilities
- Lead the design, deployment, and administration of Linux-based infrastructure.
- Architect, maintain, and optimize CI/CD pipelines for development and production workloads.
- Build and manage containerized workloads using Docker and Kubernetes (including HA setups, storage, and networking).
- Troubleshoot complex system, networking, and DNS-related issues across distributed systems.
- Implement monitoring, logging, and alerting solutions to ensure system reliability and performance.
- Automate operational tasks using scripting and Infrastructure as Code (Terraform, Ansible, etc.).
- Collaborate with development and security teams to ensure best practices in system reliability and compliance.
- Mentor and guide junior engineers in Linux administration, DevOps, and automation best practices.
Required Skills & Knowledge
- 5+ years of experience in Linux systems engineering/DevOps.
- Strong expertise in Linux administration (performance tuning, kernel-level debugging, storage systems).
- Advanced understanding of networking, DNS, load balancing, and firewalls.
- Proven experience managing Docker and Kubernetes clusters (including upgrades, scaling, and troubleshooting).
- Hands-on experience with CI/CD tools (Jenkins, GitLab CI, ArgoCD, etc.).
- Strong automation skills (Bash, Python, Ansible, Terraform).
- Knowledge of security best practices in systems, containers, and networks.
- Ability to design resilient, highly available infrastructure systems.
- Deep expertise in administration, tuning, backup, recovery, clustering, and replica sets for databases (MySQL, PostgreSQL, and MongoDB).
Oracle Cloud Infrastructure
Posted today
Job Description
Oracle Cloud Infrastructure (OCI) Architect
Remote from Egypt - Travel to KSA as and when required
6-month initial contract - with extensions
Are you a cloud-savvy architect ready to design cutting-edge solutions on Oracle Cloud Infrastructure (OCI)? I'm looking for two experienced OCI Architects to join my client on an initial 6-month contract:
Key Responsibilities:
- Design end-to-end OCI architectures covering compute, storage, databases, networking, and security
- Implement cloud-native solutions using Oracle Kubernetes Engine (OKE), Functions, and API Gateway
- Ensure optimal performance, cost-efficiency, and compliance across all cloud deployments
- Drive cloud migration strategies from on-premise to OCI
- Lead Infrastructure-as-Code (IaC) practices using Terraform
- Build and support CI/CD pipelines tightly integrated with OCI services
What We're Looking For:
- Proven experience with Oracle Cloud Infrastructure
- Strong knowledge of Kubernetes, serverless, and microservices architecture
- Expertise in Terraform and automation best practices
- Ability to lead cloud transformation and modernization initiatives
- Passion for driving cloud excellence and delivering scalable solutions
Please apply to be contacted with further information.
Senior Site Reliability Engineer
Posted today
Job Description
Grow with us
As the Company at the forefront of the creation of the Mobile world, and with more than 60,000 patents to our name, we've made it our business to make a mark. Being part of Ericsson empowers you to learn, lead and perform at your best, shaping future technology. Ericsson is an inclusive employer where you are recognized for the skills, talent, and perspective you bring to the team.
Within the Solution Area Cognitive Networks Solutions Software R&D (SA CNS SW R&D), we offer the opportunity to collaborate with highly qualified Global teams and to enable success stories for our customers. You will be exposed to groundbreaking technology (5G, ML/AI, Automation and Cloud computing) and support the delivery of multiple Data-Intensive projects from our customer base and internal requirements. You must have the ability to perform hands-on Cloud and Data engineering tasks independently coupled with the appropriate testing. Are you ready to write the future with us?
Come, and be where it begins.
About this opportunity:
We are seeking a seasoned Senior/Tech Lead Site Reliability Engineer (SRE) to oversee the design, deployment, and maintenance of our cloud-native SaaS infrastructure on AWS, working with Ericsson R&D Global Teams. This position demands in-depth expertise in AWS technologies (including Fargate and App Runner), Terraform, Python (AWS SDK), Helm, and GitOps. The individual will lead a dedicated SRE team to establish best practices, optimize service reliability, and continuously improve security and performance across a multi-tenant SaaS environment.
What you will do:
- Cloud-Native SaaS Architecture:
- Architect, deploy, and manage multi-tenant SaaS Cognitive Solutions on AWS using AWS services (e.g., IAM, S3, EKS, ECS, Fargate, App Runner, Redshift, SNS, SQS, EventBridge, Athena, SageMaker, Aurora, DynamoDB, Cognito, API Gateway, etc.) to build microservices, data flows, data warehouses, and AI/ML models, emphasizing scalability, reliability, and cost efficiency.
- Champion microservices, container orchestration, and serverless paradigms to ensure high availability and optimal performance.
- SaaS Control Plane API: Design, develop, and maintain APIs that manage multi-tenancy in a cloud-based SaaS environment. This requires experience in building scalable and secure APIs that enable efficient tenant management, access control, and resource provisioning.
- Infrastructure as Code (IaC)
- Develop and maintain infrastructure definitions using Terraform to enable reliable, automated, and repeatable deployments.
- Collaborate with cross-functional teams to incorporate IaC principles into CI/CD pipelines, accelerating feature releases and minimizing downtime.
- Site Reliability Engineering & Observability:
- Define and track Service Level Indicators (SLIs) and Objectives (SLOs), establishing error budgets that align with organizational goals.
- Implement robust observability solutions (e.g., AWS CloudWatch, CloudTrail, AWS Config, etc.) to proactively detect and resolve performance bottlenecks.
- Containerization & Helm
- Utilize Kubernetes (EKS) and Helm charts to package, configure, and deploy containerized applications efficiently.
- Streamline container orchestration workflows, focusing on auto-scaling, upgrades, rollbacks, and enhanced service resiliency.
- GitOps & Automation
- Employ GitOps tools (Argo CD, Flux) to govern infrastructure and application deployments through declarative, version-controlled configurations.
- Automate operational tasks using scripting languages (Python, Bash, PowerShell) and AWS SDK (boto3), improving developer productivity and reducing manual overhead.
- DevSecOps & Compliance
- Embed security best practices within the software development lifecycle, covering identity and access management (IAM), networking, VPC, encryption, and monitoring.
- Ensure adherence to cloud compliance standards (SOC 2, HIPAA, GDPR, etc.), performing regular audits and vulnerability scans to maintain a robust security posture.
- AI & Machine Learning Operations (MLOps)
- Provide operational support for AI/ML models running on AWS, collaborating with data science teams to optimize performance and reliability.
- Integrate MLOps methodologies into existing workflows, ensuring seamless model deployment, monitoring, and updates.
- Performance & Cost Optimization
- Conduct capacity planning, load testing, and performance tuning across AWS resources.
- Leverage reserved instances, auto-scaling, and right-sizing strategies to balance reliability, performance, and cost effectiveness.
- Incident Management & Continuous Improvement
- Oversee on-call rotations and lead incident response, rapidly mitigating service disruptions and guiding root cause analysis.
- Foster a culture of continuous improvement, refining operational processes and enhancing platform architecture to boost resilience.
- Leadership & Mentorship
- Manage and mentor a cross-functional SRE team, promoting a collaborative, results-driven environment and advancing professional growth.
- Collaborate with product owners, development teams, and stakeholders to align SRE priorities with broader business objectives.
You Will Bring
- Education: Bachelor's degree in Computer Science, Computer Engineering, or a related field.
- Experience
- Overall Software Development: 6+ years of professional experience in software development.
- Site Reliability Engineering: 3+ years of dedicated SRE experience with a primary focus on AWS cloud services and infrastructure.
- Technical Expertise
- Cloud Computing Concepts: Deep understanding of virtualization, networking, and storage in public cloud environments.
- AWS Proficiency: Demonstrated ability to manage, operate, and secure AWS services (e.g., IAM, S3, EKS, ECS, Fargate, App Runner, Redshift, SNS, SQS, EventBridge, Athena, SageMaker, Aurora, DynamoDB, Cognito, API Gateway, etc.).
- AWS for AI/ML: Hands-on support of AI/ML model operations on AWS, collaborating with data science teams and optimizing ML workloads.
- Kubernetes & Container Management: Proven experience with Kubernetes (preferably EKS) for container orchestration, including deploying and maintaining production workloads.
- Helm Package Management: Skilled in creating and managing Helm charts for Kubernetes-based applications.
- IaC Frameworks: Proficiency in Terraform and Burrito (if applicable), ensuring production-grade, scalable infrastructure definitions.
- Scripting & Automation: Advanced skills in Python (including AWS SDK/boto3), Bash, and/or PowerShell for automating cloud operations.
- DevSecOps & GitOps: Hands-on experience integrating security best practices into CI/CD pipelines, leveraging GitOps tools (Argo CD, Flux) for declarative deployments.
- MLOps: Working knowledge of machine learning lifecycle management, ensuring robust and efficient AI/ML model deployments.
- Linux Administration: Strong background in Linux system management, performance tuning, and troubleshooting.
- Networking: Expertise in VPNs, firewalls, routing, switching, DNS, load balancers, and related security considerations.
- Monitoring & Observability: Proficiency with one or more monitoring solutions (Datadog, Prometheus, Grafana, CloudWatch) to drive proactive incident response.
- Security & Compliance: In-depth familiarity with SOC 2, HIPAA, GDPR, and best practices around IAM, encryption, and network segmentation.
- Problem-Solving & Communication: Demonstrated strength in diagnosing complex technical issues and effectively communicating solutions to varied stakeholders.
- Certifications
- AWS Certifications: AWS Certified Solutions Architect (Associate/Professional), AWS Certified DevOps Engineer – Professional, or other relevant certifications.
- Additional certifications in GCP, Azure, security (CISSP, CISM) are considered advantageous.
- Additional Desirable Qualifications
- Other Cloud Environments: Exposure to Azure or further GCP services beyond AI/ML is beneficial.
- Advanced Programming/Scripting: Experience in Python, Go or other modern languages is a plus.
- Team Leadership: Demonstrated success in building and leading cross-functional teams, including performance management and strategic planning.
- Non-technical skills:
- Be inspired by the needs of fast-changing environments.
- Happy to work within distributed teams.
- Coordinate with software, DevSecOps, and domain experts.
- Proactive & team player.
- Excellent oral and written communication skills.
Senior Site Reliability Engineer
Posted today
Job Description
Company Description
At Sana Commerce, we're committed to creating an inclusive environment because we know our diverse workforce is one of our greatest strengths.
What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company that helps manufacturers, distributors, and wholesalers thrive in B2B commerce complexity.
Our mission? To transform the way businesses buy and sell, so they can grow, build stronger relationships, and make the most of digital commerce. Join us and take ownership of your career in a dynamic, fast-moving environment.
At Sana Commerce, we're looking for a Senior Site Reliability Engineer to strengthen our reliability, observability, and automation capabilities across our Azure and Kubernetes-based platforms. This role blends hands-on operational excellence with engineering practices, ensuring uptime today while building the systems that make tomorrow more resilient.
This SRE position focuses on engineering reliability in everything we do: automating repetitive tasks, improving monitoring signals, running deep root cause analysis, and shaping systems for scalability. You'll be the engineer others look to during critical incidents, and the one raising the bar on how we prevent them in the first place.
What you'll get:
- The opportunity to make an impact at a fast-growing SaaS scale-up;
- A global and customized onboarding program (rated 9.1/10 by previous hires);
- A hybrid working model – 3 days from the office, 2 days from home.
Job Description
What you'll be doing
- Lead incident response and root cause analysis by driving deep investigations, educating the team, and delivering actionable post-incident insights that prevent recurrence.
- Manage Kubernetes and Azure environments by owning cluster configurations, platform usage, and ensuring availability, cost efficiency, and security best practices.
- Develop observability and monitoring strategies with Dynatrace, Honeycomb, ElasticSearch, Kibana/Grafana, and Azure Monitor to measure performance, user impact, and continuously refine alerts and dashboards.
- Implement and maintain edge and CDN integrations (Fastly WAF, bot management, CDN) to enhance performance, security, and reliability of customer-facing services.
- Write and debug automation scripts in PowerShell, Bash, Python, or C#, ensuring logging, rollback, and versioning practices make the platform more resilient and self-healing.
- Drive Infrastructure-as-Code adoption with Terraform, Bicep, and ARM to standardize environments, automate deployments, and reduce manual interventions.
- Optimize system and application performance through deep monitoring, dump analysis, and right-sizing of resources to eliminate bottlenecks and maximize efficiency.
- Collaborate across teams to break down complex problems, contribute to CI/CD and SDLC improvements, and embed reliability into development and release pipelines.
- Participate in the on-call rotation by taking ownership of incidents, coordinating responses, and ensuring sustainable fixes rather than temporary workarounds.
Qualifications
What you bring
- 8+ years of experience in SRE, DevOps, or Cloud Infrastructure, with demonstrated ownership of large-scale systems.
- Strong hands-on knowledge of Microsoft Azure services and practical experience operating Azure Kubernetes clusters in production.
- Expertise in Dynatrace, Honeycomb, ElasticSearch, Kibana/Grafana, Azure Monitor (KQL). Able to design actionable monitoring that leads to prevention, not just detection.
- Proficient in at least one programming/scripting language (PowerShell, Bash, Python, or C#). Strong debugging and logging practices.
- Hands-on experience with Infrastructure-as-Code (Terraform, Bicep, or ARM) to automate and manage cloud infrastructure.
- Solid understanding of TCP/IP protocols and troubleshooting network issues in distributed systems.
- Ability to go beyond surface fixes, identify patterns, and engineer permanent improvements.
- Strong communicator who can work with cross-functional teams and explain complex issues simply.
- Microsoft Certified: Azure Administrator Associate
- CKA: Certified Kubernetes Administrator
Who we are:
So, what does it mean to be a part of the Sana Commerce team?
At Sana Commerce, our values guide how we work, collaborate, and drive success.
- Champions of Our League. "We deliver lasting success, balancing quick wins and long-term value." We take pride in our unique product and extensive B2B knowledge and continuously strive to improve. No matter our role, we bring value every day, helping our customers and partners succeed.
- Supercharge Our Customers. "We're revolutionizing B2B commerce together, helping our customers to lead and succeed." Our customers are at the heart of everything we do. We go beyond solutions, providing the tools and support they need to grow.
- Determined to Grow. "We embrace challenges, growing and raising the bar for ourselves and our industry." We take on challenges, seek feedback, and keep learning. Every setback is a chance to improve and move forward.
- Bold Together. "We dare to be bold because we have each other's back." We collaborate across teams and time zones, challenge the status quo, and support each other to achieve the best outcomes.
Job descriptions can be tough to interpret. Even if you may not tick all the boxes, we strongly encourage you to apply if you still feel like you are a great match for this role. Please explain your motivation for the role in a cover letter.
Senior Site Reliability Engineer
Posted today
Job Description
We are seeking a seasoned
Senior Site Reliability Engineer
to enhance the resilience, performance, and scalability of our cloud-based platforms. The ideal candidate will combine strong operational skills with engineering discipline, automating tasks, improving monitoring, and owning incident response.
Key Responsibilities
- Lead and manage incident response processes: root-cause analysis, post-incident reviews, and ensuring preventative actions
- Operate and maintain Kubernetes clusters and Azure cloud environments, focusing on reliability, security, and cost-efficiency
- Design, implement, and evolve observability/monitoring systems and dashboards; refine alert thresholds and metrics for better insight into performance and failure modes
- Develop automation scripts (e.g. in Bash, PowerShell, Python, or C#) for operational tasks, with logging, rollback, and version control
- Use Infrastructure-as-Code (IaC) tools (e.g. Terraform, ARM templates, Bicep) to standardize and automate infrastructure provisioning and changes
- Optimize system and application performance: resource sizing, bottleneck identification, performance tuning
- Integrate CDN, WAF or edge caching, bot protection to improve system security and performance
- Collaborate across teams (development, product, operations) to embed reliability in architecture, CI/CD pipelines, and deployment processes
- Participate in the on-call rotation and ensure sustainable, long-term fixes rather than temporary patches
Required Qualifications & Skills
- 8+ years of experience in infrastructure, site reliability engineering (SRE), DevOps or cloud operations roles with demonstrable ownership of large-scale systems
- Strong hands-on experience with Microsoft Azure services and operating Kubernetes in production
- Proficiency with observability tools (e.g. Dynatrace, Honeycomb, ElasticSearch/Kibana/Grafana, or similar); ability to design meaningful SLIs/SLOs/alerts
- Programming/scripting proficiency in one or more languages (PowerShell, Bash, Python, C#, etc.) with good practices (logging, versioning, error handling)
- Expertise in Infrastructure-as-Code to automate provisioning/deployment and reduce manual operations
- Deep understanding of distributed systems, networking, performance optimization
- Strong communication skills; ability to explain technical issues and collaborate with cross-functional teams
Nice to Have
- Certifications relevant to Azure or Kubernetes (e.g. Azure Administrator, Certified Kubernetes Administrator)
- Experience with CDN / WAF integrations and edge architectures
- Experience in bot management, security hardening, or securing distributed systems
- Experience working in SaaS environments or with high-availability SLAs
Senior, Site Reliability Engineer
Posted today
Job Description
Welcome to the world of Mrsool, where on-demand delivery meets unparalleled user needs to deliver anything you desire. As one of the largest delivery platforms in the Middle East and North Africa (MENA) region, Mrsool has captivated users with its unique and seamless experience, earning it the highest ratings among all major delivery platforms on both Apple's App Store and Google's Play Store.
What sets Mrsool apart is its commitment to providing an unmatched "order anything from anywhere" experience. This extraordinary feat is made possible by our extensive fleet of dedicated on-demand couriers. With their unwavering dedication, they ensure that your desired items reach your doorstep, no matter where you are.
Whether it's a late-night craving, a forgotten item, or a special gift for a loved one, Mrsool is here to deliver, quite literally. We take pride in the convenience we offer, empowering you to get what you need when you need it, all at the tap of a button.
The Job in a Nutshell
We are looking for a highly skilled Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of our systems. The ideal candidate brings deep expertise in AWS, Kubernetes, and modern cloud infrastructure, along with strong problem-solving skills and a proactive approach to improving system resilience and automation.
If you're eager to take on this rewarding opportunity, we'd love to hear from you. Apply today.
What You Will Do
- Develop and maintain monitoring and alerting systems to proactively identify and address issues.
- Troubleshoot and escalate production incidents to minimize downtime and improve system reliability.
- Continuously improve our infrastructure and processes to optimize scalability and efficiency.
- Participate in and take ownership of on-call rotations as needed to ensure 24/7 support for our application.
- Perform routine maintenance and upgrades as needed to keep our systems up to date.
- Contribute to ongoing efforts to improve our security posture and compliance with industry standards.
- Communicate complex technical concepts clearly and concisely to both technical and non-technical stakeholders so that they can make the right decisions.
- Mentor and coach junior engineers, fostering their professional growth and enabling them to deliver high-quality work.
- Stay up-to-date with the latest advancements and trends in site reliability engineering and share knowledge and insights with the team.
- Identify opportunities for organizational enhancements and propose alternatives to optimize team structures and execution.
- Collaborate with development teams to design and implement automated deployment and testing pipelines.
- Collaborate with development teams to design and implement scalable infrastructure.
What Are We Looking For
- Bachelor's degree in Computer Engineering, Computer Science, or related field.
- 5+ years of experience in a similar role, preferably with experience in a high-traffic, high-availability environment.
- Proficiency in at least one programming language (Python, Ruby, Java, Go, etc.).
- Strong understanding of cloud infrastructure and related technologies (AWS, GCP, Azure, Kubernetes, Docker, etc.).
- Excellent troubleshooting and problem-solving skills.
- Experience with one or more automation and configuration management tools (Chef, Ansible, Puppet, Terraform, etc.).
- Familiarity with monitoring and alerting tools (Prometheus, Grafana, Nagios, etc.).
- Strong communication and interpersonal skills, enabling effective collaboration with cross-functional teams.
- Ability to navigate ambiguity, set clear expectations, and thrive in a fast-paced, dynamic environment.
- A strong grasp of computer science fundamentals when it comes to dealing with distributed systems and networks.
What We Offer You
- Inclusive and Diverse Environment: We foster an inclusive and diverse workplace that values innovation and provides flexibility.
- Competitive Compensation: Our compensation packages are competitive and include potential share options. Additionally, you will benefit from a performance-based commission/ incentive structure, rewarding your achievements.
- Personal Growth and Development: We are committed to your professional development, offering regular training and an annual learning stipend to help you advance your career in a fast-paced, dynamic environment.
- Autonomy and Mentorship: You'll enjoy a degree of autonomy in your role, supported by mentorship and ambitious goals that drive both your personal success and the company's growth.
Senior Site Reliability Engineer
Posted today
Job Description
About Us
Ready to change the world? We're reinventing freight and logistics at Trella. We are backed by a number of leading VC companies (YC, Maersk Growth, Algebra Ventures and Raed Ventures), and we're looking for the best talent out there to help us build and scale our product offering. We aspire to create a step-change in the industry and we want you to be a part of the journey.
We are innovative problem-solvers on this adventure together. Working at Trella means that you'll be surrounded by colleagues who are constantly pushing boundaries, thinking ahead, and meeting the high standards we set for ourselves. When we build, we do so in a product-led way: we value our customer experience and scalability, and we prioritize how we build our product accordingly.
Our Purpose
At Trella, our vision is to Empower our Communities to move Economies Forward, and we're doing this by building a digital experience that provides our Shippers, Carriers and Teams with the right technology and platform that reduces the costs of moving goods. Simply, we're trying to disrupt and reinvent trucking, and empower our economies. We have launched from Egypt to Saudi Arabia, Pakistan and UAE, and are looking to build and expand our footprint across the MENA-P region.
What You'll Do:
- Lead and collaborate with Engineering teams to build an SRE culture and maintain development, staging and production systems.
- Manage infrastructure reliability, scalability and security using SRE principles, including setting up SLOs, tracking error budgets and running Production Readiness Reviews (PRRs).
- Automate infrastructure provisioning using Terraform / CloudFormation.
- Automate IT infrastructure tasks using Chef, Puppet, Ansible etc.
- Proactively monitor and optimize infrastructure costs.
- Maintain and improve API gateways, web servers, cache and database configurations, CI/CD pipelines etc.
- Maintain and monitor the deployment and orchestration of servers, Docker containers, databases and general backend infrastructure.
- Participate in on-call rotations, take an active part in resolving incidents and write production incident reports.
- Hire, mentor and coach junior team members.
What You'll Need:
- BS/MS in Computer Science, IT or related technical field with 6+ years of relevant experience.
- Expert programming and scripting skills, preferably in Bash or shell scripting.
- Expert in computer science foundations such as operating systems, computer networks and OWASP principles.
- Excellent system debugging skills and hands-on experience optimizing database configurations and queries for PostgreSQL.
- Expert in at least one web server such as HAProxy, NGINX, Apache etc.
- Experience working with established cloud platforms like AWS, Azure or Google Cloud.
- Experience building and setting up reliable CI/CD pipelines, observability platforms, and monitoring and alerting tools.
- Experience using Linux systems and command-line system administration.
- Experience with automation tools like Chef, Puppet, Ansible etc.
- Experience with a Kubernetes platform such as EKS.
What We Offer
- Hybrid work model with flexible working hours.
- The experience of working in one of Forbes Middle East's top 50 most funded start-ups in MENA
- Annual performance review
- Flexible leave policy that supports your work-life balance and personal needs.
- Development opportunities in a rapidly growing multinational company.
- Early payday option that lets you access your earnings sooner, helping you manage expenses and plan your finances with greater ease.
- Supporting our colleagues to build and grow themselves through Learning & Development initiatives.
Senior Cloud Infrastructure Engineer
Posted today
Job Description
Sarmad is seeking a highly skilled Senior Cloud Infrastructure Engineer to join our dynamic team. This position plays a critical role in designing, deploying, and managing our cloud infrastructure to ensure optimal performance, security, and reliability. As a Senior Cloud Infrastructure Engineer, you will work closely with cross-functional teams to deliver scalable cloud solutions that meet our business objectives.
Responsibilities
- Design, implement, and manage cloud infrastructure solutions using services from AWS, Azure, and especially Oracle Cloud.
- Ensure the security and integrity of the cloud infrastructure through best practices.
- Perform capacity planning and scaling of cloud resources based on business needs.
- Monitor and troubleshoot performance, availability, and security of cloud resources.
- Automate tasks using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Provide technical support and guidance to junior engineers and other teams.
- Stay updated on emerging cloud technologies and industry trends to continuously refine cloud architecture.
Requirements
- 5+ years of experience in cloud engineering or related field.
- Strong expertise in cloud platforms such as AWS, Azure, or Oracle Cloud.
- Experience with Linux-based operating systems (CentOS, Red Hat, and Ubuntu).
- Experience with Infrastructure as Code (IaC) tools (Terraform, CloudFormation).
- Proficient in scripting languages for automation tasks.
- In-depth knowledge of cloud security best practices and compliance standards.
- Experience with containerization technologies (Docker, Kubernetes) is a plus.
- Familiarity with CI/CD tooling and DevOps methodologies.
- Strong problem-solving skills and ability to work in a fast-paced environment.
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, OCI Certified Architect) are preferred.
Benefits
- Hybrid work model
- Healthy working environment
- Medical Insurance
- Social Insurance
Senior Cloud Infrastructure Engineer
Posted 21 days ago
Job Description
- Collaborate with stakeholders to understand business needs and translate them into technical requirements for cloud infrastructure.
- Analyze existing systems and applications to identify components suitable for cloud migration. Develop migration strategies, including sequencing, dependencies, and potential impact on existing systems.
- Maintain and agree on the VNF Lifecycle Process and procedure with the application vendors.
- Identify opportunities to optimize cloud resource allocation, performance, and cost. Recommend improvements to ensure optimal use of cloud services.
- Monitor cloud infrastructure performance, identify bottlenecks, and proactively troubleshoot issues to maintain high availability and performance.
- Create and maintain technical documentation, including architecture diagrams, standard operating procedures, and reports related to cloud infrastructure planning and optimization.
- Report cloud network capabilities and resource utilization status periodically.
- Work closely with cross-functional teams, including developers, system administrators, and network engineers, to gather requirements, provide guidance, and share best practices related to cloud planning and implementation.
- Research and evaluate emerging cloud technologies, tools, and industry trends. Provide recommendations for adopting new technologies to improve the cloud infrastructure.
Requirements
Personal Skills:
- Bachelor's degree in Computer Science, Information Technology, or a related field. Relevant certifications.
- At least 5 years of experience working in information technology and cloud computing.
Technical Skills:
- Hands-on experience with Linux (RHEL/CentOS/Ubuntu).
- Hands-on experience with OpenStack (Red Hat OpenStack preferred).
- NFVI architecture and MANO stack.
- VNF installation and lifecycle management.
- Strong team management skills.
- Excellent communication, interpersonal and negotiation skills.
- Excellent problem-solving skills.
- Excellent presentation skills.
About the company
Giza Systems, a leading systems integrator in the MEA region, designs and deploys industry-specific technology solutions for asset-intensive industries such as telecoms, utilities, oil and gas, hospitality and real estate, among other market sectors. We help our clients streamline their operations and businesses through our portfolio of solutions, managed services, and consultancy practice. Our team of 1000 professionals is spread throughout the region with anchor offices in Cairo, Riyadh, Dubai, Doha, Nairobi, Dar-es-Salaam, Abuja, Kampala and New Jersey, allowing us to service an ever-increasing client base in over 40 countries.