Cloud Native/Serverless Reliability Engineer (SRE) Job at Alibaba Cloud, Sunnyvale, CA

  • Alibaba Cloud
  • Sunnyvale, CA

Job Description

The Alibaba Cloud Cloud Native Serverless Team is a leading innovation force within Alibaba Cloud, dedicated to empowering developers and enterprises with cutting-edge serverless technologies. Focused on building scalable, cost-efficient, and fully managed serverless solutions, the team drives the evolution of cloud-native architectures by abstracting infrastructure complexity and enabling seamless integration with modern application development paradigms. The team delivers industry-leading serverless solutions that compete directly with AWS Lambda and the offerings of other global cloud providers.

Cloud Product Operations & Reliability

● Oversee stability maintenance, performance tuning, and high-availability architecture design for serverless system components. Ensure 24/7 reliability of mission-critical systems.

● Manage the container lifecycle on serverless clusters: implement deployments, auto-scaling, version upgrades, and resource optimization.

Incident Response & Root Cause Analysis

● Lead troubleshooting of incidents related to serverless, middleware, and cloud products (e.g., key-value storage, message backlog, service registration failures) through log analysis, distributed tracing, and monitoring systems.

● Develop diagnostic tools using Go/Rust to resolve production issues, performance bottlenecks, and compatibility challenges.

Automation & Operational Excellence

● Build automation tools to standardize serverless system deployment, monitoring, and disaster recovery.

● Implement chaos engineering experiments, capacity planning strategies, and failover mechanisms to enhance system resilience.

Collaboration & Best Practices

● Partner with teams to optimize cloud product adoption strategies and deliver architecture design consultation.

● Create comprehensive technical documentation and drive standardization of serverless operations.

Minimum qualifications:

● Bachelor's degree or higher in Computer Science, with 3+ years in SRE/serverless operations.

● Deep understanding of SRE principles, including balancing reliability metrics (SLIs/SLOs) with engineering velocity.

● Proven ability to diagnose complex distributed system failures under pressure.

● Excellent communication skills to drive cross-team collaboration and technical documentation.

Preferred qualifications:

● Experience modifying cloud-based product source code for performance optimization; serverless experience is preferred.

● Expertise in large-scale distributed systems (10k+ topics, 1k+ node clusters).

● Kubernetes certifications (CKA/CKAD) or cloud provider certifications.

The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.

If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.
