Resume
Sina Moghaddas — Senior Platform Engineer at Mollie. 12+ years building and operating infrastructure at scale: multi-region GCP, payment-critical systems, zero-downtime at every step.
Summary
12+ years building and operating payment-critical systems and infrastructure at scale, from bare-metal OpenStack and Ceph clusters to multi-region, active-active GCP. At Mollie I led two company-wide GCP region switchovers end to end, each migrating the majority of production payment traffic with zero downtime; built and owned the public API gateway from scratch on Apache APISIX; and cut system RTO from 10 to 3 minutes by automating the full switchover process.
Core Skills
- Cloud: GCP, OpenStack, Ceph, AWS
- Containers: Kubernetes, Docker, LXC, Helm, FluxCD
- IaC: Terraform, Ansible
- Observability: Datadog, Prometheus, Grafana, Elasticsearch, VictoriaMetrics
- Networking: GCP External Load Balancers, Apache APISIX, HAProxy, Consul, iptables
- Programming: Python, Bash, Lua, Golang
- Practices: SLI/SLO design, incident management, capacity planning, DR
Experience
- Led the 2024 and 2025 GCP region switchovers as project owner, coordinating across multiple engineering teams; both delivered with zero downtime
- Drove switchover architecture from single-primary to active-active across two GCP regions, improving fault isolation and resilience at the payment layer
- Designed and owned the public API gateway on Apache APISIX across both GCP regions; developed Lua plugins for upstream error isolation and circuit breaking; reduced auth latency by ~100ms per request
- Cut system RTO from 10 to 3 minutes by automating the full switchover process: advanced GCP External Load Balancer configuration, programmatic traffic weight management, emergency CI/CD pipeline jobs, and runbook redesign
- Owned capacity planning for platform, payment processing, and edge infrastructure ahead of peak traffic events; stack sustained load across both active GCP regions with zero infrastructure incidents
- Wrote company-wide rollback guidelines and DR runbooks adopted as the primary recovery reference across engineering teams
- Migrated critical services (main application, Redis, RabbitMQ, Elasticsearch) from bare metal to GCP/GKE via phased rollouts; each cutover completed without service disruption
- Deployed RabbitMQ clusters in two GCP regions for platform and payment processing, enabling full bare-metal decommissioning and cross-region messaging failover
- Architected and executed the migration of a large-scale Elasticsearch cluster with zero downtime
- Defined SLIs and SLOs across platform and payment services with Datadog alerting; reliability baselines became the traffic-shift gates in the 2024 region switchover
- Automated VM and bare-metal lifecycle management for hundreds of servers with Ansible
- Replaced Jenkins pipelines with a Drone/FluxCD-based CI/CD system, cutting pipeline runtime to roughly a third
- Introduced Kubernetes as the team's first container orchestration platform; established GitOps delivery with FluxCD
- Provisioned isolated dev environments on AWS with Terraform for all VoIP teams
- Technical lead across storage, network, compute, and datacenter teams; owned quarterly-to-annual roadmaps for OpenStack, Kubernetes, and Ceph
- Deployed multiple OpenStack regions supporting tens of thousands of active VMs across thousands of projects on hundreds of compute nodes
- Set up multiple Ceph clusters at petabyte scale
- Shipped a live video delivery platform for Iran's national broadcaster handling tens of Gb/s at launch; combined Nginx and FFmpeg with HLS and MPEG-DASH adaptive bitrate output
- Configured an Nginx-based WAF handling high-volume traffic with custom modules, iptables/ebtables packet filtering, and OS-level network tuning
- Rolled out centralized monitoring and logging across multiple data centers using VictoriaMetrics and Elastic Stack
Open Source
- Patched a silent bug where capacityScaler set to 0 was dropped from the GCP API payload, leaving backends active despite zero-capacity configuration; fix merged into the main Google Cloud project
- Added PodDisruptionBudget support to keep the controller available during GKE node maintenance
Certifications
- Google Cloud Professional Security Engineer
- Google Cloud Professional Network Engineer
- Google Cloud Professional Architect
- Google Cloud Professional Data Engineer
- CKS: Certified Kubernetes Security Specialist
- CKA: Certified Kubernetes Administrator
- CKAD: Certified Kubernetes Application Developer
Contact
Currently Senior Platform Engineer at Mollie. Open to select consulting and advisory engagements in reliability engineering, platform architecture, and multi-region operations.
LinkedIn: linkedin.com/in/sinamoghaddas