Resume

Sina Moghaddas, Senior Platform Engineer at Mollie. 12+ years building and operating infrastructure at scale: multi-region GCP, payment-critical systems, zero downtime at every step.

Summary

12+ years building and operating large-scale infrastructure and payment-critical systems, from bare-metal OpenStack and Ceph clusters to multi-region active-active GCP deployments. At Mollie I led two company-wide GCP region switchovers end-to-end, each migrating the majority of production payment traffic with zero downtime; built and owned the public API gateway from scratch on Apache APISIX; and cut system RTO from 10 to 3 minutes by automating the full switchover process.

Core Skills

Cloud: GCP, OpenStack, Ceph, AWS
Containers: Kubernetes, Docker, LXC, Helm, FluxCD
IaC: Terraform, Ansible
Observability: Datadog, Prometheus, Grafana, Elasticsearch, VictoriaMetrics
Networking: GCP External Load Balancers, Apache APISIX, HAProxy, Consul, iptables
Programming: Python, Bash, Lua, Golang
Practices: SLI/SLO design, incident management, capacity planning, DR

Experience

Mollie B.V. Senior Platform Engineer — Apr 2025 – Present

Led the 2024 and 2025 GCP region switchovers as project owner, coordinating across multiple engineering teams; both delivered with zero downtime
Drove switchover architecture from single-primary to active-active across two GCP regions, improving fault isolation and resilience at the payment layer
Designed and owned the public API gateway on Apache APISIX across both GCP regions; developed Lua plugins for upstream error isolation and circuit breaking; reduced auth latency by ~100 ms per request
Cut system RTO from 10 to 3 minutes by automating the full switchover process: advanced GCP External Load Balancer configuration, programmatic traffic weight management, emergency CI/CD pipeline jobs, and runbook redesign
Owned capacity planning for platform, payment processing, and edge infrastructure ahead of peak traffic events; the stack sustained peak load across both active GCP regions with zero infrastructure incidents
Wrote company-wide rollback guidelines and DR runbooks adopted as the primary recovery reference across engineering teams

Mollie B.V. Site Reliability Engineer → Senior SRE — Jun 2022 – Apr 2025

Migrated critical services (main application, Redis, RabbitMQ, Elasticsearch) from bare metal to GCP/GKE via phased rollouts; each cutover completed without service disruption
Deployed RabbitMQ clusters in two GCP regions for platform and payment processing, enabling full bare-metal decommissioning and cross-region messaging failover
Architected and executed the migration of a large-scale Elasticsearch cluster with zero downtime
Defined SLIs and SLOs across platform and payment services with Datadog alerting; reliability baselines became the traffic-shift gates in the 2024 region switchover
Automated VM and bare-metal lifecycle management for hundreds of servers with Ansible

Enreach Site Reliability Engineer — Jun 2021 – May 2022

Replaced Jenkins pipelines with a Drone/FluxCD-based CI/CD system, making pipeline runs 3x faster
Introduced Kubernetes as the team's first container orchestration platform; established GitOps delivery with FluxCD
Provisioned isolated dev environments on AWS with Terraform for all VoIP teams

SRE Together Freelance SRE / Tech Lead — Jan 2016 – Jun 2021

Technical lead across storage, network, compute, and datacenter teams; owned quarterly-to-annual roadmaps for OpenStack, Kubernetes, and Ceph
Deployed multiple OpenStack regions supporting tens of thousands of active VMs across thousands of projects on hundreds of compute nodes
Set up multiple Ceph clusters at petabyte scale
Shipped a live video delivery platform for Iran's national broadcaster handling tens of Gb/s at launch; combined Nginx and FFmpeg with HLS and MPEG-DASH adaptive bitrate output
Configured an Nginx-based WAF handling high-volume traffic with custom modules, iptables/ebtables packet filtering, and OS-level network tuning
Rolled out centralized monitoring and logging across multiple data centers using VictoriaMetrics and Elastic Stack

Open Source

gke-autoneg-controller GoogleCloudPlatform — Golang — 2025

Patched a silent bug where capacityScaler set to 0 was dropped from the GCP API payload, leaving backends active despite zero-capacity configuration; fix merged into the main Google Cloud project
Added PodDisruptionBudget support to keep the controller available during GKE node maintenance

Certifications

Google Cloud Professional Cloud Security Engineer
Google Cloud Professional Cloud Network Engineer
Google Cloud Professional Cloud Architect
Google Cloud Professional Data Engineer
CKS: Certified Kubernetes Security Specialist
CKA: Certified Kubernetes Administrator
CKAD: Certified Kubernetes Application Developer

Contact

Currently Senior Platform Engineer at Mollie. Open to select consulting and advisory engagements in reliability engineering, platform architecture, and multi-region operations.

LinkedIn: linkedin.com/in/sinamoghaddas