Resume

Sina Moghaddas, Senior Platform Engineer at Mollie. 12+ years building and operating infrastructure at scale: multi-region GCP, payment-critical systems, zero downtime at every step.

Summary

12+ years building and operating large-scale infrastructure and payment-critical systems, from bare-metal OpenStack and Ceph clusters to multi-region active-active GCP. At Mollie I led two company-wide GCP region switchovers end-to-end, each migrating the majority of production payment traffic with zero downtime; built and owned the public API gateway from scratch on Apache APISIX; and cut system RTO from 10 to 3 minutes by automating the full switchover process.

Core Skills

  • Cloud: GCP, OpenStack, Ceph, AWS
  • Containers: Kubernetes, Docker, LXC, Helm, FluxCD
  • IaC: Terraform, Ansible
  • Observability: Datadog, Prometheus, Grafana, Elasticsearch, VictoriaMetrics
  • Networking: GCP External Load Balancers, Apache APISIX, HAProxy, Consul, iptables
  • Programming: Python, Bash, Lua, Golang
  • Practices: SLI/SLO design, incident management, capacity planning, DR

Experience

Mollie B.V. Senior Platform Engineer — Apr 2025 – Present
  • Led the 2024 and 2025 GCP region switchovers as project owner, coordinating across multiple engineering teams; both delivered with zero downtime
  • Drove switchover architecture from single-primary to active-active across two GCP regions, improving fault isolation and resilience at the payment layer
  • Designed and owned the public API gateway on Apache APISIX across both GCP regions; developed Lua plugins for upstream error isolation and circuit breaking; reduced auth latency by ~100 ms per request
  • Cut system RTO from 10 to 3 minutes by automating the full switchover process: advanced use of GCP External Load Balancers, programmatic traffic weight management, emergency CI/CD pipeline jobs, and runbook redesign
  • Owned capacity planning for platform, payment processing, and edge infrastructure ahead of peak traffic events; stack sustained load across both active GCP regions with zero infrastructure incidents
  • Wrote company-wide rollback guidelines and DR runbooks adopted as the primary recovery reference across engineering teams
Mollie B.V. Site Reliability Engineer → Senior SRE — Jun 2022 – Apr 2025
  • Migrated critical services (main application, Redis, RabbitMQ, Elasticsearch) from bare metal to GCP/GKE via phased rollouts; each cutover completed without service disruption
  • Deployed RabbitMQ clusters in two GCP regions for platform and payment processing, enabling full bare-metal decommissioning and cross-region messaging failover
  • Architected and executed the migration of a large-scale Elasticsearch cluster with zero downtime
  • Defined SLIs and SLOs across platform and payment services with Datadog alerting; reliability baselines became the traffic-shift gates in the 2024 region switchover
  • Automated VM and bare-metal lifecycle management for hundreds of servers with Ansible
Enreach Site Reliability Engineer — Jun 2021 – May 2022
  • Replaced Jenkins pipelines with a Drone/FluxCD-based CI/CD system, cutting pipeline runtime to about a third
  • Introduced Kubernetes as the team's first container orchestration platform; established GitOps delivery with FluxCD
  • Provisioned isolated dev environments on AWS with Terraform for all VoIP teams
SRE Together Freelance SRE / Tech Lead — Jan 2016 – Jun 2021
  • Technical lead across storage, network, compute, and datacenter teams; owned quarterly-to-annual roadmaps for OpenStack, Kubernetes, and Ceph
  • Deployed multiple OpenStack regions supporting tens of thousands of active VMs across thousands of projects on hundreds of compute nodes
  • Set up multiple Ceph clusters at petabyte scale
  • Shipped a live video delivery platform for Iran's national broadcaster handling tens of Gb/s at launch; combined Nginx and FFmpeg with HLS and MPEG-DASH adaptive bitrate output
  • Configured an Nginx-based WAF handling high-volume traffic with custom modules, iptables/ebtables packet filtering, and OS-level network tuning
  • Rolled out centralized monitoring and logging across multiple data centers using VictoriaMetrics and Elastic Stack

Open Source

gke-autoneg-controller GoogleCloudPlatform — Golang — 2025
  • Patched a silent bug where capacityScaler set to 0 was dropped from the GCP API payload, leaving backends active despite zero-capacity configuration; fix merged into the main Google Cloud project
  • Added PodDisruptionBudget support to keep the controller available during GKE node maintenance

Certifications

  • Google Cloud Professional Cloud Security Engineer
  • Google Cloud Professional Cloud Network Engineer
  • Google Cloud Professional Cloud Architect
  • Google Cloud Professional Data Engineer
  • CKS: Certified Kubernetes Security Specialist
  • CKA: Certified Kubernetes Administrator
  • CKAD: Certified Kubernetes Application Developer

Contact

Currently Senior Platform Engineer at Mollie. Open to select consulting and advisory engagements in reliability engineering, platform architecture, and multi-region operations.

LinkedIn: linkedin.com/in/sinamoghaddas