BACK TO PORTFOLIO REGISTRY
Private Architecture
December 20243 weeks

On-Premise to AWS Migration

Lead Migration EngineerEngineering Dossier

Achievement Log

2024-10 Week 1–2: Audited on-premise stack (47 tables, 14 latin1 collation issues, 3 TB assets). Designed 3-layer VPC. Hit collation errors on first RDS import — built collation_rewriter.py, clean import on second attempt. 2024-10 Week 4: VPN benchmarked at 12 MB/s (projected 70+ hours). Switched to Direct Connect (85 MB/s). Bulk Rsync completed in 12 hours. 2024-11 Week 1–2: Discovered Rsync permission corruption and PHP 8.2 socket path bug — both fixed. AMI launch to health check: 52 seconds. RDS max_connections exhaustion found in load test — increased to 200. 2024-11 Week 3 – 2024-12 Week 1: 48-hour parallel validation — resolved 3 env config differences. Fixed queue workers not starting on new ASG instances. 2024-12 Week 2: Cutover at 02:00 UTC Saturday. DNS updated in 3 minutes, 0 user-reported issues. On-premise moved to read-only. 2024-12 Week 3–4: Rightsized ASG (3→2 instances). On-premise decommissioned day 7. Full runbook delivered. Project closed.

Overview

Migrated a production Laravel application, MySQL database (11.2M rows, 28 GB), and 3 TB of frontend assets from on-premise Ubuntu servers to AWS with zero reported downtime. Used a Replatform approach for MySQL (on-premise → RDS Multi-AZ) and Rehost with Modernization for the app layer (bare-metal → EC2 Auto Scaling Group with AMI baking). DNS-level cutover executed in under 3 minutes during a low-traffic window after 48 hours of parallel validation.

Core Technologies

TerraformAWS EC2 & Auto ScalingAmazon RDS (MySQL)AWS Application Load Balancer (ALB)TCO Modeling & Cloud EconomicsCloud Migration Strategy (Replatform)Bash Scripting & Automation

Implementation & Architecture

AWS Target Environment (Terraform)

3-layer VPC with public ALB subnets, private EC2 subnets, and isolated RDS subnets across 3 AZs. Direct Connect VIF attachment for bulk transfer. EC2 launch template with IMDSv2, encrypted EBS, and UserData fetching secrets from SSM on boot. ASG lifecycle hook triggered cache clearing on new instance launch. CloudFront + S3 OAI for static assets. Post-apply validation script gated all DNS changes.

Database Migration & Collation Resolution Pipeline

Pre-migration audit inventoried 47 tables (11.2M rows, 183 FK constraints, 12 stored procedures). Python collation_rewriter.py converted all latin1 CREATE TABLE definitions and INSERT data to utf8mb4. Full import into RDS took 1h 47m — 100% row count match. 15-minute delta sync cron maintained near-realtime RDS freshness during the 48-hour validation window.

Cutover Execution & Rollback Plan

48-hour parallel validation compared response codes for 50 key URLs every 5 minutes — resolved 3 environment config differences before cutover. TTL reduced to 60 seconds 24 hours prior. Final delta sync completed with 0 rows behind. DNS records updated in Route 53 — cutover in 3 minutes. On-premise servers kept in read-only mode for 7 days as rollback option (documented 5-minute revert procedure).

Technical Skills

  • Terraform
  • Amazon EC2
  • Amazon RDS
  • Amazon VPC
  • MySQL
  • AWS Lambda
  • TCO Modeling
  • Cloud Migration Strategy
  • Bash Scripting

Engineering Challenges

  • MySQL Collation Mismatch — 14 tables with latin1_swedish_ci collation produced 'Illegal mix of collations' errors on JOIN queries against utf8mb4 tables. Fixed by building collation_rewriter.py to rewrite all CREATE TABLE and INSERT data in the dump to utf8mb4.
  • Rsync File Permission Corruption — Rsync ran as ubuntu user, transferring ownership away from www-data, breaking Laravel cache writes. Fixed by adding chown -R www-data to the EC2 UserData and ASG lifecycle hook Lambda.
  • RDS max_connections Exhaustion — PHP-FPM pm.max_children=20 × 3 connections = 65+ connections maxed the default 66 limit during load tests. Fixed by setting max_connections=200 in the custom parameter group.
  • Nginx PHP 8.2 Socket Path — Nginx config hardcoded the PHP 8.1 socket path, causing 500 errors after AMI upgrade to PHP 8.2. Fixed by dynamically resolving PHP version in the UserData script.
  • Direct Connect Bandwidth Contention — ERP replication consumed 600–700 Mbps during business hours, impacting Rsync. Fixed by scheduling transfers exclusively 22:00–06:00 UTC with --bwlimit=80000.
  • Laravel Queue Workers Not Starting on New ASG Instances — supervisord was disabled in the AMI. Fixed by adding systemctl enable supervisord to the AMI baking script.

Project Outcomes

  • Migrated 100% of services (Laravel, MySQL, 3 TB assets) with zero reported downtime — DNS cutover completed in under 3 minutes.
  • Application layer now scales from 1 to 6 instances on demand, handling 8× peak traffic vs the original single server.
  • RDS Multi-AZ replaced manual mysqldump-to-drive backups with automated 7-day retention and 38-second tested failover.
  • Full infrastructure reproducible via Terraform — staging environment provisionable in 14 minutes.
  • All 47 tables migrated to utf8mb4 — eliminated a latent encoding risk for international user data.
  • RDS connection pool exhaustion ('Too many connections') eliminated by increasing max_connections and implementing explicit per-worker pool management.