The Context
Our ad serving engine was written in PHP with Laravel. It worked, but it was hitting a performance ceiling that no amount of optimization was going to fix. The team wanted to rewrite it in Go.
Rewrites are notoriously risky. The second-system effect is real. I'd seen rewrites fail before — months of work, then a rollback to the original system.
The Strategy: Strangler Fig
Instead of a big-bang rewrite, we used the strangler fig pattern. We built the new Go service alongside the existing PHP service and gradually shifted traffic.
Phase 1: Build the Go service with feature parity. No new features. Just make it work.
Phase 2: Shadow mode. The Go service receives a copy of all traffic, but its responses are discarded while PHP keeps serving users. We compare the Go outputs against PHP to find discrepancies.
Phase 3: Traffic migration. We shifted live traffic to Go in steps (1% → 5% → 25% → 50% → 100%) over 8 weeks.
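Phases 2 and 3 can be sketched together in Go, treating shadow mode as the 0% step of the ramp. This is a minimal illustration under assumptions, not the production code: the Request, Response, and Migrator types, the bucket helper, and bucketing by a hashed request ID are all hypothetical names and choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Request and Response are hypothetical stand-ins for the real ad types.
type Request struct {
	ID          string // stable per-request key used for bucketing
	PlacementID string
}

type Response struct {
	AdID      string
	BidMicros int64 // bid in millionths of a currency unit
}

// Handler abstracts a call to either the PHP or the Go service.
type Handler func(Request) Response

// Mismatch records one request on which the two services disagreed.
type Mismatch struct {
	Req     Request
	PHP, Go Response
}

// Migrator serves GoPercent of requests from Go and shadows the rest.
type Migrator struct {
	PHP, Go   Handler
	GoPercent uint32 // 0 = pure shadow mode, 100 = fully migrated

	mu         sync.Mutex
	mismatches []Mismatch
}

// bucket maps a request ID to a stable value in [0, 100).
func bucket(id string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(id))
	return h.Sum32() % 100
}

// Serve returns the Go response for migrated buckets. For all other
// requests it returns the PHP response and compares it with Go's.
func (m *Migrator) Serve(req Request) Response {
	if bucket(req.ID) < m.GoPercent {
		return m.Go(req)
	}
	php := m.PHP(req)
	// Shadow call: the result is compared, then discarded. A production
	// version would run this off the request path with a timeout.
	if shadow := m.Go(req); shadow != php {
		m.mu.Lock()
		m.mismatches = append(m.mismatches, Mismatch{Req: req, PHP: php, Go: shadow})
		m.mu.Unlock()
	}
	return php
}

// Mismatches returns a copy of the discrepancies recorded so far.
func (m *Migrator) Mismatches() []Mismatch {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]Mismatch(nil), m.mismatches...)
}

func main() {
	m := &Migrator{
		PHP: func(Request) Response { return Response{AdID: "ad-1", BidMicros: 1500000} },
		Go:  func(Request) Response { return Response{AdID: "ad-1", BidMicros: 1500001} },
	}
	m.Serve(Request{ID: "req-1", PlacementID: "home"})
	fmt.Println("mismatches:", len(m.Mismatches()))
}
```

Hashing the request ID keeps routing deterministic, so a request that lands in the Go bucket at 5% stays there at 25%. That makes a discrepancy reproducible when you go back to investigate it.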
The Technical Challenges
Behavioral parity is harder than it sounds. PHP's type coercion and floating-point handling produced subtly different results in bid calculations. We found 3 edge cases in shadow mode that would have caused billing discrepancies.
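To illustrate the floating-point half of that problem, here is a minimal Go sketch. It is not taken from the production code, and it does not reproduce the three edge cases we found. The toMicros and bidsMatch helpers, and the choice of millionths of a currency unit as the comparison precision, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// toMicros converts a currency amount to integer millionths,
// rounding to the nearest whole micro-unit.
func toMicros(amount float64) int64 {
	return int64(math.Round(amount * 1e6))
}

// bidsMatch reports whether two bids are equal at micro-unit precision.
func bidsMatch(a, b float64) bool {
	return toMicros(a) == toMicros(b)
}

func main() {
	// Hypothetical bid components, held in float64 variables so the
	// addition happens at runtime rather than as an exact constant.
	base, fee := 0.1, 0.2
	total := base + fee

	fmt.Println("exact float equality:", total == 0.3)
	fmt.Println("equal at micro precision:", bidsMatch(total, 0.3))
	fmt.Println("real one-micro difference:", bidsMatch(1.000001, 1.000002))
}
```

The exact comparison fails even though both values mean "0.3", while the micro-unit comparison treats them as equal and still flags a genuine one-micro difference. Rounding before comparing is not airtight, since two nearly identical values can fall on opposite sides of a rounding boundary, so carrying money as integers end to end is the safer design where the schema allows it.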
State migration. The PHP service had accumulated years of implicit state in the database schema. We had to make the Go service compatible with this schema while also cleaning it up.
The Leadership Challenges
Keeping the team motivated during a long migration. Rewrites feel like running in place — you're working hard but not shipping features. I kept the team focused on the metrics: latency, throughput, cost. Watching those numbers improve was motivating.
Managing stakeholder expectations. Product wanted new features. I had to hold the line on "no new features until migration is complete" for 6 months. This required trust from leadership.
Handling the moment when things go wrong. At 25% traffic, we had a latency spike. The instinct is to roll back immediately. Instead, we investigated, found a connection pool misconfiguration, fixed it in 20 minutes, and continued. Building that confidence in the team was worth more than the 20 minutes of elevated latency.
What I'd Do Differently
Start the shadow mode comparison earlier. We started it at 80% feature parity. Starting at 50% would have caught the edge cases sooner and given us more time to fix them.