AWS Lambda vs Cloudflare Workers for a latency-sensitive API gateway. 200M requests/month, P99 target <50ms, currently on ECS Fargate.
Migrate the API gateway from ECS Fargate to Cloudflare Workers using an edge-compute + origin-fetch pattern to...
Decision
Deploy Cloudflare Workers as the API gateway, using an edge-compute + origin-fetch pattern against the existing AWS backends. Workers run in V8 isolates with effectively no cold start (sub-1ms spin-up), and edge execution adds <5ms, leaving ~45ms of the 50ms budget for the origin fetch. The estimated US-East RTT of 10-25ms fits comfortably within that. Estimated total cost is ~$500/month versus $6K/month on Fargate (roughly 92% savings on those figures).
Lambda is disqualified on both cost and latency. Provisioned Concurrency for 770 peak concurrent instances is estimated at $11K-13K/month, roughly 40-60% over the $8K ceiling. Without it, Lambda P99 is 800ms, 16x the target.
Migrate via Cloudflare DNS with percentage-based traffic splitting (10% → 50% → 100% over 5 days), keeping Fargate as the fallback.
Critical failure mode: if the gateway calls AWS VPC-internal services, a cross-cloud RTT of 15-30ms can leave <20ms for backend processing. Mitigate with Cloudflare Tunnel or public endpoints, and validate that backend processing time fits within the remaining budget before full cutover.
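The latency reasoning in the decision is simple subtraction, and a minimal sketch makes the remaining backend budget explicit. The figures below are the estimates quoted above (50ms target, 5ms taken as the upper bound of the "<5ms" edge overhead, 15-30ms cross-cloud RTT), not measurements. The `LatencyBudget` type and `backendBudgetMs` function are hypothetical names introduced for illustration.

```typescript
// All figures are estimates quoted in the decision text, not measured values.
interface LatencyBudget {
  targetP99Ms: number;    // end-to-end P99 target
  edgeOverheadMs: number; // time spent executing in the Worker
  originRttMs: number;    // network round trip from edge to origin
}

// Time left for the backend once edge overhead and network RTT are subtracted.
function backendBudgetMs(b: LatencyBudget): number {
  return b.targetP99Ms - b.edgeOverheadMs - b.originRttMs;
}

// Best and worst case for the quoted 15-30ms cross-cloud RTT range.
const bestCaseMs = backendBudgetMs({ targetP99Ms: 50, edgeOverheadMs: 5, originRttMs: 15 });
const worstCaseMs = backendBudgetMs({ targetP99Ms: 50, edgeOverheadMs: 5, originRttMs: 30 });

console.log(`backend budget: ${worstCaseMs}ms to ${bestCaseMs}ms`);
// prints "backend budget: 15ms to 30ms"
```

At the slow end of the RTT range the backend has about 15ms, which is why backend processing time needs to be validated before cutover.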
Next actions
Council notes
Evidence boundary
Observed from your filing
- AWS Lambda vs Cloudflare Workers for a latency-sensitive API gateway. 200M requests/month, P99 target <50ms, currently on ECS Fargate.
Assumptions used for analysis
- The API gateway primarily performs routing, auth, and lightweight transformation — not compute-heavy processing that would exhaust Workers CPU limits
- Backend services are accessible via public endpoints or can be exposed via Cloudflare Tunnel without prohibitive latency
- Traffic is predominantly US-centric, making the 10-25ms RTT estimate to us-east-1 representative
- The $8K/month cost ceiling is a hard constraint that disqualifies Lambda Provisioned Concurrency
- Current ECS Fargate can remain operational as a fallback during the 5-day migration window
- Team size was not addressed in the filing; defaulted to a standard team (5-10 engineers)
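If the first two assumptions hold, the gateway's routing step reduces to mapping an incoming URL onto a publicly reachable origin. The sketch below shows only that mapping as a pure function. The origin host and route prefixes are hypothetical placeholders, not values from the filing, and `toOriginUrl` is an illustrative name.

```typescript
// Hypothetical values for illustration only; neither appears in the filing.
const ORIGIN_BASE = "https://origin.example.com";
const PROXIED_PREFIXES = ["/v1/users", "/v1/orders"];

// Map an incoming request URL to the origin URL.
// Returns null when the path is not one of the proxied routes.
function toOriginUrl(
  requestUrl: string,
  originBase: string,
  prefixes: string[]
): string | null {
  const incoming = new URL(requestUrl);
  // Match the prefix exactly or as a parent path segment, so that
  // "/v1/usersx" does not match "/v1/users".
  const matches = prefixes.some(
    (p) => incoming.pathname === p || incoming.pathname.startsWith(p + "/")
  );
  if (!matches) return null;
  // Keep the path and query string; swap only the scheme and host.
  return new URL(incoming.pathname + incoming.search, originBase).toString();
}

console.log(
  toOriginUrl("https://api.example.com/v1/users/42?expand=true", ORIGIN_BASE, PROXIED_PREFIXES)
);
// prints "https://origin.example.com/v1/users/42?expand=true"
```

Inside a Worker, the returned URL would be passed to `fetch()` along with the original method, headers, and body. Keeping the mapping pure makes it easy to unit-test separately from the network call.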
Inferred candidate specifics
- Deploy a Cloudflare Workers proof-of-concept that proxies 3-5 representative API routes to the existing Fargate ALB, measure P99 end-to-end latency including origin fetch under synthetic load matching peak traffic patterns (770 concurrent requests), and validate that cross-cloud RTT + backend processing fits within the 50ms budget.
- Branch b003 had the highest confidence (0.82) and survived 3 rounds of adversarial review, with strengthening in rounds 1 and 2. It named specific cost thresholds ($500/month vs $11K-13K/month for Lambda), specific latency breakdowns (sub-1ms isolate spin-up, <5ms edge execution, 15-30ms cross-cloud RTT), concrete failure modes (VPC access, backend processing budget), and a specific migration timeline. Reframe branches b006 and b007 raised valid strategic considerations, but neither provided actionable recommendations.
- Build a Cloudflare Workers PoC proxying 3-5 high-traffic API routes to the existing Fargate ALB using fetch() with standard Web API patterns
- Run synthetic load tests at 770 concurrent requests from multiple geographic regions, measuring P99 end-to-end latency including origin fetch to AWS us-east-1
- Audit all API gateway routes for AWS VPC-private service dependencies (DynamoDB, ElastiCache, SQS) and determine if Cloudflare Tunnel or public endpoints are needed
- Configure Cloudflare DNS percentage-based traffic splitting starting at 10%, with automated rollback to Fargate ALB if P99 exceeds 50ms
- Set up a unified observability pipeline (OpenTelemetry) spanning Workers analytics and CloudWatch to track P99 latency, error rates, and cost per request across both platforms during migration
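The staged rollout and rollback rule from the list above can be sketched as a small decision function. This is a sketch under stated assumptions. The stages (10, 50, 100) and the 50ms threshold come from the text, while the nearest-rank percentile method and the function names are illustrative choices. It calls no Cloudflare API, and it leaves open how the resulting traffic weight would actually be applied.

```typescript
// Rollout stages and the P99 threshold are taken from the decision text.
const STAGES = [10, 50, 100];
const P99_TARGET_MS = 50;

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Decide the next share of traffic (in percent) to send to Workers.
// Returns 0 (all traffic back on Fargate) when P99 exceeds the target,
// otherwise advances to the next stage or stays at the final one.
function nextTrafficPercent(currentPercent: number, samplesMs: number[]): number {
  if (percentile(samplesMs, 99) > P99_TARGET_MS) return 0;
  const next = STAGES.find((stage) => stage > currentPercent);
  return next ?? currentPercent;
}

// Synthetic samples for illustration: 100 requests between 10ms and 39.7ms.
const healthy = Array.from({ length: 100 }, (_, i) => 10 + i * 0.3);
// 97 fast requests plus 3 slow ones, enough to push P99 over the target.
const degraded = [...Array(97).fill(20), 120, 120, 120];

console.log(nextTrafficPercent(10, healthy));  // prints 50
console.log(nextTrafficPercent(50, degraded)); // prints 0
```

With 100 samples, a single slow request does not move the nearest-rank P99, so the gate reacts only when at least 2% of requests in the window are slow. The window size and sampling source are left as decisions for the observability pipeline.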
Unknowns blocking a firmer verdict
- Actual backend processing time for AWS-internal services is unknown — if complex queries take >20ms, the 50ms P99 budget may be violated when combined with 15-30ms cross-cloud RTT
- Whether the API gateway requires access to AWS VPC-private services (DynamoDB, ElastiCache, SQS) is unstated — this determines whether Cloudflare Tunnel or public endpoints are needed, adding latency and complexity
- The $500/month Workers cost estimate depends on CPU duration staying low; compute-heavy gateway logic (auth, transformation, validation) could push Workers Unbound costs higher
- Geographic distribution of users is unknown — the 10-25ms RTT estimate assumes US-centric traffic hitting US edge nodes; global traffic patterns could differ
- Cost numbers for Lambda Provisioned Concurrency are model-estimated, not sourced from AWS pricing calculator with the specific runtime/memory configuration
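Because the cost figures are estimates, it helps to keep the arithmetic behind the percentages reproducible so it can be rerun once sourced prices are available. The sketch below uses only the dollar amounts and request volume quoted in this document. The function names are hypothetical, and no vendor price list is encoded.

```typescript
// Percentage saved when moving from a baseline cost to a new cost.
function savingsPercent(baselineUsd: number, newUsd: number): number {
  return ((baselineUsd - newUsd) / baselineUsd) * 100;
}

// Percentage by which a cost exceeds a ceiling (negative means under it).
function overCeilingPercent(costUsd: number, ceilingUsd: number): number {
  return ((costUsd - ceilingUsd) / ceilingUsd) * 100;
}

// Blended price per million requests implied by a monthly bill.
function usdPerMillionRequests(monthlyUsd: number, monthlyRequests: number): number {
  return monthlyUsd / (monthlyRequests / 1_000_000);
}

// Figures quoted in this document (estimates, not sourced prices).
const MONTHLY_REQUESTS = 200_000_000;
const FARGATE_USD = 6000;
const WORKERS_USD = 500;
const CEILING_USD = 8000;

console.log(savingsPercent(FARGATE_USD, WORKERS_USD).toFixed(1));       // prints "91.7"
console.log(overCeilingPercent(11000, CEILING_USD));                    // prints 37.5
console.log(overCeilingPercent(13000, CEILING_USD));                    // prints 62.5
console.log(usdPerMillionRequests(WORKERS_USD, MONTHLY_REQUESTS));      // prints 2.5
console.log(usdPerMillionRequests(FARGATE_USD, MONTHLY_REQUESTS));      // prints 30
```

On the quoted figures, Workers saves about 92% relative to Fargate, and the Lambda Provisioned Concurrency estimate lands 37.5% to 62.5% over the ceiling. Both results shift directly with any revision to the underlying estimates.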