What breaks if we switch from REST to gRPC for all internal services?
Decision
Adopt a tiered migration strategy rather than switching all services to gRPC simultaneously. Classify internal services into three tiers:
- Class A (performance-critical): migrate to gRPC within 6 months
- Class B (moderate-performance): hybrid REST/gRPC
- Class C (integration-heavy): remain on REST for 12+ months
Rationale: a blanket REST-to-gRPC migration breaks browser client compatibility, bypasses existing HTTP caching infrastructure, disrupts debugging workflows (curl, Postman, browser DevTools), adds protobuf schema management overhead, and forces team reskilling across all services at once. Tiered classification isolates these breakage points into manageable batches while capturing performance gains where they matter most (services requiring <50ms response time).
Key failure modes:
- Inconsistent service boundaries, which increase cognitive load for developers maintaining both communication patterns
- Premature optimization of low-traffic services, which consumes resources that could go to actual performance bottlenecks
- Misclassification of services leading to the wrong protocol choice, e.g., a Class C service that actually has latency-sensitive internal callers
Thresholds:
- Response time < 50ms for Class A services
- Class A migration within 6 months
- Class C remains on REST for 12+ months
Next actions
What usually goes wrong
- Risk assessment focused on known threats, missed novel vectors
- Compliance checkbox passed but operational security remained weak
- Low-probability high-impact scenario treated as negligible
Council notes
Attack grid
Survival rate shows how the recommendation holds under stress scenarios. Low scores indicate conditional vulnerability, not a flaw in the recommendation.
Scenario detail (8)
Evidence boundary
Observed from your filing
- What breaks if we switch from REST to gRPC for all internal services?
Assumptions used for analysis
- The organization runs a microservices architecture with multiple internal services communicating synchronously over REST today
- There exist measurable performance differences between REST/JSON and gRPC/protobuf for the organization's actual payload sizes and call patterns
- The engineering team has capacity to maintain two communication paradigms simultaneously during a multi-month transition
- Service classification into A/B/C tiers can be done objectively based on measurable metrics rather than political negotiation
- The organization's infrastructure (load balancers, service mesh, API gateways, observability stack) can support gRPC — specifically HTTP/2 end-to-end
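The second assumption above (a measurable difference for the organization's actual payloads) can be checked on size alone before any benchmark is built. The following is a minimal sketch under stated assumptions: the `record` fields are hypothetical, and a fixed `struct` layout is used as a stand-in for a compact binary encoding. It is not the protobuf wire format, which uses field tags and variable-length integers, so the numbers only indicate the order of magnitude to expect.

```python
import json
import struct

# Hypothetical payload; substitute a real message from one of your services.
record = {"order_id": 123456789, "quantity": 3, "unit_price": 19.99}

# Compact JSON encoding, as a REST/JSON service would typically send it.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Stand-in binary layout (NOT protobuf): little-endian unsigned 64-bit int,
# unsigned 32-bit int, 64-bit float, with no padding between fields.
LAYOUT = "<QId"
binary_bytes = struct.pack(
    LAYOUT, record["order_id"], record["quantity"], record["unit_price"]
)

# Round-trip the binary form to confirm no information was lost.
order_id, quantity, unit_price = struct.unpack(LAYOUT, binary_bytes)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"binary: {len(binary_bytes)} bytes")
```

Size is only one input. Serialization cost, connection reuse, and call patterns also affect latency, so a comparison like this supports the assumption rather than proving it.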
Inferred candidate specifics
- Adopt a tiered migration strategy: classify internal services into performance-critical (Class A → gRPC within 6 months), moderate-performance (Class B → hybrid REST/gRPC), and integration-heavy (Class C → remain on REST for 12+ months), rather than switching all services to gRPC simultaneously. A blanket REST-to-gRPC migration breaks browser client compatibility, bypasses existing HTTP caching infrastructure, disrupts debugging workflows (curl, Postman, browser DevTools), adds protobuf schema management overhead, and forces team reskilling across all services at once. Tiered classification isolates these breakage points into manageable batches while capturing performance gains where they matter most (services requiring <50ms response time). Key failure modes: inconsistent service boundaries that increase cognitive load for developers maintaining both communication patterns; premature optimization of low-traffic services that consumes resources better spent on actual performance bottlenecks; and misclassification of services leading to the wrong protocol choice, e.g., a Class C service that actually has latency-sensitive internal callers. Thresholds: response time < 50ms for Class A services, Class A migration within 6 months, Class C remains on REST for 12+ months.
- Create a service inventory spreadsheet cataloging all internal services, including relevant data such as current RPS, p99 latency, the number of consumer services, external integration dependencies, and current debugging/caching dependencies on REST semantics. Use this inventory to classify services into Class A, B, or C based on the tiered criteria and develop a concrete migration sequencing plan.
- b002 won by default as the only surviving implementation branch with a concrete recommendation. Its confidence (0.75) exceeded b006 (0.40), and b006 was structurally disqualified as a reframe that provided zero specific breakage points. However, b002's defense quality was low (0.40) and it was correctly criticized for not directly answering 'what breaks.' The killed b003 had stronger technical specifics and higher original confidence, but was structurally disqualified in round 4 for providing a migration plan instead of a breakage analysis. This tension — the best technical branch was killed while the survivor is adequate but underspecified — is reflected in the moderate confidence score.
- Build a service inventory with measured RPS, p99 latency, consumer count, and REST-specific dependencies (caching, debugging tools, load balancer configs) for every internal service
- Define quantitative classification criteria for Class A/B/C based on the inventory data — specific RPS thresholds, latency requirements, and external integration counts
- Run a proof-of-concept gRPC migration on one Class A service with an Envoy transcoding sidecar, measuring actual p99 latency improvement and developer onboarding time
- Set up a shared protobuf registry (Buf Schema Registry) with CI-enforced breaking-change detection before any service begins migration
- Track developer cognitive load metrics (context-switch frequency, incident rate per protocol type, onboarding time for new team members) throughout migration to detect if dual-paradigm maintenance is degrading velocity
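The inventory and classification steps in this list can be sketched as a small data model. This is an illustration under stated assumptions, not the report's method: the field names, `INTEGRATION_HEAVY_MIN`, and `LOW_TRAFFIC_RPS` are hypothetical, and only the 50 ms figure comes from the report (which itself labels it synthetic). The `needs_review` check flags the misclassification failure mode named in the decision, an integration-heavy service that also has latency-sensitive callers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    A = "gRPC within 6 months"
    B = "hybrid REST/gRPC"
    C = "remain on REST for 12+ months"


@dataclass
class ServiceRecord:
    name: str
    rps: float                     # measured requests per second
    p99_latency_ms: float          # measured p99 latency
    consumer_count: int            # number of internal consumer services
    external_integrations: int     # external integration dependencies
    latency_target_ms: Optional[float] = None  # required response time, if any


CLASS_A_LATENCY_MS = 50.0   # the report's threshold (flagged there as synthetic)
INTEGRATION_HEAVY_MIN = 3   # hypothetical cutoff, not from the report
LOW_TRAFFIC_RPS = 10.0      # hypothetical guard against premature optimization


def is_latency_sensitive(svc: ServiceRecord) -> bool:
    target = svc.latency_target_ms
    return target is not None and target < CLASS_A_LATENCY_MS


def is_integration_heavy(svc: ServiceRecord) -> bool:
    return svc.external_integrations >= INTEGRATION_HEAVY_MIN


def classify(svc: ServiceRecord) -> Tier:
    if is_integration_heavy(svc):
        return Tier.C
    if is_latency_sensitive(svc) and svc.rps >= LOW_TRAFFIC_RPS:
        return Tier.A
    return Tier.B


def needs_review(svc: ServiceRecord) -> bool:
    # Integration-heavy but latency-sensitive: the tiering rule alone may
    # pick the wrong protocol, so a human should look at this service.
    return is_integration_heavy(svc) and is_latency_sensitive(svc)


# Hypothetical inventory rows for illustration only.
inventory = [
    ServiceRecord("pricing", 1200.0, 80.0, 5, 0, 40.0),
    ServiceRecord("partner-gateway", 90.0, 210.0, 2, 4, 40.0),
    ServiceRecord("reporting", 3.0, 900.0, 1, 0),
]

for svc in inventory:
    print(svc.name, classify(svc).name, "review" if needs_review(svc) else "")
```

Keeping the thresholds as named constants makes it easy to replace the placeholder values once the inventory contains measured data.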
Unknowns blocking a firmer verdict
- The winning branch (b002) was critiqued for not directly inventorying what breaks — it focuses on migration strategy rather than a comprehensive breakage catalog. The killed b003 branch had significantly more specific technical failure modes (proto schema corruption, transcoding latency accumulation) that the winner lacks.
- No branch provided a complete 'what breaks' inventory covering all dimensions: load balancer reconfiguration, observability pipeline changes, testing tool replacement, CI/CD pipeline modifications, service mesh compatibility, and team skill gaps.
- The <50ms threshold for Class A services and the 6/12-month timelines are synthetic — no branch grounded these numbers in measured system data or named engineering heuristics.
- Verdict is largely model-reasoning only — the 3 evidence items (quality mean=1.00) all mapped to b003 which was killed. The surviving winner has no external evidence support.
- REST+HTTP/3 optimization (b006's point) was not seriously evaluated against gRPC for internal services — this remains a legitimate unexplored alternative that could change the recommendation if benchmarked.
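One way to address the ungrounded <50ms threshold is to compute p99 from recorded request latencies instead of assuming it. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions. The `percentile` helper and the sample values are hypothetical, and real inputs would come from the observability stack.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds for one service.
latencies_ms = [12, 18, 25, 31, 47, 120]

p99 = percentile(latencies_ms, 99)
meets_class_a_threshold = p99 < 50.0
print(f"p99 = {p99} ms, under 50 ms: {meets_class_a_threshold}")
```

With so few samples the p99 is simply the maximum, which is one reason to collect a realistic volume of measurements before using the result to classify a service.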
Fragility signals
- Hubris: ANNOTATE
Operational signals to watch
Flip conditions
Branch battle map
Battle timeline (4 rounds)
Minority report
What if the opposite were true? What *improves* if we optimize REST with HTTP/3, compression, and JSON Schema instead of chasing gRPC? Both branches fixate on gRPC's hype while ignoring REST's maturity in caching, idempotency, and ecosystem tooling.
Pre-mortem (3 scenarios)
Censor oversight
REOPEN SPAR
The winning decision (b003) provides a detailed migration plan but fails to directly address the original question 'what breaks'. It also assumes certain expertise and doesn't scope infrastructure coupling. Surviving branch b002 offers a more nuanced approach that was not selected despite higher confidence in some model outputs.
Structural issues
- SELECTION MISMATCH: b002 provides a reasonable classification framework and polyglot persistence approach, which is more nuanced than b003's blanket gRPC migration
- CONSULTING FOG: The winning decision describes a migration plan but doesn't directly address 'what breaks' when switching to gRPC