What breaks if we switch from REST to gRPC for all internal services? We have 12 microservices, 4 backend engineers, browser clients that need direct API access, and a 3-month migration window.
Maintain the current REST architecture for all 12 internal services and do NOT attempt a full gRPC migration within...
Decision
Do not switch from REST to gRPC for all internal services within the 3-month window. The core constraint is team capacity: 4 engineers migrating 12 services means each engineer owns about 3 service migrations at once while maintaining existing functionality and browser client access. Browsers cannot speak native gRPC, so browser clients would need the gRPC-Web protocol plus a translating proxy in front of the services (for example, Envoy with its gRPC-Web filter), which is an additional infrastructure component that must be deployed, monitored, and debugged. The critical failure mode is partial migration: services straddling both protocols double the integration surface, create contract mismatches, and force engineers to maintain two serialization paths indefinitely. If gRPC performance benefits are needed, pursue a phased approach after the 3-month window: start with a single high-traffic internal-only service pair, validate the toolchain (protobuf schema management, gRPC-Web proxy, observability integration), then expand incrementally with rollback capability at each step.
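The capacity argument can be made concrete with a rough back-of-the-envelope calculation. This is a minimal sketch, not a planning tool: the 13-week window (an approximation of 3 months) and the 50% share of engineer time available for migration are illustrative assumptions, not figures from the filing, and the function name is hypothetical.

```python
def weeks_per_service(engineers, services, window_weeks, migration_share):
    """Engineer-weeks available per service migration.

    migration_share is the fraction of each engineer's time that can go
    to migration rather than feature work (an assumed value).
    """
    total_engineer_weeks = engineers * window_weeks * migration_share
    return total_engineer_weeks / services


# Figures from the filing: 4 engineers, 12 services.
# Assumed for illustration: 13-week window, 50% of time on migration.
budget = weeks_per_service(engineers=4, services=12,
                           window_weeks=13, migration_share=0.5)
services_per_engineer = 12 / 4

print(f"{services_per_engineer:.0f} services per engineer, "
      f"{budget:.1f} engineer-weeks per service")
```

Under these assumptions each service gets roughly two engineer-weeks, which would have to cover schema definition, server and client changes, parallel tests, and rollout. Changing `migration_share` shows how sensitive the budget is to competing feature work.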
Next actions
Council notes
Evidence boundary
Observed from your filing
- What breaks if we switch from REST to gRPC for all internal services? We have 12 microservices, 4 backend engineers, browser clients that need direct API access, and a 3-month migration window.
Assumptions used for analysis
- Current REST-based architecture is not experiencing measurable latency or throughput failures that block business objectives
- The 4 backend engineers must continue feature development alongside any migration work — migration is not their sole focus
- Browser clients require direct API access, and adding proxy infrastructure in front of the services is not acceptable
- The 12 microservices have existing REST contracts and integration tests that would need parallel gRPC equivalents during migration
- No organizational mandate or external dependency forcing gRPC adoption on a fixed timeline
- Deployment model was not specified in the filing, so a default was assumed
- Observability state was not specified in the filing, so a default was assumed
Inferred candidate specifics
- Run a latency and payload size audit across all 12 services' inter-service REST calls for one week to identify if any specific service pair has serialization or throughput bottlenecks that would justify a targeted gRPC pilot after the 3-month window.
- b001 and b002 both had confidence 0.85 and converged on the same recommendation. b001 was selected as the winner because both models (glm and gpt) strengthened it across rounds, making it the most battle-tested branch. b002 added the useful nuance of a phased future evaluation, which is incorporated into the chosen path. b004 at 0.40 confidence provided a valid reframe but lacked specifics on migration mechanics and understated gRPC-Web proxy complexity.
- Instrument all inter-service REST calls with P50/P95/P99 latency and payload size metrics for 1 week to establish baseline performance
- Identify the single highest-traffic internal-only service pair as a candidate for a future gRPC pilot
- After the 3-month window, use baseline metrics to decide whether a single-pair gRPC pilot is justified based on measured latency/throughput constraints
- Measured P99 inter-service latency exceeds SLA thresholds AND profiling shows JSON serialization is the dominant contributor -> Targeted gRPC migration for the specific high-latency service pairs, starting with internal-only services that don't serve browser clients
- Team grows to 8+ backend engineers AND migration window extends to 6+ months AND browser clients move behind a BFF/API gateway -> Full phased gRPC migration becomes feasible with dedicated migration squad and rollback plan per service
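The baseline audit and the first trigger condition in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the samples are assumed to be per-call latencies in milliseconds already collected for one service pair, and "dominant contributor" is interpreted here as more than half of request time spent in JSON serialization, a threshold the report does not define.

```python
from statistics import quantiles


def latency_percentiles(samples_ms):
    """Return P50/P95/P99 for a list of latency samples (milliseconds)."""
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def pilot_justified(p99_ms, sla_ms, serialization_share, dominant_share=0.5):
    """Both trigger conditions must hold for a targeted gRPC pilot.

    serialization_share: fraction of request time attributed to JSON
    serialization by profiling. dominant_share is an assumed cutoff.
    """
    exceeds_sla = p99_ms > sla_ms
    serialization_dominant = serialization_share > dominant_share
    return exceeds_sla and serialization_dominant


# Hypothetical one-week sample for a single service pair.
samples = [12.0, 15.5, 14.2, 80.3, 13.1, 16.8, 240.0, 14.9, 15.0, 13.7]
stats = latency_percentiles(samples)

# sla_ms and serialization_share are placeholders; real values would come
# from the team's SLA and from profiling.
print(stats, pilot_justified(stats["p99"], sla_ms=200.0,
                             serialization_share=0.2))
```

Keeping the two conditions as separate inputs mirrors the AND in the trigger: a high P99 alone does not justify a pilot if profiling attributes the time to something other than serialization, such as database calls or network hops.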
Unknowns blocking a firmer verdict
- Whether any of the 12 services have latency or throughput problems that REST is actively causing — if so, selective gRPC adoption for those specific service pairs could be justified
- Whether the team has any existing protobuf/gRPC experience — zero experience would make even a phased migration slower than estimated
- Whether browser clients require real-time streaming patterns. gRPC-Web supports unary and server-streaming calls but not client or bidirectional streaming, so any benefit over REST polling/SSE would be limited to server-to-client streams