When Google introduced gRPC in 2015, one of the most significant architectural decisions was building it on top of HTTP/2 rather than the widely adopted HTTP/1.1. This wasn't just about following a trend; it was a deliberate choice that fundamentally shapes how gRPC performs and behaves. Let’s dig deeper into why this decision was made and how it impacts real-world applications.
What Makes HTTP/2 Different
Before diving into gRPC’s specific needs, let’s understand what HTTP/2 brings to the table that HTTP/1.1 doesn’t.
Binary vs Text Protocol
HTTP/1.1 is a text-based protocol. When you make a request, it looks something like this:
GET /api/users HTTP/1.1
Host: example.com
Accept: application/json
HTTP/2, on the other hand, is a binary protocol. The same request is encoded into binary frames that are more compact and faster to parse. This improves performance, especially when dealing with thousands of concurrent connections (such as in a microservices architecture).
Multiplexing
The most interesting and important feature of HTTP/2 is multiplexing. In HTTP/1.1, each TCP connection can handle only one request at a time. If you want to make multiple requests, your options are:
- Wait for each request to complete sequentially
- Use HTTP pipelining (which suffers from head-of-line blocking and is poorly supported in practice)
- Open multiple TCP connections (which browsers limit to 6-8 per domain)
HTTP/2 allows multiple requests and responses to be interleaved over a single TCP connection using streams. Each stream has a unique ID, and frames can be sent for different streams without blocking each other.
Single TCP Connection
═══════════════════════════════
Stream 1 (ID: 1): [REQ A] -----> [RESP A]
                     |              |
                    t1             t3
Stream 2 (ID: 3): [REQ B] -----> [RESP B]
                     |              |
                   t1.5           t2.5
Stream 3 (ID: 5): [REQ C] -----> [RESP C]
                     |              |
                    t2             t4
This is how the timeline view of multiplexed HTTP/2 requests looks.
   t1     t1.5   t2     t2.5   t3     t4
   |      |      |      |      |      |
   ▼      ▼      ▼      ▼      ▼      ▼
┌──────┬──────┬──────┬──────┬──────┬──────┐
│REQ A │REQ B │REQ C │RESP B│RESP A│RESP C│
│(S:1) │(S:3) │(S:5) │(S:3) │(S:1) │(S:5) │
└──────┴──────┴──────┴──────┴──────┴──────┘
On the wire, this multiplexing looks something like this.
[H S:1][D S:1][H S:3][D S:3][H S:5][D S:5][D S:3][D S:1][D S:5]
 REQ A  REQ A  REQ B  REQ B  REQ C  REQ C RESP B RESP A RESP C
S:1, S:3, S:5 = Stream IDs
REQ = Request frames
RESP = Response frames
gRPC’s Core Requirements
Bidirectional Communication
gRPC supports four types of RPC calls:
- Unary: Traditional request-response
- Server streaming: Server sends multiple responses
- Client streaming: Client sends multiple requests
- Bidirectional streaming: Both sides stream data simultaneously
HTTP/1.1 simply cannot handle these streaming scenarios effectively. Workarounds exist, but each falls short: Server-Sent Events (SSE) only stream from server to client, and WebSockets require a protocol upgrade that abandons HTTP semantics entirely.
HTTP/2’s stream-based architecture naturally supports these patterns. A bidirectional streaming gRPC call maps perfectly to an HTTP/2 stream where both client and server can send frames asynchronously.
Here’s a practical example…
Imagine a chat application where multiple users are sending messages simultaneously:
service ChatService {
  rpc LiveChat(stream ChatMessage) returns (stream ChatMessage);
}
With HTTP/2, this becomes a single stream where:
- Client sends DATA frames containing serialized ChatMessage
- Server sends DATA frames back with messages from other users
- All happens over one TCP connection with proper flow control
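To make this concrete, here is a minimal server-side sketch in Go. It assumes stubs generated from the ChatService definition above; the import path, type names, and the simple echo behaviour are placeholders rather than a reference implementation.

package chat

import (
    "io"

    pb "example.com/chat/pb" // hypothetical generated package
)

type chatServer struct {
    pb.UnimplementedChatServiceServer
}

// LiveChat receives and sends on the same gRPC stream; both directions travel
// as DATA frames on a single HTTP/2 stream.
func (s *chatServer) LiveChat(stream pb.ChatService_LiveChatServer) error {
    for {
        msg, err := stream.Recv() // blocks until the client sends a message
        if err == io.EOF {
            return nil // client finished sending; end the stream cleanly
        }
        if err != nil {
            return err
        }
        // Echo the message back; a real server would fan it out to other users.
        if err := stream.Send(msg); err != nil {
            return err
        }
    }
}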
High Throughput and Low Latency
In microservices architectures, services often make dozens of calls to other services to fulfill a single user request. With HTTP/1.1, this creates a bottleneck:
Service A → Service B (wait)
          → Service C (wait)
          → Service D (wait)
Even with connection pooling, you’re limited by the number of concurrent connections and the head-of-line blocking problem.
With gRPC over HTTP/2, Service A can make all these calls concurrently, each as its own stream over a single long-lived connection per downstream service:
Service A → Multiple streams to Services B, C, and D simultaneously
The multiplexing ensures that a slow response from Service B doesn’t block the faster responses from Services C and D.
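A rough sketch of that fan-out in Go follows. The generated pb package and the RPC methods are invented for illustration, but the pattern is the important part: all three calls share one *grpc.ClientConn, and each call becomes its own HTTP/2 stream on that connection.

package main

import (
    "context"
    "log"
    "sync"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    pb "example.com/backend/pb" // hypothetical generated package
)

func main() {
    // One TCP connection; every RPC below is multiplexed onto it as a separate stream.
    conn, err := grpc.Dial("backend:50051",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    client := pb.NewBackendClient(conn)

    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    var wg sync.WaitGroup
    wg.Add(3)
    go func() { defer wg.Done(); logErr(client.GetUser(ctx, &pb.UserRequest{Id: "u1"})) }()
    go func() { defer wg.Done(); logErr(client.GetOrders(ctx, &pb.OrderRequest{UserId: "u1"})) }()
    go func() { defer wg.Done(); logErr(client.GetPrefs(ctx, &pb.PrefsRequest{UserId: "u1"})) }()
    wg.Wait() // a slow stream does not hold up the fast ones
}

// logErr drops the response and records only the error, to keep the sketch short.
func logErr(_ any, err error) {
    if err != nil {
        log.Println(err)
    }
}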
Efficient Header Handling
gRPC makes extensive use of headers for metadata like authentication tokens, tracing information, and custom headers. In a typical microservices call chain, these headers are propagated through multiple services.
HTTP/1.1 sends headers as plain text with every request:
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
X-Trace-ID: 1234567890abcdef
X-User-ID: user123
Content-Type: application/grpc+proto
In a service mesh with hundreds of requests per second, this overhead becomes significant.
HTTP/2’s HPACK compression maintains a dynamic table of previously seen headers. After the first request, common headers are referenced by index rather than sent in full.
:method: POST → Index 3
:path: /UserService/GetUser → Index 62
authorization: Bearer... → Index 58
In practice, this can reduce header overhead substantially, often in the range of 85-90% for repeated requests.
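In gRPC, these headers are metadata. Here is a small Go sketch of how they get attached on the client side; the header names are illustrative. Every key/value pair ends up in an HTTP/2 HEADERS frame, which is exactly what HPACK indexes on subsequent calls over the same connection.

package rpcmeta

import (
    "context"

    "google.golang.org/grpc/metadata"
)

// withCallMetadata attaches auth and tracing headers to the outgoing context.
// gRPC transmits them as HTTP/2 headers, so repeated values are sent as short
// HPACK table references after the first request on a connection.
func withCallMetadata(ctx context.Context, token, traceID string) context.Context {
    return metadata.AppendToOutgoingContext(ctx,
        "authorization", "Bearer "+token,
        "x-trace-id", traceID,
        "x-user-id", "user123",
    )
}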
Performance
Connection Management
Consider a microservices application with 10 services, each making an average of 5 calls to other services under load.
HTTP/1.1 Scenario:
- Each service needs connection pools to every other service
- With 6 connections per pool, that’s 10 × 9 × 6 = 540 TCP connections
- Each connection has TCP overhead, OS socket limits, and connection establishment latency
HTTP/2 Scenario:
- Each service maintains 1-2 connections to every other service
- Total connections: 10 × 9 × 2 = 180 TCP connections
- 67% reduction in connection overhead
Latency Characteristics
In practice, the latency benefits of HTTP/2 for gRPC are most noticeable in:
- High-frequency, low-payload requests: Microservices often make many small calls. HTTP/2’s binary framing is cheaper to produce and parse than HTTP/1.1’s text headers.
- Concurrent requests: When a service needs to aggregate data from multiple sources, HTTP/2’s multiplexing provides a significant speedup.
- Long-lived connections: gRPC services maintain persistent connections for streaming. HTTP/2’s connection reuse is more efficient than HTTP/1.1’s connection establishment overhead.
Real-world numbers that I have observed
- 20-40% latency reduction for concurrent requests
- 50-80% reduction in connection overhead
- 2-3x improvement in requests per second for small payloads
Streaming
Server Streaming Example
Consider a log streaming service:
service LogService {
  rpc StreamLogs(LogRequest) returns (stream LogEntry);
}
HTTP/1.1 Approach:
- Long polling with timeouts
- Chunked transfer encoding
- Complex client-side reconnection logic requiring state management
HTTP/2 Approach:
- Natural streaming with DATA frames
- Built-in flow control via WINDOW_UPDATE frames (explained below)
- Clean connection management
The HTTP/2 implementation is not only simpler but also more robust and efficient.
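For comparison, here is a minimal sketch of that HTTP/2 approach in Go, assuming stubs generated from the LogService definition above; the import path and the log source are placeholders. Each Send writes one more DATA frame on the same response stream, and returning from the handler ends the stream cleanly.

package logserver

import pb "example.com/logs/pb" // hypothetical generated package

type logServer struct {
    pb.UnimplementedLogServiceServer
}

// StreamLogs pushes entries to the client for as long as the source produces
// them: no polling, no chunked-encoding tricks, no manual reconnection state.
func (s *logServer) StreamLogs(req *pb.LogRequest, stream pb.LogService_StreamLogsServer) error {
    for entry := range logSource(req) {
        if err := stream.Send(entry); err != nil {
            return err // the client went away or the stream was reset
        }
    }
    return nil // ends the stream with an OK status in the trailers
}

// logSource stands in for whatever actually produces log entries.
func logSource(req *pb.LogRequest) <-chan *pb.LogEntry {
    ch := make(chan *pb.LogEntry)
    close(ch) // placeholder: a real implementation would feed this channel
    return ch
}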
Flow Control in Action
HTTP/2’s flow control prevents fast producers from overwhelming slow consumers. In gRPC streaming:
- Client opens a stream with an initial flow-control window (65,535 bytes by default)
- Server sends data frames up to the window limit
- Client processes data and sends a WINDOW_UPDATE to increase the available window
- Server continues sending based on the updated window
The window size controls how much unacknowledged data can be “in flight” between sender and receiver at any given moment. Think of it as a buffering limit, not a message size limit.
This prevents memory exhaustion and provides natural backpressure, something that’s difficult to achieve cleanly with HTTP/1.1. Also, both client and server can send a WINDOW_UPDATE frame depending on their rate of consumption and production.
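You rarely drive this by hand: gRPC implementations send WINDOW_UPDATE frames automatically as your application code consumes messages. What you can do is tune the window sizes. Here is a sketch using grpc-go; the sizes are illustrative and only worth changing for high-throughput or high-latency streaming workloads.

package main

import (
    "log"
    "net"

    "google.golang.org/grpc"
)

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatal(err)
    }
    srv := grpc.NewServer(
        grpc.InitialWindowSize(1<<20),     // per-stream window: 1 MiB instead of the ~64KB default
        grpc.InitialConnWindowSize(1<<21), // window shared by all streams on the connection: 2 MiB
    )
    // Register your services here before serving.
    log.Fatal(srv.Serve(lis))
}

The client side has matching grpc.WithInitialWindowSize and grpc.WithInitialConnWindowSize dial options. Larger windows trade memory for throughput; the default backpressure behaviour is usually what you want.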
Trade-offs
When HTTP/2 Might Not Be Ideal
Single Request Scenarios - For simple, one-off requests, HTTP/1.1 might have lower latency due to:
- Simpler protocol negotiation
- Less connection setup overhead
- Broader proxy support (though this is diminishing)
Resource-Constrained Environments - HTTP/2 requires more memory for:
- HPACK compression tables
- Stream state management
- Flow control windows
In embedded systems or extremely memory-constrained environments, this overhead might be significant.
Proxy and Infrastructure Considerations
Load Balancer Compatibility - Not all load balancers handle HTTP/2 efficiently:
- Some terminate HTTP/2 and forward as HTTP/1.1
- Others don’t properly handle gRPC’s use of HTTP/2 trailers
- Stream-aware load balancing is still evolving
Debugging Complexity - HTTP/2’s binary nature makes debugging more challenging:
- Network captures require specialized tools
- Stream interleaving makes request/response correlation complex
- Traditional HTTP debugging tools may not work
Protocol Buffer Integration
And of course, there is the most common pairing of all: Protocol Buffers. gRPC’s use of Protocol Buffers works exceptionally well with HTTP/2.
Protobuf’s binary serialization is naturally aligned with HTTP/2’s binary frames:
- No text-to-binary conversion overhead
- Efficient frame packing
- Compact traffic end to end: protobuf keeps payloads small while HTTP/2’s HPACK compresses the accompanying headers
Schema Evolution
Protobuf’s schema evolution capabilities work well with HTTP/2’s header compression:
- Service versions can be negotiated via headers
- Backward compatibility metadata travels efficiently
- Feature flags and capabilities can be communicated compactly
Footnotes
gRPC’s adoption of HTTP/2 was a strategic decision that enables the framework’s core value propositions - performance, functionality, and scalability.
If you are building distributed systems, understanding this relationship between gRPC and HTTP/2 is crucial. It helps you reason about how your services will behave under real production load.
As you design and implement gRPC services, keep these underlying mechanics in mind – they will inform your decisions about service boundaries, streaming strategies, and performance optimization.
The key is understanding how to leverage these capabilities effectively in your specific use case.