Why gRPC Uses HTTP/2

Arpit Bhayani



When Google introduced gRPC in 2015, one of the most significant architectural decisions was building it on top of HTTP/2 rather than the widely adopted HTTP/1.1. This was not about following a trend, but a deliberate choice that fundamentally shapes how gRPC performs and behaves. Let's dig deeper into why this decision was made and how it impacts real-world applications.

What Makes HTTP/2 Different

Before diving into gRPC’s specific needs, let’s understand what HTTP/2 brings to the table that HTTP/1.1 doesn’t.

Binary vs Text Protocol

HTTP/1.1 is a text-based protocol. When you make a request, it looks something like this:

GET /api/users HTTP/1.1
Host: example.com
Accept: application/json

HTTP/2, on the other hand, is a binary protocol. The same request is encoded into binary frames that are more compact and faster to parse. This improves performance, especially when dealing with thousands of concurrent connections (such as in a microservices architecture).
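With HTTP/2, the same request travels as a binary HEADERS frame carrying HPACK-encoded header fields; the :method, :path, :scheme, and :authority pseudo-headers replace the request line. Rendered in a readable form, it looks roughly like this:

HEADERS frame (stream 1, flags: END_HEADERS | END_STREAM)
  :method: GET
  :path: /api/users
  :scheme: https
  :authority: example.com
  accept: application/json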

Multiplexing

The most interesting and important feature of HTTP/2 is multiplexing. In HTTP/1.1, each TCP connection can handle only one request at a time. If you want to make multiple requests, you either have to:

  1. Wait for each request to complete sequentially
  2. Use HTTP pipelining (which suffers from head-of-line blocking and is poorly supported in practice)
  3. Open multiple TCP connections (which browsers limit to 6-8 per domain)

HTTP/2 allows multiple requests and responses to be interleaved over a single TCP connection using streams. Each stream has a unique ID, and frames can be sent for different streams without blocking each other.

                    Single TCP Connection
                    ═══════════════════════════════
Stream 1 (ID: 1):   [REQ A] -----> [RESP A]
                      |              |
                      t1            t3

Stream 2 (ID: 3):         [REQ B] -----> [RESP B]
                            |              |
                            t1.5          t2.5

Stream 3 (ID: 5):               [REQ C] -----> [RESP C]
                                  |              |
                                  t2            t4

This is how the timeline view of multiplexed HTTP/2 requests looks.

t1    t1.5   t2    t2.5   t3    t4
|     |      |     |      |     |
▼     ▼      ▼     ▼      ▼     ▼
┌─────┬──────┬─────┬──────┬─────┬─────┐
│REQ A│REQ B │REQ C│RESP B│RESP │RESP │
│(S:1)│(S:3) │(S:5)│(S:3) │A    │C    │
│     │      │     │      │(S:1)│(S:5)│
└─────┴──────┴─────┴──────┴─────┴─────┘

On the wire, this multiplexing looks something like this.

[H S:1][D S:1][H S:3][D S:3][H S:5][D S:5][D S:3][D S:1][D S:5]
REQ A   REQ A  REQ B  REQ B  REQ C  REQ C  RESP B RESP A RESP C

S:1, S:3, S:5 = Stream IDs
REQ = Request frames
RESP = Response frames
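
To see this from application code, here is a minimal Go sketch (the host and paths are placeholders). When the server supports HTTP/2 over TLS, Go's default transport negotiates it and sends all three requests as separate streams over a single TCP connection.

package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	client := &http.Client{} // the default transport speaks HTTP/2 to servers that support it
	paths := []string{"/api/users", "/api/orders", "/api/items"} // placeholder paths

	var wg sync.WaitGroup
	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			// Each request becomes its own HTTP/2 stream; none of them waits for the others.
			resp, err := client.Get("https://example.com" + p)
			if err != nil {
				fmt.Println(p, "error:", err)
				return
			}
			defer resp.Body.Close()
			fmt.Println(p, "->", resp.Proto) // prints "HTTP/2.0" when HTTP/2 was negotiated
		}(p)
	}
	wg.Wait()
}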

gRPC’s Core Requirements

Bidirectional Communication

gRPC supports four types of RPC calls:

  • Unary: Traditional request-response
  • Server streaming: Server sends multiple responses
  • Client streaming: Client sends multiple requests
  • Bidirectional streaming: Both sides stream data simultaneously

HTTP/1.1 simply cannot handle streaming scenarios effectively. Techniques like Server-Sent Events (SSE) and WebSockets exist, but SSE is limited to server-to-client delivery, and WebSockets require a protocol upgrade that leaves HTTP semantics behind.

HTTP/2’s stream-based architecture naturally supports these patterns. A bidirectional streaming gRPC call maps perfectly to an HTTP/2 stream where both client and server can send frames asynchronously.

Here’s a practical example…

Imagine a chat application where multiple users are sending messages simultaneously:

service ChatService {
  rpc LiveChat(stream ChatMessage) returns (stream ChatMessage);
}

With HTTP/2, this becomes a single stream (a server-side sketch follows the list below) where:

  • Client sends DATA frames containing serialized ChatMessage
  • Server sends DATA frames back with messages from other users
  • All happens over one TCP connection with proper flow control
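
A minimal server-side sketch of this in Go might look like the following; the import path and the pb types are assumptions based on the service definition above, following grpc-go's usual code-generation naming.

package chat

import (
	"io"

	pb "example.com/chat/gen" // hypothetical import path for the generated ChatService code
)

// chatServer implements the generated ChatServiceServer interface.
type chatServer struct {
	pb.UnimplementedChatServiceServer
}

// LiveChat handles one bidirectional HTTP/2 stream: Recv reads the client's
// DATA frames, Send writes the server's DATA frames back on the same stream.
func (s *chatServer) LiveChat(stream pb.ChatService_LiveChatServer) error {
	for {
		msg, err := stream.Recv()
		if err == io.EOF {
			return nil // the client half-closed its side of the stream
		}
		if err != nil {
			return err
		}
		// Echo the message back; a real server would fan it out to the other users.
		if err := stream.Send(msg); err != nil {
			return err
		}
	}
}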

High Throughput and Low Latency

In microservices architectures, services often make dozens of calls to other services to fulfill a single user request. With HTTP/1.1, this creates a bottleneck:

Service A → Service B (wait)
          → Service C (wait)
          → Service D (wait)

Even with connection pooling, you’re limited by the number of concurrent connections and the head-of-line blocking problem.

With gRPC over HTTP/2, Service A can make all these calls concurrently over a single connection:

Service A → Multiple streams to Services B, C, and D simultaneously

The multiplexing ensures that a slow response from Service B doesn’t block the faster responses from Services C and D.
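
A hedged Go sketch of this fan-out (the target address, the ServiceB client, and its GetData RPC are all hypothetical): one ClientConn is dialed once and shared, and each concurrent call rides its own HTTP/2 stream.

package main

import (
	"context"
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/serviceb/gen" // hypothetical generated client for Service B
)

func main() {
	// One TCP connection to Service B; every RPC below becomes its own HTTP/2 stream on it.
	conn, err := grpc.Dial("service-b:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := pb.NewServiceBClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Concurrent calls are interleaved as frames on the shared connection,
			// so a slow reply on one stream does not hold up the others.
			if _, err := client.GetData(ctx, &pb.DataRequest{}); err != nil {
				log.Println("call", i, "failed:", err)
			}
		}(i)
	}
	wg.Wait()
}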

Efficient Header Handling

gRPC makes extensive use of headers for metadata like authentication tokens, tracing information, and custom headers. In a typical microservices call chain, these headers are propagated through multiple services.

HTTP/1.1 sends headers as plain text with every request:

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
X-Trace-ID: 1234567890abcdef
X-User-ID: user123
Content-Type: application/grpc+proto

In a service mesh with hundreds of requests per second, this overhead becomes significant.

HTTP/2’s HPACK compression maintains a dynamic table of previously seen headers. After the first request, common headers are referenced by index rather than sent in full.

:method: POST               → static table index 3
:path: /UserService/GetUser → added to the dynamic table, then referenced by index
authorization: Bearer...    → added to the dynamic table, then referenced by index

This reduces header overhead by 85-90% in real applications.
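
In gRPC code, this metadata is ordinary headers on the underlying HTTP/2 stream. A small Go sketch (the header values are illustrative): once the first request has carried them, HPACK can refer to repeated authorization and tracing headers by table index instead of resending the full strings.

package client

import (
	"context"

	"google.golang.org/grpc/metadata"
)

// withCallMetadata attaches auth and tracing headers to an outgoing RPC.
// On the wire these become HTTP/2 header fields, so values repeated across
// requests can be HPACK-indexed after the first request on the connection.
func withCallMetadata(ctx context.Context) context.Context {
	return metadata.AppendToOutgoingContext(ctx,
		"authorization", "Bearer eyJhbGciOiJIUzI1NiIs...", // illustrative token
		"x-trace-id", "1234567890abcdef",
		"x-user-id", "user123",
	)
}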

Performance

Connection Management

Consider a microservices application with 10 services, each making an average of 5 calls to other services under load.

HTTP/1.1 Scenario:

  • Each service needs connection pools to every other service
  • With 6 connections per pool, that’s 10 × 9 × 6 = 540 TCP connections
  • Each connection has TCP overhead, OS socket limits, and connection establishment latency

HTTP/2 Scenario:

  • Each service maintains 1-2 connections to every other service
  • Total connections: 10 × 9 × 2 = 180 TCP connections
  • 67% reduction in connection overhead

Latency Characteristics

In practice, the latency benefits of HTTP/2 for gRPC are most noticeable in:

  1. High-frequency, low-payload requests: Microservices often make many small calls. HTTP/2’s frame overhead is lower than HTTP/1.1’s text parsing.
  2. Concurrent requests: When a service needs to aggregate data from multiple sources, HTTP/2’s multiplexing provides a significant speedup.
  3. Long-lived connections: gRPC services maintain persistent connections for streaming. HTTP/2’s connection reuse is more efficient than HTTP/1.1’s connection establishment overhead.

Real-world numbers that I have observed:

  • 20-40% latency reduction for concurrent requests
  • 50-80% reduction in connection overhead
  • 2-3x improvement in requests per second for small payloads

Streaming

Server Streaming Example

Consider a log streaming service:

service LogService {
  rpc StreamLogs(LogRequest) returns (stream LogEntry);
}

HTTP/1.1 Approach:

  • Long polling with timeouts
  • Chunked transfer encoding
  • Complex client-side reconnection logic requiring state management

HTTP/2 Approach:

  • Natural streaming with DATA frames
  • Built-in flow control via WINDOW_UPDATE frames (explained below)
  • Clean connection management

The HTTP/2 implementation is not only simpler but also more robust and efficient.
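
A hedged Go sketch of the server side, assuming generated code for the LogService definition above and a placeholder fetchLogs helper:

package logs

import (
	pb "example.com/logs/gen" // hypothetical generated code for LogService
)

type logServer struct {
	pb.UnimplementedLogServiceServer
}

// StreamLogs writes each entry as a DATA frame on the response stream;
// HTTP/2 flow control (WINDOW_UPDATE) governs how fast Send may proceed.
func (s *logServer) StreamLogs(req *pb.LogRequest, stream pb.LogService_StreamLogsServer) error {
	for entry := range fetchLogs(req) {
		if err := stream.Send(entry); err != nil {
			return err // the client went away or the stream was reset
		}
	}
	return nil // returning ends the stream; the gRPC status travels in HTTP/2 trailers
}

// fetchLogs is a placeholder for whatever actually produces matching log entries.
func fetchLogs(req *pb.LogRequest) <-chan *pb.LogEntry {
	ch := make(chan *pb.LogEntry)
	close(ch) // a real implementation would send entries on this channel
	return ch
}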

Flow Control in Action

HTTP/2’s flow control prevents fast producers from overwhelming slow consumers. In gRPC streaming:

  1. Client opens the stream with an initial window size (65,535 bytes by default)
  2. Server sends data frames up to the window limit
  3. Client processes data and sends a WINDOW_UPDATE to increase the available window
  4. Server continues sending based on the updated window

The window size controls how much unacknowledged data can be “in flight” between sender and receiver at any given moment. Think of it as a buffering limit, not a message size limit.

This prevents memory exhaustion and provides natural backpressure, something that’s difficult to achieve cleanly with HTTP/1.1. Also, both client and server can send a WINDOW_UPDATE frame depending on their rate of consumption and production.
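
If the defaults do not match your consumption rate, grpc-go exposes options for these windows; here is a small sketch with illustrative values.

package client

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dial opens a connection with larger per-stream and per-connection flow
// control windows, allowing more unacknowledged data in flight before the
// receiver must send a WINDOW_UPDATE (useful on high-latency links).
func dial(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithInitialWindowSize(1<<20),     // per-stream window: 1 MiB
		grpc.WithInitialConnWindowSize(1<<20), // connection-level window: 1 MiB
	)
}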

Trade-offs

When HTTP/2 Might Not Be Ideal

Single Request Scenarios - For simple, one-off requests, HTTP/1.1 might have lower latency due to:

  • Simpler protocol negotiation
  • Less connection setup overhead
  • Broader proxy support (though this is diminishing)

Resource-Constrained Environments - HTTP/2 requires more memory for:

  • HPACK compression tables
  • Stream state management
  • Flow control windows

In embedded systems or extremely memory-constrained environments, this overhead might be significant.

Proxy and Infrastructure Considerations

Load Balancer Compatibility - Not all load balancers handle HTTP/2 efficiently:

  • Some terminate HTTP/2 and forward as HTTP/1.1
  • Others don’t properly handle gRPC’s use of HTTP/2 trailers
  • Stream-aware load balancing is still evolving

Debugging Complexity - HTTP/2’s binary nature makes debugging more challenging:

  • Network captures require specialized tools
  • Stream interleaving makes request/response correlation complex
  • Traditional HTTP debugging tools may not work

Protocol Buffer Integration

And of course, there is the most common pairing of all: Protocol Buffers. gRPC's use of Protocol Buffers works exceptionally well with HTTP/2.

Protobuf’s binary serialization is naturally aligned with HTTP/2’s binary frames:

  • No text-to-binary conversion overhead
  • Efficient frame packing
  • Compact payloads that complement HTTP/2's HPACK header compression (see the size comparison below)
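
A small Go comparison using a well-known protobuf type makes the first point concrete: the binary output of proto.Marshal is exactly what gRPC places into HTTP/2 DATA frames, with no textual intermediate form. The exact byte counts depend on the message.

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	msg := timestamppb.Now()

	bin, _ := proto.Marshal(msg)     // binary wire format: goes straight into DATA frames
	txt, _ := protojson.Marshal(msg) // JSON rendering of the same message, for comparison

	fmt.Printf("protobuf: %d bytes, JSON: %d bytes\n", len(bin), len(txt))
}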

Schema Evolution

Protobuf’s schema evolution capabilities work well with HTTP/2’s header compression:

  • Service versions can be negotiated via headers
  • Backward compatibility metadata travels efficiently
  • Feature flags and capabilities can be communicated compactly

Footnotes

gRPC’s adoption of HTTP/2 was a strategic decision that enables the framework’s core value propositions - performance, functionality, and scalability.

If you are building distributed systems, understanding this relationship between gRPC and HTTP/2 is crucial; it informs how your services will behave under real production load.

As you design and implement gRPC services, keep these underlying mechanics in mind – they will inform your decisions about service boundaries, streaming strategies, and performance optimization.

The key is understanding how to leverage these capabilities effectively in your specific use case.

