Almost every Go programmer will eventually face the need to use an RPC¹ system. Whether writing a backend service for a web app, a P2P network or an IoT cloud, determining the communication protocol (and hence an RPC system) will be one of its requirements.
Most of the time, that requirement will be predetermined based on either the execution or organizational environment: for example, in the case of backend systems for a browser-based web app, the most natural combination will be to use JSON-encoded messages delivered through HTTPS.
When given the freedom to choose which system to use though, an engineer can specify the requirements for this communication system and its allowable trade-offs and can then pick a more appropriate solution: one which maximizes metrics other than “it’s the simplest one available to use”.
A few of the features that are relevant for this decision include:
- Is the protocol a reasonably popular standard? An open, but little used implementation? A private, licensed solution?
- Are our developers familiar with the features and limitations of this protocol?
- Will some form of support be found, both for initial development and ongoing maintenance?
- Is the performance envelope, given our target application, achievable using this system?
The first few questions are subjective, and each team needs to evaluate them based on the context of the upcoming job and their own experiences. For the rest of this post, I’ll focus on the last question: evaluating the performance of existing Go RPC libraries.
Obligatory disclaimer
Benchmarks are hard to get right: the benchmark’s harness, compiler artefacts, and the broader computing environment (idleness of the CPU, already-cached contents, OS configuration knobs) make it hard to create a benchmark and to interpret its results.
Additionally, every communication system may be configured in various ways: enabling compression or not, using different serialization formats (text or binary), enabling encryption or not, using a different transport protocol (TCP vs. UDP), and so on.
Thus, while gorpcbench provides a foundation for evaluating the performance of various RPC systems in a structured way, it should not be the sole criterion used for choosing the solution to be applied on any given project.
Existing evaluations
While a few comparisons exist for select RPC systems in Go (such as the ones from Pliutau and kmcd), they have a limited scope: they focus on a small set of RPC systems and aren’t generic and structured enough to be expandable to others.
Inspired by GoserBench, which, despite its limited testing scope, has a large number of targets implemented, I wanted to write a similar benchmarking framework for the RPC systems available in Go.
Choosing the metrics
What are the important performance metrics of a generic RPC implementation? Within the context of Go clients and servers, we’re interested in knowing:
- The latency distribution imposed by the RPC library to individual calls.
- The memory allocation profile for different types of calls (different argument and return types).
- The maximum throughput achievable.
Metric #1 will be most important when choosing a solution for low-latency applications (real-time trading systems, voice and video communication protocols, etc.).
Metric #2 is important for determining hardware requirements, in particular for dynamic load scenarios.
Metric #3 establishes upper performance bounds for scenarios that involve bulk data transfer (such as downloads, video streams).
Designing the benchmark
Given a standard RPC flow of:
1. Client prepares a request (fills data structures, serializes them, applies compression, etc.).
2. Client sends the request through the network.
3. Server deserializes the request.
4. Server fulfills the actual request by preparing a response data structure.
5. Server serializes and sends the response back to the client.
6. Client deserializes the response.
7. Client processes the response’s contents.
we need to design an interface that each system will fulfill. An abridged version of this interface is shown below (the full version is available in the repo):
```go
// Server is the interface to a server implementation.
type Server interface {
	Run(context.Context) error
}

// Client is the interface to an RPC client with specific functions.
type Client interface {
	// Nop is a no-op call. It is used to measure the minimum latency overhead
	// imposed by the RPC subsystem to calls.
	Nop(context.Context) error

	// ... Other calls
}

// RPCFactory is the interface to a test RPC system.
type RPCFactory interface {
	// NewServer should create a new server, bound to the given network
	// listener.
	NewServer(net.Listener) (Server, error)

	// NewClient should create a new client that connects to the given
	// server address.
	NewClient(_ context.Context, serverAddr string) (Client, error)
}
```
The general idea is that each benchmarking session will create one `Server` instance bound to a network address and one or more `Client` instances directed towards the previously created server, and will then perform various calls (such as `Nop()`) to measure the system’s behavior. Different calls added to the `Client` interface will exercise different aspects of the entire stack, such as serialization overhead and bulk transfer throughput.
The benchmarking harness will have the job of performing this setup and teardown, including running a particular set of calls.
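A rough sketch of one such harness session is shown below, under the assumption of a made-up `loopbackFactory` standing in for a real RPC system; the interfaces are repeated (abridged) so the example is self-contained:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// Abridged versions of the gorpcbench interfaces.
type Server interface{ Run(context.Context) error }

type Client interface{ Nop(context.Context) error }

type RPCFactory interface {
	NewServer(net.Listener) (Server, error)
	NewClient(ctx context.Context, serverAddr string) (Client, error)
}

// loopbackFactory is a hypothetical, do-nothing system used only to show
// the harness flow; a real implementation would speak an actual protocol.
type loopbackFactory struct{}

type loopbackServer struct{}

// Run blocks until the context is cancelled at teardown.
func (loopbackServer) Run(ctx context.Context) error { <-ctx.Done(); return nil }

type loopbackClient struct{}

func (loopbackClient) Nop(context.Context) error { return nil }

func (loopbackFactory) NewServer(net.Listener) (Server, error) { return loopbackServer{}, nil }
func (loopbackFactory) NewClient(_ context.Context, _ string) (Client, error) {
	return loopbackClient{}, nil
}

// runSession sets up one server and one client, times n Nop calls, and
// tears everything down. Error handling is abbreviated for clarity.
func runSession(f RPCFactory, n int) (time.Duration, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	srv, err := f.NewServer(ln)
	if err != nil {
		return 0, err
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go srv.Run(ctx) // stopped via cancel() at teardown

	c, err := f.NewClient(ctx, ln.Addr().String())
	if err != nil {
		return 0, err
	}

	start := time.Now()
	for i := 0; i < n; i++ {
		if err := c.Nop(ctx); err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}

func main() {
	elapsed, err := runSession(loopbackFactory{}, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("1000 Nop calls in %v\n", elapsed)
}
```

The real harness does more (warm-up runs, multiple clients, per-call sampling), but the shape is the same: one factory, one server per session, clients driven in a measured loop.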
Evaluating complex-arg’d calls
Ensuring that every RPC system has a fair shot on every call is tricky in some cases. Consider a hypothetical `DoWork()` call:
```go
type Client interface {
	DoWork(context.Context, InputWork) (OutputWork, error)
}
```
How should `InputWork` and `OutputWork` be defined? Standard structs? Interfaces?
When writing the implementation for a library that uses code generation to define its types (such as gRPC), a call that involves a complex data structure will incur some performance penalty copying from the argument struct and then back to the result struct.
Consider, however, that in a production system the input type will be the native one for that RPC system (because the user creates the request directly) and the response object will be used directly.
When compared to a library that uses specific types, a streaming or reflection-based implementation may be able to use the argument and response without going through an intermediary struct. While it’s often the case that serialization will not be a latency bottleneck when compared to other parts of the system, it would be better to attempt a fairer comparison.
In gorpcbench, we define methods that require complex structures using the following rough template:
```go
type InputWork interface {
	SetValue(any)
}

type OutputWork interface {
	Value() any
}

type Client interface {
	DoWork(_ context.Context, fillArgs func(InputWork)) (OutputWork, error)
}
```
The key here is the `fillArgs` function: before performing the actual call, a client implementing `DoWork` must call `fillArgs()`, passing its preferred implementation of the `InputWork` interface.
This ensures every implementation has to provide at least one structure for filling the call parameters; libraries that serialize directly from that struct (such as gRPC or Cap’n Proto) will then proceed to their serialization stage, while a streaming implementation (such as the hand-written raw TCP one) that only reads from a struct will also have the cost of building that struct counted towards it, for a fairer comparison.
Results
The latest results can be found at the gorpcbench repository.
Future Work
The existing RPC benchmark infrastructure can already provide useful insights for comparing one of the most popular RPC libraries in Go (gRPC) with standard methods (HTTP, JSON over WebSockets) and a baseline that should have performance close to the best achievable.
Some interesting future directions to take this include:
- Add more systems to the evaluation.
- Which ones?
- Track latency histograms instead of a single number.
- Any other relevant metrics?
- Add additional call types.
- Which ones?
If you have suggestions, please open an issue in the gorpcbench repository.
Notes
¹ While _technically_ you might not want to call every data exchange protocol an RPC system, for the purposes of this post I'll consider any communication protocol an RPC system. ↩