While benchmarking HotSpot against OpenJ9 I realised that 5k requests per second is nice, but that there might still be some room for optimization.

Vert.x has two main knobs for improving request performance: the native transport and the number of Verticle instances (i.e. the concurrency) you allow for handling requests. So I started playing around with both.

Native Transport

So Vert.x has this ability to use different transport implementations, which basically replace Netty's default NIO-based event loop and channels with a native epoll-based implementation on Linux.

In order to enable that you just have to add

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.1.19.Final</version>
    <classifier>linux-x86_64</classifier>
</dependency>

to your pom.xml. Make sure that the version matches the Netty version your Vert.x release depends on (you can check that with mvn dependency:tree).

In order to prefer the native transport you have to set a Vert.x Option:

new VertxOptions().setPreferNativeTransport(true)
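
Note that this only expresses a preference: if the native transport cannot be loaded, Vert.x silently falls back to the NIO transport. A minimal sketch to verify that it is actually active could look like this; if I remember correctly, Vertx exposes isNativeTransportEnabled() for exactly that purpose since 3.5:

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class NativeTransportCheck {
    public static void main(String[] args) {
        // ask Vert.x to prefer the native (epoll) transport if it is available
        Vertx vertx = Vertx.vertx(new VertxOptions().setPreferNativeTransport(true));

        // false means Vert.x silently fell back to NIO, e.g. because the
        // epoll dependency is missing or we are not running on Linux
        System.out.println("Native transport enabled: " + vertx.isNativeTransportEnabled());
    }
}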

It also gives you some more options to play with the underlying TCP stack. Keep in mind that TcpFastOpen and TcpQuickAck are Linux-specific and only take effect when the native transport is actually in use:

final HttpServerOptions options = new HttpServerOptions()
        .setTcpFastOpen(true)
        .setTcpNoDelay(true)
        .setTcpQuickAck(true);
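
Putting the pieces together, a minimal server using the native transport and these TCP options might look like the following sketch (the port and the response body are just placeholders):

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.http.HttpServerOptions;

public class NativeServer {
    public static void main(String[] args) {
        // prefer the epoll-based native transport
        Vertx vertx = Vertx.vertx(new VertxOptions().setPreferNativeTransport(true));

        // TCP-level tuning; the fast-open and quick-ack flags require the native transport
        HttpServerOptions options = new HttpServerOptions()
                .setTcpFastOpen(true)
                .setTcpNoDelay(true)
                .setTcpQuickAck(true);

        vertx.createHttpServer(options)
                .requestHandler(req -> req.response().end("hello")) // placeholder handler
                .listen(8080);                                      // placeholder port
    }
}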

So let’s see how that performs:

Summary:
Total: 10.0067 secs
Slowest: 0.0259 secs
Fastest: 0.0027 secs
Average: 0.0091 secs
Requests/sec: 5494.8290


Response time histogram:
0.003 [1] |
0.005 [43] |
0.007 [401] |
0.010 [49262] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.012 [3198] |■■■
0.014 [938] |■
0.017 [1033] |■
0.019 [74] |
0.021 [26] |
0.024 [7] |
0.026 [2] |


Latency distribution:
10% in 0.0081 secs
25% in 0.0085 secs
50% in 0.0089 secs
75% in 0.0093 secs
90% in 0.0097 secs
95% in 0.0107 secs
99% in 0.0151 secs

On average this gives you about 300 additional requests per second. It is faster, but actually not by much.

Verticle deployments

Vert.x runs multiple event loops, by default twice the number of visible cores. So if you start your HttpServer in a Verticle, you can scale it on the same machine through the number of instances you deploy. Aligning that number with the number of hardware threads your system provides is in general a good option.

Vert.x actually reuses the port binding: you can deploy multiple Verticles whose servers bind to the same port, and incoming connections are distributed between them. As long as they all do the same thing, that is generally not a problem.

// options, handler, restConfiguration and completableHandler are defined elsewhere
vertx.deployVerticle(() -> new AbstractVerticle() {
            @Override
            public void start(Future<Void> startFuture) {
                // every instance starts its own HttpServer; Vert.x shares the port binding
                vertx.createHttpServer(options)
                        .requestHandler(handler)
                        .listen(restConfiguration.getPort(), restConfiguration.getHost(), ar -> {
                            if (ar.succeeded()) {
                                startFuture.complete();
                            } else {
                                startFuture.fail(ar.cause());
                            }
                        });
            }
        }, // one instance per visible processor
        new DeploymentOptions().setInstances(Runtime.getRuntime().availableProcessors()),
        completableHandler.handler());

I’m using availableProcessors(), which since Java 10 even takes CPU limits configured through cgroups into account, meaning that inside a CPU-limited Docker container you only get as many instances as you can actually back with CPU.
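
If you want to check this behaviour, or simulate a constrained container on your developer machine, a quick sketch like the following helps; the -XX:ActiveProcessorCount flag (available since Java 10) lets you override what the JVM reports:

public class ProcessorCount {
    public static void main(String[] args) {
        // run with: java -XX:ActiveProcessorCount=2 ProcessorCount
        // (or inside a container started with e.g. docker run --cpus=2)
        // to see the value the deployment code above will pick up
        System.out.println("Visible processors: " + Runtime.getRuntime().availableProcessors());
    }
}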

Summary:
Total: 10.0053 secs
Slowest: 0.0313 secs
Fastest: 0.0006 secs
Average: 0.0070 secs
Requests/sec: 7096.2075


Response time histogram:
0.001 [1] |
0.004 [7472] |■■■■■■■■■■
0.007 [30415] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.010 [21795] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.013 [7789] |■■■■■■■■■■
0.016 [2382] |■■■
0.019 [701] |■
0.022 [281] |
0.025 [94] |
0.028 [49] |
0.031 [21] |


Latency distribution:
10% in 0.0036 secs
25% in 0.0049 secs
50% in 0.0065 secs
75% in 0.0086 secs
90% in 0.0110 secs
95% in 0.0129 secs
99% in 0.0176 secs

That adds roughly another 1.6k requests per second over the native transport alone and fully loads my poor laptop, pushing throughput up to about 7k requests per second on HotSpot.

A quick look at OpenJ9 shows that it stays consistently about 1k requests per second slower, so I guess most of the high-performance networking stuff is simply better optimized for HotSpot.
