
Java vs. C++ Performance Face-Off, Part II

At OptionsCity we take engineering very seriously. All aspects of our software undergo rigorous analysis for performance, security, stability, and flexibility. We are staffed by highly experienced software developers with decades of industry experience. We realize that extensive functionality does not come without costs, but we feel these costs are justified, and just as no sane person would buy a car without airbags to reduce weight in order to go faster, we won’t sacrifice stability or offer stripped down functionality in pursuit of performance. That being said, we believe our platform offers “real world” performance on par with any competitor and we welcome any valid test comparison. 


Since live apples-to-apples comparisons are difficult, if not impossible, to implement due to the correlated nature of trading environments, competitors often resort to generic statements like “Java is slow!” or “Our system is in C++, so it’s fast!” Those statements are patently false. There is nothing inherently slow in a modern Java implementation; in fact, in many cases a Java-based system will offer superior performance.


I previously wrote a blog post comparing C++ and Java performance using “FIX FAST” decoding. Since the Java code used was proprietary it had limited usefulness, and so I’ve decided to revisit the topic in a more scientifically reviewable manner.


To that end, I’ve developed a set of simple benchmarks that test some of the critical aspects of the software used in a trading system implementation:


  1. network performance (Network)
  2. calculation performance (BlackScholes)
  3. multi-threaded performance (WorkQueue)

All tests use identical algorithms between Java and C++, and in many cases the exact code. The BlackScholes and WorkQueue tests were found on the web. I wrote the Java version of WorkQueue to match the C++ version as closely as possible. The Network test is my own design, and I wrote both the C++ and Java versions.


The tests were run on identical hardware and OS. The C++ code was compiled with g++ with -O3 optimizations enabled, and the Java code ran on Java 8 with the default G1 collector options. In all cases, multiple runs were performed and the best results recorded.
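
The measurement approach described above (multiple runs, best result kept, total time and microseconds per operation reported) can be sketched roughly as follows. This is a minimal illustration, not the harness used for the published numbers; the class name `BestOfRuns`, its method names, the sample operation, and the run and operation counts are all hypothetical.

```java
// Hypothetical timing harness: names and counts are illustrative only.
final class BestOfRuns {

    /** Times one run of `ops` operations and returns the elapsed nanoseconds. */
    static long timeRun(Runnable op, int ops) {
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            op.run();
        }
        return System.nanoTime() - start;
    }

    /** Repeats the run `runs` times and keeps the fastest elapsed time. */
    static long bestOf(Runnable op, int ops, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            best = Math.min(best, timeRun(op, ops));
        }
        return best;
    }

    /** Converts a total elapsed time into microseconds per operation. */
    static double usecPerOp(long elapsedNanos, int ops) {
        return elapsedNanos / 1_000.0 / ops;
    }

    public static void main(String[] args) {
        int ops = 5_000_000;                      // assumed operation count
        double[] sink = new double[1];            // result is kept so the work is not discarded
        long best = bestOf(() -> sink[0] += Math.sqrt(sink[0] + 1.0), ops, 5);
        System.out.printf("Total Time (secs): %.2f, USec Per Op: %.2f%n",
                best / 1e9, usecPerOp(best, ops));
    }
}
```

Keeping the best of several runs also tends to exclude early runs in which the JVM is still compiling the code.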

Here are the side-by-side results (rounded to 2 decimal places):

 

| Test | C++ Total Time (secs) | C++ USec Per Op | Java Total Time (secs) | Java USec Per Op |
| --- | --- | --- | --- | --- |
| Network | 3.30 | 6.61 | 3.60 | 7.22 |
| BlackScholes | 2.46 | 0.49 | 3.09 | 0.62 |
| WorkQueue | 3.05 | 0.61 | 1.19 | 0.24 |
| BlackScholes Improved | 0.38 | 0.08 | 0.56 | 0.11 |
| BlackScholes Improved #2 | 0.19 | 0.04 | 0.19 | 0.04 |


Let’s analyze the results.


In the Network test, C++ is about 10% faster. Each op involves 4 IO operations (2 reads and 2 writes), so the actual difference per IO operation is about 150 nanoseconds (0.61 microseconds per op divided by 4). If you review the JDK source code, the source of the difference appears to be the “security checks” that Java performs in order to provide the “sandbox” environment. Running the same Java code using a proprietary socket layer shows the performance difference to be less than 4%.
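
One plausible reading of “4 IO operations per op” is a request/response round trip: the client writes, the server reads, the server writes back, and the client reads. The sketch below shows that shape over a loopback TCP socket. It is an assumption about the structure of the test, not the actual benchmark code; the class name, message size, and iteration count are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical round-trip test: 2 writes and 2 reads per operation.
final class RoundTripSketch {
    static final int MSG_SIZE = 64;   // assumed payload size in bytes

    /** Runs `ops` round trips over loopback and returns the elapsed nanoseconds. */
    static long run(int ops) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    DataInputStream in = new DataInputStream(s.getInputStream());
                    OutputStream out = s.getOutputStream();
                    byte[] buf = new byte[MSG_SIZE];
                    for (int i = 0; i < ops; i++) {
                        in.readFully(buf);   // server read
                        out.write(buf);      // server write
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            echo.start();

            long elapsed;
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setTcpNoDelay(true);
                DataInputStream in = new DataInputStream(client.getInputStream());
                OutputStream out = client.getOutputStream();
                byte[] buf = new byte[MSG_SIZE];
                long start = System.nanoTime();
                for (int i = 0; i < ops; i++) {
                    out.write(buf);          // client write
                    in.readFully(buf);       // client read
                }
                elapsed = System.nanoTime() - start;
            }
            echo.join();
            return elapsed;
        }
    }

    public static void main(String[] args) throws Exception {
        int ops = 100_000;                   // assumed iteration count
        long nanos = run(ops);
        System.out.printf("USec Per Op: %.2f%n", nanos / 1_000.0 / ops);
    }
}
```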


The BlackScholes test appears to be a big win for C++, as it is about 20% faster, but we’ll return to this in a bit...


The WorkQueue test shows Java to be a clear winner: roughly 2.5x faster than C++ (1.19 vs. 3.05 seconds).


WorkQueue tests a major component of concurrent systems - passing data between cores/threads. With core speeds reaching theoretical limits, multiple cores and concurrent programs are essential for leading edge performance. Java’s runtime performance analysis in the JVM allows for highly efficient synchronization and memory management in a concurrent environment.


One thing to keep in mind: the code used for these tests was not optimized. It is what a typical developer designed and published on the web. Better versions of “WorkQueue” are certainly possible, for instance by using spin locks, lock-free structures, and “disruptor” patterns. We use a similar data structure in Metro, and this highly optimized proprietary version is more than twice as fast as the one here. But that is not the point of these tests. We are trying to compare raw Java and C++ as closely as possible, not the quality of third-party libraries, and so the code was kept as simple and straightforward as possible.
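
For readers unfamiliar with the pattern, a simple work queue of this kind (a list guarded by a lock, with consumers waiting on a condition until an item arrives) might look like the following in Java. This is a sketch of the general technique, not the code that was benchmarked; the class name `SimpleWorkQueue` and the operation count are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical minimal work queue: a deque guarded by the object's monitor.
final class SimpleWorkQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    /** Adds an item and wakes one waiting consumer. */
    public synchronized void add(T item) {
        items.addLast(item);
        notify();
    }

    /** Blocks until an item is available, then removes and returns it. */
    public synchronized T remove() throws InterruptedException {
        while (items.isEmpty()) {   // loop guards against spurious wakeups
            wait();
        }
        return items.removeFirst();
    }

    /** Returns the number of queued items. */
    public synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleWorkQueue<Integer> queue = new SimpleWorkQueue<>();
        int ops = 1_000_000;        // assumed operation count
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < ops; i++) {
                    sum[0] += queue.remove();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long start = System.nanoTime();
        consumer.start();
        for (int i = 0; i < ops; i++) {
            queue.add(i);
        }
        consumer.join();            // join makes the consumer's sum visible here
        long elapsed = System.nanoTime() - start;
        System.out.printf("USec Per Op: %.2f (sum=%d)%n", elapsed / 1_000.0 / ops, sum[0]);
    }
}
```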


Lastly, let’s revisit the BlackScholes test.


Profiling the code makes it apparent that most of the “work” is in the math library pow() function. After a quick optimization of the code (BlackScholes Improved), Java is more than 3x faster than the original C++ version. Taking a similar approach, the use of Math.exp() can be optimized (BlackScholes Improved #2), which, at the cost of an error of only 0.03%, improves the speed by more than another 2x and makes Java more than 10x faster than the original C++ code. When the same optimizations are applied to the C++ code, the same error is obtained, and the Java and C++ versions are equal in performance.
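
As an illustration of the kind of change involved: in commonly published Black-Scholes code, the cumulative normal distribution is approximated by a fifth-degree polynomial whose higher terms are computed with pow(). Assuming those are the calls in question (the post does not say exactly which calls were replaced), the polynomial can instead be evaluated with Horner's method, using only multiplications and additions. The sketch below compares the two forms; the class and method names are hypothetical, and this is not the optimized code used for the published numbers.

```java
// Hypothetical comparison of two ways to evaluate the same polynomial approximation.
final class CndSketch {
    // Coefficients of the standard Abramowitz-Stegun approximation.
    static final double A1 = 0.31938153, A2 = -0.356563782, A3 = 1.781477937,
                        A4 = -1.821255978, A5 = 1.330274429, P = 0.2316419;
    static final double INV_SQRT_2PI = 1.0 / Math.sqrt(2.0 * Math.PI);

    /** Cumulative normal distribution with the polynomial written using Math.pow(). */
    static double cndPow(double x) {
        double l = Math.abs(x);
        double k = 1.0 / (1.0 + P * l);
        double poly = A1 * k + A2 * k * k + A3 * Math.pow(k, 3)
                    + A4 * Math.pow(k, 4) + A5 * Math.pow(k, 5);
        double w = 1.0 - INV_SQRT_2PI * Math.exp(-l * l / 2.0) * poly;
        return x < 0.0 ? 1.0 - w : w;
    }

    /** Same approximation, with the polynomial evaluated by Horner's method (no pow calls). */
    static double cndHorner(double x) {
        double l = Math.abs(x);
        double k = 1.0 / (1.0 + P * l);
        double poly = k * (A1 + k * (A2 + k * (A3 + k * (A4 + k * A5))));
        double w = 1.0 - INV_SQRT_2PI * Math.exp(-l * l / 2.0) * poly;
        return x < 0.0 ? 1.0 - w : w;
    }

    public static void main(String[] args) {
        // Report the largest disagreement between the two forms over a range of inputs.
        double maxDiff = 0.0;
        for (double x = -4.0; x <= 4.0; x += 0.01) {
            maxDiff = Math.max(maxDiff, Math.abs(cndPow(x) - cndHorner(x)));
        }
        System.out.println("max |cndPow - cndHorner| = " + maxDiff);
    }
}
```

The two methods compute the same polynomial, so they differ only by floating-point rounding. Any speed difference comes from avoiding the general-purpose pow() routine.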


The “BlackScholes Improved” and “BlackScholes Improved #2” demonstrate a critical point - that the quality of the code/algorithm is as much if not more important than the underlying development language. Since Java allows for rapid application development and refactoring, it is far easier to design new and better implementations without “breaking things”.


How can Java outperform C++? This doesn’t make sense to people who are not deeply familiar with the underlying architecture, or whose bias against Java is based on its original implementation of 20+ years ago.


Java is compiled, just like C++; it simply performs the compilation while the program is running. This does lead to slower performance at startup, but technologies are available to minimize this (for example, Zing ReadyNow). Because Java is compiled during program execution, it can make optimizations a static compiler cannot, including optimizing the instruction set for the executing processor. Also, much of the underpinnings of Java are already written in C, or are simple passthroughs to the native OS libraries.
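
The start-up effect can be observed by timing the same method over successive batches. The sketch below is a hypothetical illustration (the class name, workload, and batch sizes are made up for this example). On a typical JVM the first batches are often slower than the later ones, once the runtime compiler has optimized the hot method, though the exact numbers vary by machine and JVM.

```java
// Hypothetical warm-up demonstration: times repeated batches of the same work.
final class WarmupSketch {

    /** Some arithmetic-heavy work for the runtime compiler to optimize. */
    static double work(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += Math.log(i) / Math.sqrt(i);
        }
        return acc;
    }

    /** Times `batches` successive calls of work(n); returns nanoseconds per batch. */
    static long[] timeBatches(int batches, int n) {
        long[] nanos = new long[batches];
        double sink = 0.0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            sink += work(n);
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42.0) {
            System.out.println(sink);   // uses the result so the work is not optimized away
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 1_000_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("batch %d: %.2f ms%n", b, nanos[b] / 1e6);
        }
    }
}
```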


Java uses garbage collection, and so it can allocate memory and objects very efficiently, especially in multi-core concurrent systems, and far more efficiently than non-garbage-collected systems (like C++). Historically, garbage-collected systems incurred “pauses” while the garbage was being collected, which were fatal to low-latency performance. Modern garbage collectors, like the patented Azul Systems Zing that OptionsCity uses, have eliminated these pauses.


In summary, it is clear that the raw performance of modern Java is equal to, if not superior to, that of C++ systems, and that the quality of the code is the most important factor in the performance of the system.


Stay tuned, as an upcoming blog post will show why none of these numbers really matter...


References:


Linux System:

Scientific Linux 6.4

Kernel 3.10.58.rt62.60.el6rt.x86_64

GCC 4.9.1

Java 1.8.0_92

Dell PowerEdge R610 (Dual Xeon X5690 @ 3.47 GHz, 2 x 6 cores)


Black Scholes

http://www.espenhaug.com/black_scholes.html


WorkQueue

http://vichargrave.com/multithreaded-work-queue-in-c/


Additional results:


I also compiled and ran the tests on a desktop Linux machine, running Ubuntu 14, and a desktop OSX machine, running 10.11.4. Here are the summary results:


Ubuntu Dell Studio XPS 8100, Core i7 @ 2.93 GHz


 

| Test | C++ Total Time (secs) | C++ USec Per Op | Java Total Time (secs) | Java USec Per Op |
| --- | --- | --- | --- | --- |
| Network | 5.58 | 11.15 | 6.36 | 12.71 |
| WorkQueue | 2.25 | 0.45 | 1.00 | 0.20 |
| BlackScholes | 3.04 | 0.61 | 0.23 | 0.76 |


OSX, iMac 27”, Core i7 @ 3.4 GHz


 

| Test | C++ Total Time (secs) | C++ USec Per Op | Java Total Time (secs) | Java USec Per Op |
| --- | --- | --- | --- | --- |
| Network | 11.83 | 23.65 | 12.45 | 24.90 |
| WorkQueue | 40.40 | 8.08 | 2.05 | 0.41 |
| BlackScholes | 1.30 | 0.26 | 3.11 | 0.62 |



The OSX machine is an interesting case. Its POSIX thread support appears to perform very poorly (see the C++ WorkQueue result), but its native math libraries are excellent (see the C++ BlackScholes result).


So clearly the hardware/OS/configuration will play a significant role in the performance of the system, even when running identical code, and so expert tuning of the system in response to actual workloads is required for optimum performance.


Source Code


Send a request with your name and email to sales@optionscity.com and reference the title of this blog entry.


Posted by: Robert Engels, Chief Freeway Architect
