Benchmarking Ruby, Python, JavaScript, Lua, Java, C++ and Assembly language
I admit it, I am a performance fetishist. Maybe that’s the reason why I have always looked down on the web a bit, and why real-time 3D visual simulation is still my favourite field of development. But I also admit that CPU cycles are cheap and still getting cheaper, while human brainpower is expensive and will become more expensive in the future, even with millions of Indian and Chinese programmers, because they will soon demand a life with good food, good doctors, a car, holidays and air conditioning. The CPUs will be happy with just the air conditioning. So substituting CPU power for brain power is very desirable, at least for those who pay the bills.
This is where rapid development, middleware, frameworks and all kinds of web technology enter the scene. Digging deeper, behind all these systems there are programming languages of all sorts that promise exactly this: through late binding and dynamic typing you can glue things together, and even build a decent amount of business logic, much faster than in C++. Java lurks somewhere in between: it is not a dynamic “scripting” language, but it has a mighty isolation layer in the form of the VM and provides introspection and dynamic linkage.
While I do not yet have a simple way to quantify how much faster you can program or solve problems with a particular language, there is a quite easy way to determine the cost in terms of CPU cycles.
To make things even easier, I am not interested in real-world scenarios, because there are far too many of them and too many ways to program them. Instead, my idea was to find the absolute upper limits of what can be achieved with a particular programming language. For this purpose, I devised three simple tests:
- LOOP: A counting loop that yields the upper limit of simple instructions I can execute:
    n = 1000000;
    i = 0;
    while (i < n) i += 1;
- CALL: A counting loop with a simple function call with two arguments, which yields the upper limit of function calls I can make:
    n = 1000000;
    i = 0;
    while (i < n) i = add(i, 1);
- MAT4: A loop that calls a function that carries out a full 4×4 matrix multiplication in this language:
    n = 100000;
    i = 0;
    a = [...];
    b = [...];
    while (i < n) { c = multiplyMatrix(a, b); i += 1; }
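As an illustration, the three tests can be sketched in Python roughly as follows. This is a minimal sketch, not the original benchmark code: the function names (`loop`, `call`, `mat4`, `multiply_matrix`) and the sample matrices are assumptions made for this example, and the matrix is represented as a plain nested list.

```python
def loop(n):
    """LOOP: a bare counting loop."""
    i = 0
    while i < n:
        i += 1
    return i


def add(a, b):
    """Two-argument helper used by the CALL test."""
    return a + b


def call(n):
    """CALL: the same loop, but the increment goes through a function call."""
    i = 0
    while i < n:
        i = add(i, 1)
    return i


def multiply_matrix(a, b):
    """Full 4x4 matrix product on nested lists (row-major)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat4(n, a, b):
    """MAT4: n full 4x4 multiplications; returns the last product."""
    c = None
    i = 0
    while i < n:
        c = multiply_matrix(a, b)
        i += 1
    return c


# Hypothetical sample inputs: the original matrices are not given in the text.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
SAMPLE = [[float(4 * r + c) for c in range(4)] for r in range(4)]
```

Calling `loop(1000000)`, `call(1000000)` and `mat4(100000, SAMPLE, IDENTITY)` corresponds to the iteration counts used in the three tests above.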
I also tried to optimize the code for each language and used the best-performing version; e.g. in Ruby it is about 20% faster to use the n.times statement than the while statement, so I used it. However, I am quite sure I did not always find the fastest way of doing things; I used what I was able to come up with in a decent amount of time. So here are the results. The numbers indicate how many loop cycles or function calls per second were performed on a Dell M70 with a 2.13 GHz Pentium M processor ("Mio." stands for million). The OS is Windows XP SP2, with Ruby 1.8.1-12, Python 2.4.1, Lua 5.0.2, SpiderMonkey (1998 vintage), Java SDK 1.4.2.08 and C++ from VS.NET 2003. Spike-A and Spike-B are simple experimental virtual machines based on expression trees, written in C++ by me in an attempt to find the performance limits of an interpreted dynamic language. I ran every benchmark several times for a few seconds each, took the best result, and rounded it to two significant digits.
Test | Ruby | JavaScript | Python | Lua | Spike-A | Spike-B | Java -Xint | Java JITC | C++ DBG | C++ OPT | x86-Asm |
---|---|---|---|---|---|---|---|---|---|---|---|
LOOP | 2.7 Mio. | 3.1 Mio. | 4.5 Mio. | 8.7 Mio. | 30 Mio. | 112 Mio. | 87 Mio. | 260 Mio. | 260 Mio. | 1000 Mio. | 2000 Mio. |
CALL | 1.4 Mio. | 1.8 Mio. | 2.1 Mio. | 4.3 Mio. | 6.5 Mio. | 19 Mio. | 22 Mio. | 520 Mio. | 10 Mio. | 340 Mio. | n.n. |
MAT4 | 12000 | 20000 | 52000 | 23000 | n.a. | 2.5 Mio. | 160000 | 2.3 Mio. | 4.2 Mio. | 14 Mio. | n.n. |
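The measurement procedure (several runs, best result, two significant digits) could look roughly like the sketch below. It is an assumption-laden illustration, not the harness that produced the table: `best_rate` and `round_sig` are hypothetical names, and `time.perf_counter` is a modern Python timer that did not exist in the Python 2.4.1 used for the measurements.

```python
import math
import time


def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    factor = 10 ** (exponent - digits + 1)
    return round(x / factor) * factor


def best_rate(bench, n, repeats=5):
    """Run bench(n) several times and return the best iterations-per-second
    figure, rounded to two significant digits."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        bench(n)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, n / elapsed)
    return round_sig(best)
```

Taking the best run rather than the mean is a deliberate choice when the goal is an upper limit: slower runs mostly reflect interference from the rest of the system.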
Here are some remarks about the languages and the results:
- Ruby: I was told it is slow, and indeed it came out slowest in every benchmark, but it is not an order of magnitude slower than the other popular scripting languages. Writing the benchmark was quite straightforward; everything worked as expected.
- JavaScript: I am familiar with it and have been using it for a while now. The SpiderMonkey engine is a very old, mature interpreter and very reliable, but not very beautiful. The performance is not exceptional and lies between Ruby and Python.
- Python: Strangely, Python had the most tiny syntactic pitfalls for me as a non-Python user, but this is probably a matter of habituation. On the other hand, the documentation for Python is not very good; it took some time to find out which functions to use for timing. Interestingly, the performance stands out when it comes to array access and numerical expression evaluation, even beating Lua, but it is still two orders of magnitude away from C++.
- Lua: I was told it should be fast, and indeed it is faster than Python in loops, but not in the matrix test. Lua also does not have a timer in the standard library with a resolution better than 1 sec., so I had to run long benchmarks (>20 sec.) here to get some more precision.
- Spike-A: When I was unsatisfied with SpiderMonkey's performance last year, I took a shot at building an interpreter in C++ for a dynamic, JavaScript-like language. To make a long story short, I used all the tricks from the books (and a few new ones), and after a lot of benchmarking, optimizing and selecting the fastest techniques, I reached a point with not much headroom left for further optimization. I found that interpreting P-code is slower than executing an expression tree, and I think that this is inevitable with current processor architectures. At the heart of every P-code interpreter is a huge “switch” block that processes a token stream, dispatching on the token values. An expression tree traverser just needs to perform function calls through pointers that were created when building the tree, so no lookup is required. The Spike-A benchmarks were executed on an expression tree that was generated at runtime by a C++ program, and as you can see, it is roughly 3-10 times faster than the scripting languages above in the LOOP test (and about 1.5-5 times faster in CALL). I am quite sure I could run Ruby, Python or JavaScript on this expression tree with the same performance as in the Spike-A benchmark, but I am still not sure it is worth the effort if you compare the performance to Java JIT compilation.
- Spike-B: This is the same expression tree engine, but with a less dynamic approach; it benefits from a closer coupling to C++ and a large number of library functions and “precompiled” idioms. It would have to be a specially tailored language with a more static type system and a fat syntax, but the performance figures represent what I think is close to the theoretical limit of an interpreter written in C++ without using Just-In-Time compilation or assembler-level optimization. To make it clearer: the MAT4 test here is so fast because matrix and vector types are native types of the interpreter. It is possible to make bindings to all kinds of native types for all the other scripting languages, but e.g. a Python wrapper for a C++ 4×4 matrix multiplier also yields just a few hundred thousand multiplies per second; with SWIG, it will possibly be even slower than the native multiply.
- Java -Xint: The Java P-code interpreter runs 5-10 times faster than Lua; it is actually surprisingly fast. I think a lot of engineering effort went into it, and it probably also marks the limit of what can be achieved with an interpreter. It is funny that the numbers are close to my Spike-B, which seems to confirm that you can't get much faster with interpreting.
- Java JITC: The Java Just-In-Time compiler does a decent job and generally runs as fast as unoptimized C++, sometimes even faster. However, you still pay a price for the isolation layer.
- C++ DBG: Unoptimized, debug-compiled C++ can be slower than you think, but the bad CALL result seems to be an anomaly, probably caused by some excessive checks generated by the VC compiler; I don't think they will be present with gcc, but I need to check that.
- C++ OPT: The C++ optimizer required me to modify the benchmark because 1) the optimizer throws away code if the result of a computation is not used and 2) the optimizer is able to replace a number of additions in a loop with one multiplication. So I had to add a j^=i (xor) statement to both the loop body in LOOP and the function body in CALL; otherwise the execution time was independent of the loop count n. After making sure the loops were not optimized away, C++ turns out to be 4-6 times faster than the Java JITC, except for CALL, where the Java compiler is faster.
- x86-Asm: I learned some interesting lessons when I tried to find out how fast the processor can be by writing the loop in x86 assembly language. The naive approach with a loop of just three instructions (dec, test, jne) is not as fast as optimized C++; to get a loop running two billion times per second I had to unroll the loop to keep the pipeline filled and use the proper type of jump to play nice with the branch prediction. The interesting thing here is that I can really execute an average of almost three instructions per clock cycle in a simple loop, which allows a loop that runs at close to clock speed (about 2 billion iterations per second on the 2.13 GHz processor).
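The structural difference between the two interpreter designs discussed under Spike-A can be sketched as follows. This is only an illustration of the two dispatch styles, not Spike's actual code: the opcodes and the names `run_pcode`, `const`, `add` and `mul` are hypothetical, and a Python sketch says nothing about the speed difference measured in C++.

```python
# Style 1: P-code interpreter. A token stream is processed by one central
# dispatch (the "switch" block), which must inspect every opcode it reads.
PUSH, ADD, MUL = range(3)


def run_pcode(code):
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
            pc += 1
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


# Style 2: expression tree. Each node is a callable bound when the tree is
# built, so evaluation is a chain of direct calls with no opcode lookup.
def const(value):
    return lambda: value


def add(left, right):
    return lambda: left() + right()


def mul(left, right):
    return lambda: left() * right()


# Both represent the expression 2 + 3 * 4.
pcode = [PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD]
tree = add(const(2), mul(const(3), const(4)))

print(run_pcode(pcode), tree())  # prints "14 14"
```

In the tree version the decision about which operation to perform is made once, at build time; in the P-code version it is repeated every time an instruction is executed.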
OK, that's what I have gathered so far. It is not exactly scientific, but quite interesting; I would not have guessed this outcome, and I would still like to fill some gaps in the table. I would also like to try C#, but all this has already taken me several days, and at some point I have to come to an end; it is already unusually long for just a blog entry. If someone is interested, I will also make a tar archive of all the benchmark programs available. There are a great number of details I learned, and all this testing has fueled my interest in rolling my own scripting language, but now I will go on holiday first.
BENCHMARK: Lua vs LuaJIT vs C (MSVC, GCC, LLVM) vs Java vs Perl vs Javascript vs Python vs C# (.NET CLR, Mono) vs Ruby vs R
DNS/benchmark-language
README.TXT
How to run?

    >run-benchmark.bat

BENCHMARK (FASTEST TIME)

BENCHMARK: Lua vs LuaJIT vs C vs Java vs Perl vs Javascript vs Python vs C# vs Ruby vs R vs PowerShell

Result (benchmark tested on AMD FX-8300, 3.3 GHz):

Implementation | Output | Time |
---|---|---|
Lua 5.4.2 | 7051.5711976423 | 3.69 s |
LUAC 5.4.2 | 7051.5711976423 | 3.27 s |
LuaJIT 2.0.5 | 7051.5711976423 | 0.79 s |
C (MSVC 18, VS 2013) | 7051.571198 | 0.78 s |
C (GCC 7.2.0) | 7051.571198 | 0.78 s |
C (CLANG LLVM 6.0.0) | 7051.571198 | 0.78 s |
C (CYGWIN GCC 10.2.0) | 7051.571198 | 0.81 s |
C (CYGWIN CLANG 8.0.1) | 7051.571198 | 0.81 s |
C (MINGW GCC 10.2.0) | 7051.571198 | 0.79 s |
C (MINGW CLANG 8.0.1) | 7051.571198 | 0.78 s |
C (Embarcadero C++ 6.60 for Win32) | 7051.571198 | 1.64 s |
Java JRE Microsoft (build 17.0.6+10-LTS) | 7051.571197642306 | 0.90 s |
Perl 5.32.1 | 7051.57119764231 | 21.21 s |
Javascript (Node.js 18.16.0) | 7051.571197642306 | 4.08 s |
Javascript (MS JScript 5.812) | 7051.57119764231 | 57.95 s |
Python 3.11.3 | 7051.571197642306 | 33.87 s |
Numba (Python 3.11.3) | 7051.571197642306 | 1.99 s |
C# .NET Framework 4 (CSC 12) | 7051.57119764231 | 0.81 s |
C# Mono 6.12.0 | 7051.57119764231 | 1.28 s |
C# Mono 6.12.0 (Interpreter) | 7051.57119764231 | 6.53 s |
Ruby 3.1.2-1 | 7051.571197642306 | 18.65 s |
R 4.3.1 | [1] 7051.571 | 11.19 s |
PowerShell 5.1.19041.2673 | 7051.57119764231 | 56.65 s |
PowerShell 7.3.4 | 7051.571197642306 | 216.09 s |
PHP 8.2.7 | 7051.5711976423 | 9.17 s |
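To read these timings as relative factors, one option is to normalize each time against the fastest entry. The sketch below does this for a subset of the results; the times are copied from the README output, while the helper name `slowdown_table` and the choice of subset are assumptions made for this example.

```python
# Wall-clock seconds copied from the README results (a subset).
times = {
    "Lua 5.4.2": 3.69,
    "LuaJIT 2.0.5": 0.79,
    "C (GCC 7.2.0)": 0.78,
    "Java JRE Microsoft (build 17.0.6+10-LTS)": 0.90,
    "Javascript (Node.js 18.16.0)": 4.08,
    "Python 3.11.3": 33.87,
    "Numba (Python 3.11.3)": 1.99,
    "Ruby 3.1.2-1": 18.65,
}


def slowdown_table(times):
    """Return (name, factor) pairs sorted by factor, where factor is the
    run time divided by the fastest run time in the table."""
    fastest = min(times.values())
    return sorted(((name, t / fastest) for name, t in times.items()),
                  key=lambda pair: pair[1])


for name, factor in slowdown_table(times):
    print(f"{name:45s} {factor:6.1f}x")
```

Note that these are whole-process wall-clock times, so startup cost (for example JVM or interpreter launch) is included in every factor.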