Buglabs have done a comparison of open-source JVMs on their embedded ARM platform (the BUG, based on an ARM1136JF-S core). The tested VMs were PhoneME advanced, Cacao and JamVM. The results are very interesting :
JamVM comes out the fastest, followed by PhoneME and then Cacao. On startup time, JamVM also comes out top (3 ms), followed by Cacao (12 ms) and PhoneME (16 ms).
The caveat is that PhoneME's JIT is not being used because of kernel issues (and presumably, its startup time would increase even further). The real mystery, however, is the poor performance of Cacao. A good result for JamVM is meaningless if the test isn't fair.
The benchmarks used in the test are dominated by floating point. Looking at the Technical Reference Manual for the ARM core shows that it has a Vector Floating-Point (VFP) coprocessor. As long as the toolchain is correctly setup this should be supported by JamVM. The question is whether Cacao's JIT correctly produces floating-point instructions or always uses emulation.
Another possibility is cache behaviour. The performance improvement of JIT code being offset by increased I-cache misses (an interpreter should fit entirely within the cache). JamVM's inlining interpreter is disabled on ARM, the direct-threaded interpreter being used by default. This is because inlining/super-instructions showed no performance improvement despite 200-300% improvement being seen on AMD64 (at least on an ARM920T). Cache behaviour was my tentative conclusion but I didn't have time to investigate it further. I'm still hoping that the recent changes to the inlining interpreter will show gains on ARM.