Java Bytecode Compilation - a Special Case of Binary Translation

Markus Pilz

Department of Computer Science
University of Zurich, Switzerland
pilz@ifi.unizh.ch

Thesis Abstract

Java [1] is a very successful object-oriented programming language from Sun that includes modern features like exception handling, garbage collection and thread support. Java programs are compiled to architecture-neutral bytecodes for a virtual machine, the Java virtual machine [2] . This makes Java programs highly portable, because they can be executed on any hardware/software platform as soon as a virtual machine is available.

The Java virtual machine can be implemented in software as an instruction set simulator. A straightforward implementation is a program that fetches one instruction at the time, decodes it, and executes it by calling a corresponding method based on the opcode. Such a simple bytecode interpreter is powerful, easy to write and simple to port to a new platform. Its main drawback is that it tends to be slow, because ten to hundreds of host instructions may be necessary to simulate only one virtual machine instruction. Furthermore, because the bytecode is architecture neutral, it cannot be optimized for the target machine.

Instruction set simulation can be speeded up by binary translation [3] , a technique in which the source machine code is translated into host machine instructions, so that the program can run native. Binary translation is a form of compilation where the source language is a machine and not a high level language. In the process of translating the source program to a semantically equivalent one written in a different language, a binary translator uses all the techniques already known from compiler construction, and its internal structure is that of a traditional compiler: it divides into a front-end, a middle-end and a back-end, which perform tasks very similar to the corresponding modules in high level language compilers. The front-end builds an internal representation of the program to translate, whereas the middle-end applies various optimizing program transformations, and the back-end generates target code. The front-end largely depends on the source language, and the back-end on the target machine, whereas the middle-end is hardly influenced by language and machine. However, there is a strong dependency between the three modules because of the internal representation used in all three.

In a binary translator, the most different module is the front-end. Instead of scanning and parsing a high level language program, an object file or an executable must be disassembled . Code must be separated from data, possible instructions must be identified, and the control and data flow must be determined. Binary translation is rather an old technique and various examples of its application are described in the literature. However, when building a binary translator for a machine with a von Neumann architecture, the disassembly phase is at the same time difficult and critical.

Disassembly poses many problems: some code cannot be disassembled because it is available only at runtime (e.g. self modifying code). Some code cannot be disassembled because it is ambiguous (e.g. has an interpretation as data, address and instruction). Some code cannot be detected until runtime (e.g. because of indirect jumps), and the additional code may invalidate some of the translated code. These problems force binary translators to handle some code at runtime, e.g. by falling back on interpretation or by dynamically retranslating it. Because of that, binary translators usually devote much effort to the disassembly phase in order to detect as much code as possible, whereas aggressive optimization and code generation is neither possible nor useful.

This is very different for the Java virtual machine. The instruction set and the class file is defined in a way that allows the division of code from data and the identification of all instructions prior to runtime. Furthermore, all the control flow can be determined without executing the program, thus exhaustive dataflow analyses is possible at translation time. We call this the disassembly property of the Java bytecode. On the one hand, the disassembly property is the prerequisite necessary for checking the bytecode for security violations without executing it. On the other hand, the disassembly property implies that all code can be optimized and translated to target machine code statically, and no code must be handled at runtime. This avoids many of the limits that existed so far for binary translation; it simplifies the implementation of the binary translator and it makes many aggressive and expensive optimizations feasible that where previously not possible or too costly.

Where the disassembly property enables aggressive optimization and code generation, the architecture of the Java virtual machine literally calls for it. The stack architecture and the inhomogeneous instruction set sharply contrast modern, high performance RISC architectures; and translating Java bytecode to efficient code for a superscalar RISC machine is not straightforward. Such machines have orthogonal instruction sets, large register files, multiple functional units with deep pipelines, and a complex memory hierarchy. The performance of such machines critically depends on register allocation and instruction scheduling, whereas instruction selection is of much lower importance than it was on CISC machines. It largely depends on the compiler whether the power of modern machines can be exploited or not. For Java, this means that efficient code can only be generated by relying on extensive program optimization and careful code generation.

Various just-in-time binary translators for Java exist. Because they generate code at run­time, complex optimization and code generation algorithms are too time consuming. Just-in-time translation is a sensible alternative, if interpretation is too slow, and the bytecode is not available until runtime, e.g. because it is loaded from the Internet. However, we think that in the future, Java will lose importance as implementation language for executable Web-content, but will become more and more important as a general purpose programming language that allows for remote program installation. When installing very large applications over the Net, or when down-loading a new program version to some embedded network device, immediate execution is less important than code quality and code size. In such a scenario, it may well be worth doing careful optimization and code generation. This leads to a mixed approach , where a simple interpreter handles the code that is only available at runtime, while a more traditional, static binary translator generates highly efficient native code off-line. The main advantage of static compilation is the fact that compilation time does not add to execution time, and more powerful algorithms are feasible. In practice, all the well studied optimization algorithms developed in the past for traditional compilers can be used. Furthermore, as most classes are available at optimization time, not only global but also interprocedural optimizations are possible.

The motivation for this thesis is to exploit the extensive use of binary translation for compiling Java bytecodes into native code. An introduction to binary translation is given, and the problems encountered with real machines are outlined. The thesis then elaborates on the disassembly property and proofs it for the Java virtual machine. The design and the prototype implementation of a binary translator framework called gaggia is described. gaggia translates Java bytecodes into an internal representation that well supports state of the art optimization and code-generation as known from current high level language compilers. As intermediate representation it uses a version of static single assignment form (SSA-Form) [4] that much closer matches a modern, pipelined, superscalar RISC architecture then the stack architecture of the Java virtual machine. The thesis describes how to translate Java bytecodes to this internal representation and the problems encountered. gaggia implements a framework, to which optimizing program translations can be added easily and their effects studied. Based on the results of building the prototype, the implication for the design of instruction sets for virtual machines and the implications for the design and the implementation of binary translators are discussed.

References

  1. Ken Arnold, James Gosling. The Java Programming Language . Addison-Wesley, 1996
  2. Tim Lindholm, Frank Yellin. The Java Virtual Machine Specification . Addison-Wesley, 1997
  3. Richard L. Sites et all. Binary Translation . CACM, 36(2), 1993, 69-81
  4. Ron Cytron et al. An Efficient Method of Computing Static Single Assignment Form . POPL `89, 1989, 25-35