INTRODUCTION
The Fast Virtual Machine (Fast VM) is the hp® next-generation, Just-In-Time
(JIT) compiler designed to increase Java application performance.
By generating native code for methods as they are invoked, Fast VM typically
runs an application 4 to 5 times faster than the same application
run with the conventional Java Development Kit (JDK) Classic JVM combined
with other JIT compilers.
This Fast VM release targets the Alpha microprocessor running
on hp Tru64 UNIX® systems.
This document highlights the Fast VM's benefits, describes its high performance
capabilities, and presents a detailed view into the Fast VM architecture and
technology.
Benefits of the Fast VM
Tru64 UNIX customers have come to rely on hp's timely SDK releases containing
the Classic JVM and JIT compiler. Now, with Fast VM, Tru64 UNIX users will enjoy
enhanced product features including:
- High Performance. Efficient object format and allocation,
runtime optimizations, and a Java execution environment highly tuned for the
Alpha platform take the runtime performance of Java applications to the next
level, virtually eliminating performance as a roadblock to deploying Java
applications.
- Java Compatibility. The Fast VM implements the full JDK
and passes all the Java Compatibility Kit (JCK) tests. This is in contrast
to research projects that have demonstrated excellent Java performance but
implement only a subset of the JDK.
- JDK Class File and Shared Library Support. Rather than
using its own modified versions, Fast VM takes advantage of the thoroughly
tested JDK class files and shared libraries.
- Ease of Use. Users are presented with a single integrated
java command. Fast VM is invoked by typing "java -fast".
HIGH PERFORMANCE CAPABILITIES
Fast VM performance enhancements include direct execution of Java methods,
efficient object format and allocation, a Java execution environment highly
tuned for the Alpha architecture, and runtime optimizations. These techniques
were designed to make compilation time negligible to the Java user.
- Direct Execution
When the Classic JVM executes a Java program, it reads and then interprets
the bytecodes. The result is a JVM easily ported to numerous architectures
but with slow execution when compared to conventional programming languages.
A technique used to address the poor performance of Java bytecode interpretation
is the integration of a JIT compiler with the interpreter. Instead of interpreting
the bytecodes, the JVM passes them to the JIT compiler, which translates
them into native code for the platform on which it is running. Although
JIT compilers significantly increase performance, they are constrained by
the interpreter. For example, method invocation returns control back to
the interpreter to perform stack management rather than calling the method
directly.
Fast VM takes the approach of a conventional compiler and translates Java
bytecodes directly into native machine code. Thus, every Java method is
compiled. Java code executes as if it were written in a conventional programming
language: there is a single stack per thread and calls are direct and conform
to the Alpha calling standard. To the operating system, a Java method appears
just like a procedure written in a conventional programming language.
- Efficient Object Format
The Fast VM eliminates the performance bottleneck resulting from the Classic
JVM's representation of objects via handles.
Over the past decade, modern reduced instruction set computer (RISC) systems
have become prevalent. Processor speeds have increased much faster than the
speeds of the corresponding memory systems. For many applications,
memory references rather than execution speed become the performance bottleneck.
The Classic JVM represents an object as a pointer to a data structure called
a handle, which contains a pointer to the instance data.
This object layout results in an unnecessary memory reference for every
object access. Although handles have a number of desirable qualities (especially
related to garbage collection and portability), the additional memory reference
may result in significantly degraded performance on modern RISC processors.
A complication faced by Fast VM is that some native methods in the JDK
assume that an object reference points to a handle. The Fast VM solves
this problem by allocating the handle and the instance data adjacent to
each other. In the infrequent case of a native method accessing an object
instance, the access still works by double indirection through the handle,
while Fast VM accesses that same instance with only a single level of
indirection by adding an offset to the object's address. The Fast VM's
object format is illustrated below:
The Fast VM Object Format

Word | Purpose
-----+-----------------------------------------------------------
0    | [Handle] Pointer to instance data. Contains the address of
     | word 4 of this structure.
1    | [Handle] Pointer to Sun metadata
2    | Pointer to class object and garbage collector bits
3    | Monitor and array length information
4    | [Instance Data] (actual data)
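The two access paths in the table above can be modeled in a few lines. This is only an illustrative sketch, not Fast VM code: memory is assumed to be a flat Python list of words, an address is an index into that list, and the function and constant names are hypothetical.

```python
# Model memory as a flat list of words; an "address" is an index into it.
HEADER_WORDS = 4          # words 0-3 of the object format
WORD_HANDLE_DATA = 0      # [Handle] pointer to instance data


def allocate_object(memory, class_id, fields):
    """Append one object to memory and return the address of its word 0."""
    address = len(memory)
    memory.append(address + HEADER_WORDS)  # word 0 points at word 4
    memory.append(None)                    # word 1: metadata (not modeled)
    memory.append(class_id)                # word 2: class pointer, GC bits
    memory.append(0)                       # word 3: monitor / array length
    memory.extend(fields)                  # word 4 onward: instance data
    return address


def read_field_direct(memory, address, index):
    """Single indirection: add a fixed offset to the object's address."""
    return memory[address + HEADER_WORDS + index]


def read_field_via_handle(memory, address, index):
    """Double indirection: treat the reference as a handle, follow word 0."""
    instance_data = memory[address + WORD_HANDLE_DATA]
    return memory[instance_data + index]
```

Both functions return the same field, which is why a native method that expects a handle and compiled code that expects a direct object pointer can share one reference.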
- Fast Object Allocation
Tru64 UNIX provides an efficient implementation of native threads and quick
access to thread local storage, which allows Fast VM to perform fast object
allocation. Each thread is given its own memory area from which to allocate
objects. In the normal case, object allocation is accomplished by incrementing
a pointer and requires no synchronization with other threads.
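A minimal sketch of pointer-bump allocation from a per-thread area follows. The class name, the sizes, and the use of None to signal the slow path are assumptions made for illustration; the source does not describe what Fast VM does when an area fills up.

```python
class ThreadLocalArea:
    """A private chunk of the heap owned by exactly one thread."""

    def __init__(self, start, size):
        self.next_free = start      # the pointer that gets incremented
        self.limit = start + size   # first address past the area

    def allocate(self, size):
        """Return the address of a new object, or None if it does not fit."""
        address = self.next_free
        if address + size > self.limit:
            return None             # uncommon case: caller needs a new area
        self.next_free = address + size
        return address
```

Because each thread owns its area, the normal case touches no shared state and needs no lock.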
- Fast Monitors
An attraction of the Java programming language is that it makes it easy
for programmers to write multi-threaded applications. In order to ensure
the consistency of a set of related data structures, synchronization primitives
are available to the programmer. These primitives are also used extensively
by the JDK libraries so that these libraries can be safely invoked by multi-threaded
applications.
In the common case that only one thread tries to lock a given object, synchronization
is accomplished without operating system intervention. The thread obtains
a spin lock located in the object header, updates the header, and releases
the spin lock. This results in monitor synchronization that is not a performance
bottleneck for most real world applications.
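The uncontended path described above might look like the following sketch. The header fields (owner, depth) are hypothetical, since the source only says that a spin lock lives in the object header, and a threading.Lock polled without blocking stands in for the hardware spin lock.

```python
import threading


class ObjectHeader:
    """Hypothetical header fields; the real layout is not given in the text."""

    def __init__(self):
        self.spin_lock = threading.Lock()  # stands in for the header spin lock
        self.owner = None                  # thread that holds the monitor
        self.depth = 0                     # Java monitors are re-entrant


def try_monitor_enter(header, thread_id):
    """Fast path: return True if the monitor was taken without blocking."""
    while not header.spin_lock.acquire(blocking=False):
        pass                               # spin; the lock is held only briefly
    try:
        if header.owner is None or header.owner == thread_id:
            header.owner = thread_id
            header.depth += 1
            return True
        return False                       # contended: a real VM would block here
    finally:
        header.spin_lock.release()


def monitor_exit(header, thread_id):
    """Release one level of ownership; free the monitor at depth zero."""
    while not header.spin_lock.acquire(blocking=False):
        pass
    try:
        if header.owner != thread_id:
            raise RuntimeError("monitor not owned by this thread")
        header.depth -= 1
        if header.depth == 0:
            header.owner = None
    finally:
        header.spin_lock.release()
```

The spin lock protects only the short header update, so a thread that finds the monitor free never has to call into the operating system.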
- Optimization of Runtime Checks
One of the appeals the Java programming language holds for programmers
is that it is strongly typed and provides automatic array bounds checking.
Fast VM performs extensive analysis to minimize any performance penalty
resulting from these runtime checks. For example, many array bounds checks
are redundant and can be eliminated. If an array bounds check is required,
Fast VM performs a highly optimized code sequence that checks the lower
and upper bounds with a single comparison instruction.
Fast VM emits no additional instructions to detect a NULL pointer dereference.
Instead, it emits optimized code, and in the infrequent case that a NULL
pointer is dereferenced, the operating system raises a signal, which Fast VM
catches and translates into a NullPointerException. Thus,
only programs that actually dereference NULL pointers run slower due to
this safety feature.
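One common way to test both array bounds with a single comparison is to reinterpret the signed index as unsigned. The source does not say which instruction sequence Fast VM emits, so the sketch below is an assumption about the general technique; the 32-bit mask models Java's int index in Python.

```python
MASK32 = 0xFFFFFFFF  # Java array indices and lengths are 32-bit ints


def in_bounds_two_compares(index, length):
    """The straightforward check: two signed comparisons."""
    return 0 <= index and index < length


def in_bounds_one_compare(index, length):
    """Reinterpret the signed index as unsigned, then compare once.

    A negative 32-bit index becomes a value of at least 2**31, which is
    larger than any legal array length, so one comparison rejects both
    negative and too-large indices.
    """
    return (index & MASK32) < length
```

The two functions agree for every 32-bit index and every non-negative length.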
- Optimized Method Calls
The Fast VM monitors program execution and optimizes method calls based
on the changing environment. A key benefit of this approach is that users
avoid performance penalties due to features they are not using.
For example, if a method is not overridden, the method is called directly.
However, when the method is overridden, the direct call is replaced by a
call using a virtual function table (this action involves an extra memory
reference).
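The switch from a direct call to a table-based call can be pictured with a patchable call site. This is a conceptual model only: CallSite, patch_to_virtual, and the example classes are hypothetical names, and Python attribute lookup stands in for a virtual function table.

```python
class CallSite:
    """One call to a named method, as it might appear in compiled code."""

    def __init__(self, method_name, declaring_class):
        self.method_name = method_name
        # Start optimistic: bind straight to the only known implementation.
        self.direct_target = getattr(declaring_class, method_name)
        self.virtual = False

    def patch_to_virtual(self):
        """Switch to table-based dispatch once an override exists."""
        self.direct_target = None
        self.virtual = True

    def invoke(self, receiver):
        if self.virtual:
            # Extra lookup through the receiver's class (the "vtable").
            return getattr(type(receiver), self.method_name)(receiver)
        return self.direct_target(receiver)


class Shape:
    def name(self):
        return "shape"


class Circle(Shape):  # imagine this class being loaded later
    def name(self):
        return "circle"
```

A program that never loads an overriding class keeps the cheaper direct call for its whole run.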
- Runtime Machine Specific Optimizations
An advantage that virtual machines have over conventional compilers is
their knowledge of the runtime execution environment. For example, only
later versions of the Alpha processors have byte manipulation instructions.
Fast VM recognizes the type of Alpha processor it is executing on and emits
processor-specific code patterns.
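The idea of choosing a code pattern once, after detecting the processor, can be sketched as below. The has_byte_instructions flag and both loader bodies are illustrative assumptions: the second pattern loads an aligned 8-byte word and extracts the byte by shifting and masking, as a processor without byte loads would have to.

```python
def make_byte_loader(has_byte_instructions):
    """Pick a byte-load code pattern once, based on the detected processor."""
    if has_byte_instructions:
        def load_byte(memory, address):
            return memory[address]                    # one byte-load instruction
    else:
        def load_byte(memory, address):
            base = address & ~7                       # aligned 8-byte word
            word = int.from_bytes(memory[base:base + 8], "little")
            return (word >> (8 * (address & 7))) & 0xFF
    return load_byte
```

Both loaders return the same value; only the cost of the emitted sequence differs.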
ARCHITECTURE
Overview
The Fast VM is written in a portable subset of C++ using high level object-oriented
abstractions and consists of reusable components. The following illustration
provides an overview of the Fast VM architecture:
Command Processor
- Fast VM Executable
Runtime Interface
- RTL Entry Points - called directly from generated code
- Java Native Interface (JNI)
- Exception Handlers
- Native methods overriding JDK provided versions
- Glue - interpreter routines called by native methods
Reusable Components
- Object Factory
- Garbage Collector
- Compiler and Symbol Table
- System Services
Architecture Command Processor
The Fast VM begins execution when the user invokes the java
command with the appropriate switch or environment variable set. The Command
Processor performs the following actions:
- Parses and interprets the specified switches and environment variables.
One of the command options is the name of the class containing the "main"
method to be executed.
- Loads the specified class resulting in the following:
- Allocation of the class's static variables.
- Production of stub code for each method of the class. Invoking the stub
code results in the method being compiled and executed. Additionally,
future invocations of the method go directly to the compiled code.
- Compilation and execution of the class's static initializer.
- Invokes the class's "main" method.
- Returns control to the user after the main method completes.
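The stub mechanism in the class-loading step can be sketched as follows. MethodTable and the compile_method callback are hypothetical; the callback stands in for the bytecode-to-native compiler, which is not modeled here.

```python
class MethodTable:
    """Maps method names to whatever should run when the method is called."""

    def __init__(self, compile_method):
        self._compile = compile_method   # hypothetical compiler entry point
        self._entries = {}
        self.compile_count = 0

    def load_method(self, name, bytecode):
        """Install a stub; nothing is compiled yet."""
        def stub(*args):
            compiled = self._compile(bytecode)
            self.compile_count += 1
            self._entries[name] = compiled   # later calls skip the stub
            return compiled(*args)
        self._entries[name] = stub

    def invoke(self, name, *args):
        return self._entries[name](*args)
```

Only methods that are actually invoked pay the compilation cost, and each pays it once.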
Architecture Runtime Subsystem
The most common way for control to return to Fast VM is through the RTL
Entry Points, approximately 25 routines that the
compiled code invokes directly. The entry points include:
- Mathematical routines for integer divide, integer remainder, floating point
divide, floating point remainder, or conversion.
- Object creation routines.
- Monitoring routines (enter and exit).
- Checking routines for array store and cast operations.
- Exception handling routines for throwing or catching exceptions.
- Compilation routines for compiling methods into native machine code.
Another way control is returned to the Fast VM occurs when the user invokes
a C or C++ routine that uses the Java Native Interface (JNI).
JNI provides the Java programmer a JVM-independent mechanism for writing native
methods. The function prototypes are provided in a file called "jni.h"
and the JVM provides the implementation of these functions. JNI routines perform
functions such as reading and writing a field, invoking a method, etc.
Rather than add expensive checks to generated code, Fast VM establishes exception
handlers to catch certain conditions. For example, instead of prefixing every
pointer dereference with a check for a NULL pointer, Fast VM emits code without
checks and establishes a signal handler which throws a NullPointerException
after the program dereferences address zero.
Both the Classic and Fast VMs use most JDK native methods. However, certain
native methods depend upon the Classic JVM's object layout. In these cases,
Fast VM provides an alternative implementation of the native method. The alternative
native methods are collected in the Native Methods subsystem.
One example is java_lang_Object_hashCode, which, when given
an object, returns its hashcode. The Classic JVM's implementation depends on
references that are pointers to immovable handles. In contrast, the Fast VM
eliminates handles, and its references are pointers to objects that the
garbage collector can move freely.
Some of the native methods in the JDK reference static variables or call routines
exported by the Classic interpreter. In order to use these native methods, Fast
VM provides an implementation of the required interpreter entry points and collects
them together in the Glue subsystem. An example is a routine
called SignalError that is frequently called by native methods
implementing the Abstract Windowing Toolkit (AWT).
Architecture Reusable Components
Object Factory
The Object Factory is the heart of the JVM. It contains C++
classes responsible for creating and manipulating objects. For example, the
class JavaObject exports methods such as Create, MonitorEnter, and MonitorExit
that operate on instances of java.lang.Object. The architecture is such that
changing the format of an object involves modifying only this one class. This class
is "extended" to provide specific subclasses such as JavaClassObject
(instances of java.lang.Class) and JavaArrayObject (array objects).
Operating System Services
The Operating System Services module contains routines providing
portable system services. Examples of these services include thread management,
exception processing, and file system operations.
Compiler and Symbol Table
The Compiler and Symbol Table module is responsible for loading
and verifying classes, compiling methods and providing access to the symbol
table.
An integral part of the Fast VM is the Garbage Collector,
described in the next section.
Garbage Collection
Two common garbage collection algorithms are termed mark-and-sweep and copying.
The Fast VM garbage collector combines techniques of the conservative mark-and-sweep
and the accurate copying collectors.
Mark-and-Sweep
A mark-and-sweep collection typically consists of
two phases. In the first phase, each object known to be reachable is visited
and marked as live and then scanned for references
to other objects, which in turn are visited. During the second phase, memory
is linearly traversed, or swept, and unmarked objects
are added to a free list. An optional third phase involves the compaction of
marked objects.
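The two phases can be sketched on a toy heap. The heap is assumed to be a dictionary from object id to the list of ids it references; this representation and the function name are illustrative, and the optional compaction phase is omitted.

```python
def mark_and_sweep(heap, roots):
    """heap maps object id -> list of referenced ids. Returns the free list."""
    # Phase 1: mark every object reachable from the roots.
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue
        marked.add(obj)
        pending.extend(heap[obj])   # scan the object for further references

    # Phase 2: sweep memory linearly and collect unmarked objects.
    free_list = [obj for obj in sorted(heap) if obj not in marked]
    for obj in free_list:
        del heap[obj]
    return free_list
```

Objects that reference only each other, such as an unreachable cycle, are never marked and are therefore reclaimed.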
Copying
A copying collector divides memory into two areas,
referred to as fromspace and tospace.
Objects are allocated in fromspace. When this
area runs out, live objects are copied into tospace.
The tospace area is then re-designated as fromspace
and the area formerly occupied by fromspace is
re-designated as tospace. Empirical studies show
that most Java objects are short lived and consequently a large percentage of
fromspace is not copied.
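A minimal sketch of the copy step follows, in the style of a breadth-first (Cheney-type) scan. It assumes each space is a Python list of (payload, references) pairs and that a reference is an index into the space; these choices are for illustration and are not taken from Fast VM.

```python
def copy_collect(fromspace, roots):
    """Copy live objects into a new tospace; return (tospace, new_roots)."""
    tospace = []
    forwarding = {}                       # old address -> new address

    def copy(old):
        if old not in forwarding:
            forwarding[old] = len(tospace)
            payload, refs = fromspace[old]
            tospace.append((payload, list(refs)))   # refs fixed up below
        return forwarding[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):            # scan copied objects in order
        payload, refs = tospace[scan]
        tospace[scan] = (payload, [copy(r) for r in refs])
        scan += 1
    return tospace, new_roots
```

Dead objects are never visited, so the work done is proportional to the live data rather than to the size of fromspace.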
One key piece of data required by a copying collector
is precise information that determines whether a given memory location contains
a reference to an object. If the referenced object is copied, the memory location
must be updated with the new address. The collector has precise information regarding
references within Java frames, within objects, and within JNI native methods.
However, because the Fast VM supports the JDK, it must also support the old
Native Method Interface (NMI). When a program is executing a native method that
is accessed via NMI instead of the newer JNI, the collector cannot distinguish
between an integer whose value is coincidentally an address within the heap and
an actual reference to an object within that same heap.
Mostly Copying
Mostly-Copying is effectively a hybrid conservative
and copying collector. It copies objects known to be alive and only pointed
to by precise references. Objects that are referenced via imprecise pointers
are not physically moved, but rather added to tospace using a sophisticated
bookkeeping algorithm. It is worth noting that at a given collection point only
a small percentage of the objects are referenced by imprecise pointers, and typically
at the next collection point a different set of objects is identified as being
imprecise. Another way of looking at this is that at a given collection point
only a small number of threads are likely to be in a non-JNI native method, and
at the next collection point that native method has probably completed.
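The hybrid behavior can be approximated with a simplified model. The source does not describe the actual bookkeeping algorithm, so everything here is an assumption: the heap is a dictionary from address to referenced addresses, each object occupies one address, objects are pinned individually, and tospace_start is assumed to lie above every existing address.

```python
def mostly_copying_collect(heap, precise_roots, ambiguous_roots, tospace_start):
    """Return (new_heap, new_precise_roots) after one collection."""
    # Any ambiguous value that happens to be a heap address pins that object.
    pinned = sorted({value for value in ambiguous_roots if value in heap})
    forwarding = {address: address for address in pinned}  # pinned: stay put
    worklist = list(pinned)
    next_free = tospace_start

    def forward(old):
        nonlocal next_free
        if old not in forwarding:
            forwarding[old] = next_free    # copied object gets a new address
            next_free += 1
            worklist.append(old)
        return forwarding[old]

    new_roots = [forward(root) for root in precise_roots]
    new_heap = {}
    while worklist:
        old = worklist.pop()
        # References inside objects are precise, so they can be updated.
        new_heap[forwarding[old]] = [forward(ref) for ref in heap[old]]
    return new_heap, new_roots
```

A pinned object keeps its address because the ambiguous value that points to it cannot safely be rewritten, while everything it references is still free to move.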
SUMMARY
The Fast VM provides users of hp Tru64 UNIX with one of the fastest Virtual
Machines available today. This paper presents an architectural overview of
a modern JVM and emphasizes how hp customers benefit from Fast VM and its high
performance capabilities.
Trademarks
HP and the names of hp products referenced herein are either trademarks and/or
service marks or registered trademarks and/or service marks of hp.
UNIX is a registered trademark in the United States and other countries, licensed
exclusively through The Open Company.