GCJ: The GNU Compiler for Java

by
Weiqi Gao, Principal Software Engineer
Object Computing, Inc. (OCI)

Introduction

GCJ, a radically traditional(*) Free Software implementation of the Java language, has been part of GCC since the 3.0 release in June 2001. Currently at version 3.2.1, it is supported on GNU/Linux on Pentium, Itanium, Alpha, PowerPC and AMD Hammer, FreeBSD on Pentium, Solaris on SPARC and more. Support for Windows using MinGW is also available.

It can compile Java source code to either Java bytecode (class files) or native machine code. It can also compile Java bytecode to native machine code. GCJ native code can be executables or shared libraries.

GCJ's Java implementation is not complete. Notable omissions include AWT and Swing. However, most other Java features are supported, including collections, networking, reflection, serialization, JNI and RMI.

In this article, I will first take you on a tour of GCJ basics, and then show some more advanced features.

Getting GCJ

GNU/Linux

GCC is bundled with most Linux distributions. To install GCJ on a Red Hat Linux 8.0 system, simply run

Windows

Windows users need to visit the MinGW Download page and get the following two files:

and install both into a common directory (C:\MinGW will work fine.)

Other Platforms

Users of other platforms may get binary distributions of GCC from their favorite download centers.

A First Look at GCJ

For the rest of the article, I will use Red Hat Linux 8.0 as my platform. The differences between platforms are mostly limited to filename extensions.

GCJ has two parts: the compiler and the runtime support library. The compiler includes these commands:

gcj
    the GCJ compiler
gcjh
    generates header files from Java class files
jcf-dump
    prints information about Java class files
jv-scan
    prints information about Java source files

The runtime support includes:

libgcj.so.3
    GCJ runtime support library
libgcj-3.2.jar
    Java class files of core GCJ classes, automatically searched when compiling Java sources

and commands:

gij
    an interpreter for Java bytecode
grepjar
    a grep utility that works on jar files
jar
    Java archive tool
jv-convert
    converts files from one encoding to another
rmic
    generates stubs for Remote Method Invocation
rmiregistry
    remote object registry

Compiling and Running Java Programs

The gcj command is very similar to the gcc command:

[weiqi@gao]$ gcj -c Hello.java                 # compile to Hello.o
[weiqi@gao]$ gcj --main=Hello -o Hello Hello.o # link Hello.o to Hello
[weiqi@gao]$ ./Hello                           # run Hello
Hello, World!

Compiling and linking can be combined into one step:

[weiqi@gao]$ gcj --main=Hello -o Hello Hello.java # compile and link

The --main=Hello option tells the linker which class's main() method is the entry point of the executable.
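
The article does not list Hello.java itself. A minimal version consistent with the output shown above might look like the following. The greeting() helper is not part of the original program; it is added here only so that the text can be checked separately from main().

```java
// Hello.java -- assumed source; the article shows only the program's output
class Hello {
    // Hypothetical helper: returns the text that main() prints.
    static String greeting() {
        return "Hello, World!";
    }

    // Entry point named by --main=Hello on the gcj command line.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```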

The -C switch tells gcj to compile to Java bytecode:

[weiqi@gao]$ gcj -C Hello.java    # compile to Hello.class
[weiqi@gao]$ gij Hello            # run Hello.class in the gij interpreter
Hello, World!
[weiqi@gao]$ java Hello           # run Hello.class in the Sun JVM
Hello, World!

The gcj compiler can compile Java class files, and even jar files, to machine code:

[weiqi@gao]$ gcj -c Hello.class          # compile Hello.class to Hello.o
[weiqi@gao]$ jar cvf hi.jar Hello.class  # create a jar
[weiqi@gao]$ gcj -c hi.jar               # compile hi.jar to hi.o
[weiqi@gao]$ gcj --main=Hello -o hi hi.o # link hi.o to hi
[weiqi@gao]$ ./hi                        # run hi
Hello, World!

Shared Libraries

The -shared switch tells gcj to link into a shared library. Assume the following sources:

// A.java
public class A {
    public void foo() {
        System.out.println("A.foo()");
    }
}
// B.java
public class B {
    public static void main(String[] args) {
        new A().foo();
    }
}

We can compile A.java into a shared library libA.so, and compile B.java into an executable B that is linked against libA.so:

[weiqi@gao]$ gcj -shared -o libA.so A.java     # compile to shared library
[weiqi@gao]$ gcj --main=B -o B B.java -L. -lA  # link against libA.so

We have to add the directory containing libA.so to LD_LIBRARY_PATH, here just for the one command, before executing B:

[weiqi@gao]$ LD_LIBRARY_PATH=. ./B
A.foo()

Debugging with GDB

Support for GCJ has been added to the GNU Debugger GDB. The full power of GDB can be used with Java programs. To make debugging symbols available to GDB, we need the -g switch on the gcj command line:

[weiqi@gao]$ gcj -g --main=C -o C C.java    # compile with debug symbols
[weiqi@gao]$ gdb C                          # debug it
(gdb)

In gdb, run starts the executable; break sets breakpoints; step, next and cont control the flow of execution; and print and display show the values of variables.
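
C.java is not shown in the article. Any small program with a loop and a method call gives step, next and print something to work on. The following is a hypothetical stand-in, not the original file.

```java
// C.java -- hypothetical debugging target; not the article's original file
class C {
    // Adds the integers 1..n; a convenient place to set a breakpoint.
    static int sum(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;   // step through this line and print total
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(10));
    }
}
```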

The gdb in Red Hat Linux 8.0 had trouble understanding the command

(gdb) break Foo.main

to set a breakpoint at the main method of the class Foo. A workaround is to type

(gdb) break 'Foo::main

and use tab completion to get

(gdb) break 'Foo::main(JArray<java::lang::String*>*)'

How Do I ...?

GCJ is sufficiently different from other Java implementations that we need to pay attention to how things are done with it.

Searching For Classes

In GCJ, Java classes may appear in .so files in addition to jar files and loose class files. When a Java class is dynamically loaded, libgcj searches for it in .so files first. For example, the class a.b.C is searched for in the following order:

  1. lib-a-b-C.so
  2. lib-a-b.so
  3. lib-a.so
  4. the CLASSPATH

Note that if a class is loaded from the CLASSPATH, it is interpreted.
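
The naming scheme above can be sketched in a few lines of Java. This is only an illustration of how the candidate file names follow from the class name, not libgcj's actual lookup code; the class and method names are invented for the example.

```java
// Hypothetical illustration of the .so naming scheme; not libgcj code.
class SoCandidates {
    // Returns the library names tried for className, most specific first.
    static String[] candidates(String className) {
        // One candidate per dot-separated component of the class name.
        int count = 1;
        for (int i = 0; i < className.length(); i++) {
            if (className.charAt(i) == '.') {
                count++;
            }
        }
        String[] names = new String[count];
        String prefix = className;
        for (int i = 0; i < count; i++) {
            names[i] = "lib-" + prefix.replace('.', '-') + ".so";
            // Drop the last component before the next, less specific, try.
            int dot = prefix.lastIndexOf('.');
            if (dot >= 0) {
                prefix = prefix.substring(0, dot);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = candidates("a.b.C");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i]);
        }
    }
}
```

For a.b.C this yields lib-a-b-C.so, lib-a-b.so and lib-a.so, in that order, after which the CLASSPATH would be consulted.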

Setting Properties on Invocation

Because a GCJ-compiled executable is started directly, with no java launcher to accept -D options, properties are passed to the program at invocation time in the environment variable GCJ_PROPERTIES:

// A.java
public class A {
    public static void main(String[] args) {
        System.out.println(System.getProperty("a.b"));
        System.out.println(System.getProperty("c.d"));
    }
}
[weiqi@gao]$ gcj --main=A -o A A.java
[weiqi@gao]$ GCJ_PROPERTIES="a.b=x c.d=y" ./A
x
y
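
The value of GCJ_PROPERTIES in this example is a space-separated list of name=value pairs. As an illustration of that format only (the runtime's own parsing is not shown in this article, and the class and method names below are hypothetical), such a string can be split into a java.util.Properties object like this:

```java
import java.util.Properties;
import java.util.StringTokenizer;

// Hypothetical helper; illustrates the "name=value name=value" format only.
class PropsSketch {
    static Properties parse(String spec) {
        Properties props = new Properties();
        // The default StringTokenizer delimiters split on whitespace.
        StringTokenizer tokens = new StringTokenizer(spec);
        while (tokens.hasMoreTokens()) {
            String pair = tokens.nextToken();
            int eq = pair.indexOf('=');
            // Tokens with no '=' or with an empty name are ignored.
            if (eq > 0) {
                props.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = parse("a.b=x c.d=y");
        System.out.println(p.getProperty("a.b"));
        System.out.println(p.getProperty("c.d"));
    }
}
```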

Compiled Native Interface

In addition to standard JNI, GCJ provides an alternative Compiled Native Interface (CNI, originally Cygnus Native Interface) to C/C++ code that takes full advantage of the fact that GCJ implements Java as basically a subset of C++.

The CNI mapping provides C++ types for the primitive Java types (such as jint and jboolean), a set of functions to work with Java strings and arrays, and an invocation API for calling Java methods from C++ programs. Java reference types are mapped to similarly named C++ types using the gcjh command.

Invocation API

Here's a simple example of invoking Java from C++:

// invoke.cc
#include <gcj/cni.h>
#include <java/lang/System.h>
#include <java/io/PrintStream.h>

#include <iostream>
#include <string>

int main() {
    JvCreateJavaVM(NULL);
    JvAttachCurrentThread(NULL, NULL);
    std::string cppstr("Hello, C++ String");
    jstring jstr = JvNewStringUTF(cppstr.c_str());
    ::java::lang::System::out->println(jstr);
    JvDetachCurrentThread();
}

[weiqi@gao]$ gcj -o invoke invoke.cc -lstdc++ # compile and link against libstdc++
[weiqi@gao]$ ./invoke
Hello, C++ String

Native Method Implementation

Native method implementation is just as natural:

// a/A.java
package a;

public class A {
    public native String foo(String str);

    public static void main(String[] args) {
        String str = "Hello from Java";
        String str1 = new A().foo(str);
        System.out.println("Java printing: " + str1);
    }
}

[weiqi@gao]$ gcj -d . -C a/A.java  # create a/A.class to run gcjh with
[weiqi@gao]$ gcjh a.A              # create header a/A.h for class a::A
[weiqi@gao]$ gcjh -stubs a.A       # create a stub implementation a/A.cc

Then we edit a/A.cc to add our implementation code:

// a/A.cc
#include <a/A.h>
#include <gcj/cni.h>
#include <iostream>
#include <string>

::java::lang::String * a::A::foo (::java::lang::String *str)
{
    // print the string sent in from Java
    int length = JvGetStringUTFLength(str);
    char buffer[length + 1];
    JvGetStringUTFRegion(str, 0, length, buffer);
    buffer[length] = '\0';
    std::cout << "C++ printing: " << buffer << std::endl;
    // return a new string
    std::string str1("Hello from C++");
    return JvNewStringUTF(str1.c_str());
}

[weiqi@gao]$ gcj -c -d . a/A.java                  # compile Java
[weiqi@gao]$ g++ -I. -c -o a/natAImpl.o a/A.cc     # compile C++
[weiqi@gao]$ gcj --main=a.A -o a-A a/*.o -lstdc++  # link
[weiqi@gao]$ ./a-A
C++ printing: Hello from Java
Java printing: Hello from C++

Uses of GCJ

Since GCJ is a free software implementation of Java, many free software Java packages have been ported to work with GCJ. The rhug project provides a collection of such ports. Notable entries include Ant, Log4j, JUnit, Rhino, Xalan-Java and Xerces2-J.

As I write this article, reports have surfaced on the Internet that GCJ can now run Eclipse, the Java-based IDE.

Summary

GCJ is a capable, widespread, free implementation of the Java platform. It offers some unique characteristics that make it viable to use the Java programming language in situations where using the J2SE SDK is not feasible.

Obvious advantages of GCJ over the J2SE SDK include faster startup time, smaller memory footprint, shared memory between different native Java processes, and easier interface with other languages. Disadvantages include the lack of AWT and Swing support.

GCJ is one more choice for Java and free software developers. Give it a try and see if it can solve some of your problems.
