Example 2(a) is a C# program that adds two numbers, increments the result, and displays it on the console. As of this writing, the Mono compiler executable is named "compiler" and it compiles Example 2(a) without trouble, as Example 2(b) illustrates.
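
Examples 2(a) through 2(d) are not reproduced here, but based on the description that follows, a program along these lines would produce the CIL discussed below (a reconstruction for illustration only; the class name is made up and this is not the original listing):

using System;

// Hypothetical reconstruction of Example 2(a); not the original listing.
class Adder {
    static void Main ()
    {
        int a = 23 + 67;            // the two constants are added at run time (no constant folding yet)
        Console.WriteLine (++a);    // increment a, then print the new value (91)
    }
}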

Mono includes a disassembler called "monodis" that dumps the contents of the compiled executable as CIL. Example 2(c) is the CIL code for the Main() method. Lines IL_0000 and IL_0002 push the two integer values 23 and 67 onto the stack. The next line adds those values and leaves the result on the stack (at this writing, the compiler does not perform constant folding). Line IL_0005 stores that result into local variable number 0, which corresponds to the variable a.

Line IL_0006 pushes that variable's value back onto the stack, and line IL_0007 pushes the value 1 onto the stack as a 4-byte integer (i4). Next, line IL_0008 adds the values, and line IL_0009 duplicates the result. Finally, the first copy is stored back into the variable, and the second copy is consumed by the call to WriteLine(), leaving the stack empty. Example 2(d) is the output if you run the compiled application through the Mono run time (mint).

The Mono C# Compiler
The Mono C# compiler is written in C#. Eventually, Mono will expose the compiler as a component that can be reused by tools such as SharpDevelop (an open-source C# IDE) or the Mono implementation of the System.CodeDom.Compiler classes.

Writing the compiler in C# has a number of advantages. For example, the lexical analyzer can use C# objects to represent the entities it encounters in the source. This makes it easier to deal with literals and to perform constant folding, since existing C# facilities can be used to implement both. Writing the compiler in C# also requires a set of class libraries complete enough to host the compiler on Linux.
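
As a rough illustration of that point, a literal token can simply carry its value as a native C# integer, so folding two literals is ordinary C# arithmetic (a sketch only; the class and method names are invented and do not come from the Mono sources):

// Invented example: a literal token that carries a real C# value.
class IntLiteral {
    public readonly int Value;

    public IntLiteral (int value)
    {
        Value = value;
    }

    // Folding two integer literals is just native C# arithmetic,
    // with overflow checking for free.
    public static IntLiteral Fold (IntLiteral left, IntLiteral right)
    {
        return new IntLiteral (checked (left.Value + right.Value));
    }
}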

The C# parser is built with Jay, a port of Berkeley Yacc to Java that we in turn ported to C#. We considered using a more advanced parser generator, but decided the return on such an investment would be minimal: C# itself is a simple language, and most of the interesting work takes place during the semantic analysis phase, after parsing.

The compiler driver orchestrates the compilation process. The parser and the lexical analyzer create an internal representation of the input files using one class for each construct. For example, the if statement is represented by an If class that derives from the Statement class (all statements derive from this class). As with statements, expressions derive from the Expression abstract class. This organization is similar to that of the Guavac Java compiler.
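
The sketch below suggests the general shape of that hierarchy; the member names and signatures are invented for illustration and do not match the actual compiler sources:

// Illustrative shape only; the real classes carry much more state.
abstract class Expression { }

abstract class Statement {
    // Every statement knows how to emit the CIL that implements it.
    public abstract void Emit ();
}

class If : Statement {
    Expression cond;
    Statement then_part, else_part;

    public If (Expression cond, Statement then_part, Statement else_part)
    {
        this.cond = cond;
        this.then_part = then_part;
        this.else_part = else_part;
    }

    public override void Emit ()
    {
        // Evaluate cond, branch around then_part or else_part...
    }
}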

Instead of implementing a complete type system that could cope with all the various features of the C# object model, we used types from the System.Reflection namespace as our type repository and System.Reflection.Emit to create types on the fly. 

The classes in System.Reflection let you inspect and manipulate types at run time (for example, you can enumerate all the public methods exposed by System.String). System.Reflection.Emit generates in-memory or on-disk types based on System.Reflection representations. Together, these two namespaces provide the building blocks for constructing types: The Mono C# compiler creates a type, adds its members (properties, events, methods, and fields), and uses System.Reflection.Emit to write the types out to an assembly, which is an EXE or DLL that has a Portable Executable (PE) header and contains CIL.
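
For example, the general pattern for writing a type to disk with System.Reflection.Emit looks like this (a generic sketch of the public API, not code taken from the Mono compiler; the assembly, type, and method names are made up):

using System;
using System.Reflection;
using System.Reflection.Emit;

class EmitDemo {
    static void Main ()
    {
        AssemblyName name = new AssemblyName ();
        name.Name = "Demo";

        // Create a dynamic assembly that can be saved to disk.
        AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly (
            name, AssemblyBuilderAccess.Save);
        ModuleBuilder module = assembly.DefineDynamicModule ("Demo", "Demo.dll");

        // Define a type, add a method, and emit CIL into its body.
        TypeBuilder type = module.DefineType ("DemoType", TypeAttributes.Public);
        MethodBuilder method = type.DefineMethod (
            "Answer", MethodAttributes.Public | MethodAttributes.Static,
            typeof (int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator ();
        il.Emit (OpCodes.Ldc_I4, 42);
        il.Emit (OpCodes.Ret);

        type.CreateType ();
        assembly.Save ("Demo.dll");
    }
}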

The Class Libraries
At this writing, the class libraries are a work in progress. However, we have some pieces implemented that let simple applications be executed under the Mono CLI run time.

The class library is a good place to contribute to Mono, as the work is very compartmentalized. The interfaces are well defined and the communication required between the various groups is small, so different programmers can work on different areas without interfering with each other.

We are using the NUnit framework (http://nunit.sourceforge.net/) to create test cases that exercise the class library. This is also an area where contributions can be made without a lot of communication or a deep understanding of the ever-evolving Mono. Since the Mono class library will be compatible with .NET, you could even develop the unit tests against Microsoft's .NET SDK.
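
A class-library test might look something like the following minimal sketch. It uses NUnit's attribute style; the class name is made up, and the exact assertion calls depend on the NUnit version in use:

using System;
using NUnit.Framework;

// Hypothetical test; it runs identically against Mono's corlib or Microsoft's.
[TestFixture]
public class StringTest {
    [Test]
    public void Substring ()
    {
        Assert.AreEqual ("net", "dotnet".Substring (3));
    }
}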

We recently migrated to NAnt (http://nant.sourceforge.net/) as the build system for the class libraries. Other parts of Mono still use a make-based process to compile. At this time, we are working towards completing enough pieces of the class library to have a self-hosting tool chain that can be used to further develop Mono in Linux.

Mono's VES
Mono has two virtual execution systems — the Mono Interpreter (mint) and a JIT compiler — that share a metadata library that accesses and manipulates PE/COFF images containing CIL instructions.

Mint was originally developed as a proof of concept for Mono. It was designed to be easy to debug, easy to study, and comprehensive enough that it could be used as a reference for debugging problems with the JIT engine. Mint is more portable than a JIT, so a nice side effect is that you can run Mono on different architectures without a lot of work. Ideally, we will port the JIT to each supported platform, but the interpreter will be useful for bootstrapping, getting Mono running quickly, and running under systems where speed is not as important.

Currently, the interpreter supports most C# language semantics. We routinely run it against a test suite that includes many test cases, among them large bodies of code from the class libraries. Mint has also been useful as a prototyping testbed.

Mono's JIT
Mono's JIT translates CIL instructions into native code at run time. The JIT can compile an entire assembly in a single pass, or it can compile each method lazily the first time that method is invoked.

The JIT uses a set of macros that generate code into a memory buffer, and Mono needs one set of these macros for each architecture. The macros simplify debugging and prototyping of the code generator. The code-generation interface for the x86 platform lives in the mono/arch/x86/x86-codegen.h file, and Listing One illustrates the use of those macros. The x86-codegen.h macros originated in Intel's Open Runtime Platform Java Virtual Machine; we converted them so they can be used from C, the language the JIT is written in.

The conversion of CIL bytecodes into native instructions is where things get interesting. Mono uses an instruction selector based on bottom-up rewrite system (BURS) tree pattern matching — the same technology used by the portable lcc ANSI C compiler.

BURS uses a grammar that maps a set of operations (the terminal nodes) into nonterminal elements that match the target architecture. This grammar is fed into a code-generator program, monoburg. If you are familiar with Yacc, you can think of monoburg as a Yacc-like generator. However, instead of running screaming for the hills at the sight of reduce/reduce conflicts, you treat conflicts as a good thing: They are resolved using cost functions associated with each production. The pattern matcher's input is a tree of operations, and it maps that tree to the target architecture by selecting the nodes with the minimum total cost.

The first step transforms a sequence of CIL instructions into a forest of trees, and each tree is fed to the instruction selector separately. During this forest-building process, the standard CIL instructions are rewritten into forms that the instruction selector can match more effectively. That is why the BURS grammar does not contain the real CIL opcodes, but similarly named pseudo opcodes.

To generate code, a number of passes are performed over the forest. The first pass labels all the nodes and finds the least expensive way to cover each tree, the second pass performs register allocation, and the final pass emits the x86 code.

At this writing, the JIT engine supports most of the nonobject-oriented features of the virtual machine. By the time you read this, the object-oriented features should be implemented.

Garbage Collection
Garbage collection (GC) in Mono is based on the Intel Open Runtime Platform (ORP; http://orp.sourceforge.net/). The ORP garbage collector exposes an interface that can be plugged into existing applications and supports precise GC.

One of the modes the ORP collector provides is a generational, copying, precise garbage collector; the collection algorithm that is used can be selected by choosing among these modes.

P/Invoke
P/Invoke (Platform Invoke) is the bridge between the CLR and any platform that hosts it. Under Windows, P/Invoke lets you call into Win32 DLLs (there is a separate API for calling into COM). Under UNIX, you can use P/Invoke to call into shared libraries.

Any implementation of .NET delegates as much as possible to the underlying platform. For example, the Windows Forms API needs to draw windows and put widgets in them. Under the hood, this chore is delegated to the appropriate Win32 or GNOME APIs. Anyone implementing the .NET Framework will need to rely on P/Invoke to manage this delegation.

P/Invoke uses a combination of attributes and extern declarations to pull functions into the CLR. The DllImport attribute specifies a shared library and function, and must be attached to an extern method declaration. Example 3(a) imports the puts() function from libc.so.6, while Example 3(b) pulls in several functions from the ncurses library. Figure 1 shows the output of running this program under mint.
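
Example 3 is not reproduced here, but a program along the lines of Example 3(a) would look roughly like this (a sketch for illustration; the class name and message are made up, and the actual listing may differ):

using System;
using System.Runtime.InteropServices;

// Rough equivalent of Example 3(a), not the original listing.
class HelloPInvoke {
    // Pull puts() out of the C library; the runtime marshals the
    // C# string to a native char * automatically.
    [DllImport ("libc.so.6")]
    static extern int puts (string s);

    static void Main ()
    {
        puts ("Hello from P/Invoke");
    }
}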

Beyond the CLI
Mono is currently not self-hosting: The C# compiler must still be compiled on Windows using Microsoft's C# compiler. When the C# compiler can run under mint and is capable of compiling itself, the Mono development team will turn its focus to other areas. However, some progress is already being made in those areas:


Gtk#. GNOME's GUI foundation is the Gtk+ toolkit. Gtk# is Mike Kestner's set of Mono bindings for Gtk+. C# properties map nicely to the GtkArgument system, and Gtk+ signals are exposed as C# events and delegates; a short sketch of what Gtk# code looks like appears after these project summaries. Gtk# will become the foundation on which we can build desktop applications for Mono, and will also become the foundation on which the Windows Forms (System.Windows.Forms) classes will be implemented. 
Bonobo. GNOME's component system is a set of CORBA interfaces for components and compound documents. By the time we are done with Mono, you should be able to author Bonobo components in C# and make those available to the rest of the desktop with little effort, similar to what .NET does with COM under Windows. 
Rafael Teixeira has been working on an implementation of Visual Basic .NET to be integrated with the Mono Compiler Suite. Another effort will yield a free ECMAScript implementation that generates CIL. Sergey Chaban has written an IL assembler that uses System.Reflection.Emit, just as the Mono C# compiler does. He also has contributed a verifier that checks the generated output of the compiler. 
Programmers are also at work on complementary projects. There is a set of OpenGL bindings for C#, and work is in progress to port the Camel mailer API to C# (Camel is similar in spirit to JavaMail). Mike Krueger's SharpDevelop, mentioned earlier, is a free IDE written entirely in C#. SharpDevelop currently runs on Windows, but we hope to provide enough functionality in Mono to run the binary unmodified. The C# and Visual Basic parsers and integration with the .NET type system should help SharpDevelop support language-aware features (such as autocompletion in the GUI). 
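
As promised in the Gtk# item above, here is a rough sketch of what a Gtk# program looks like. Gtk# was still taking shape when this was written, so treat the class and event names below as assumptions rather than a fixed API:

using System;
using Gtk;

// Sketch only; assumes the conventional Gtk# names (Application, Window,
// Button) and the signal-to-event mapping described above.
class GtkSharpSketch {
    static void Main ()
    {
        Application.Init ();

        Window window = new Window ("Mono and Gtk#");
        Button button = new Button ("Click me");

        // Gtk+ signals surface as ordinary C# events and delegates.
        button.Clicked += new EventHandler (OnClicked);
        window.DeleteEvent += new DeleteEventHandler (OnDelete);

        window.Add (button);
        window.ShowAll ();
        Application.Run ();
    }

    static void OnClicked (object sender, EventArgs args)
    {
        Console.WriteLine ("Hello from Gtk#");
    }

    static void OnDelete (object sender, DeleteEventArgs args)
    {
        Application.Quit ();
    }
}
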
Conclusion
Implementing Mono is a big task that would not be possible without the help of the many contributors (you can see a list of them at http://www.go-mono.com/). We are thankful to all the contributors who have helped get Mono where it is today, and will certainly help in its future.

We are focused on having a complete and correct platform. Optimizations are not part of our initial design goals, since it is difficult to optimize ahead of time without good performance measurements. Once we are done with the foundational pieces of Mono, we hope to tackle a number of interesting tasks, such as an ahead-of-time compiler that would compile assemblies for maximum execution speed. An ahead-of-time compiler can perform more expensive optimizations than a JIT engine would, since there is no rush to get the code compiled.

CIL is a good platform for writing code optimizers, as the division between the language and the target is clear by the time an ahead-of-time compiler would be invoked. Various optimizations can be applied to the intermediate forest and to the individual trees: Enhanced register allocation and more traditional compiler optimization techniques fit here, and profile-based optimization also seems convenient at this point. Various peephole optimizations that we are currently missing can be performed at the grammar level and at the code-emission level.

The current code generator lacks an instruction scheduler. This is mildly important for x86 machines, but is more important if Mono is to support the ia64 instruction set or other RISC chips.

DDJ

Listing One

lreg: ADD (lreg, lreg) {
  /* 64-bit add: each long value occupies a pair of 32-bit registers.
     First move the left operand into the result registers if needed... */
  if (tree->reg1 != tree->left->reg1)
    x86_mov_reg_reg (s->code, tree->reg1, tree->left->reg1, 4);
  if (tree->reg2 != tree->left->reg2)
    x86_mov_reg_reg (s->code, tree->reg2, tree->left->reg2, 4);
  /* ...then add the right operand: low words with ADD, high words with
     ADC so the carry propagates. */
  x86_alu_reg_reg (s->code, X86_ADD, tree->reg1, tree->right->reg1);
  x86_alu_reg_reg (s->code, X86_ADC, tree->reg2, tree->right->reg2);
}



