I'm interested in most aspects of computer programming, in particular in programming languages, tools, and systems to make programs safer, easier to write, and more efficient. Most of my work revolves around the following topics:

Programming language design, formalization and implementation. Popular programming languages are most often unsafe, giving programmers way too much rope to hang themselves with, and/or inexpressive, forcing programmers to deploy painful, time-consuming and error-prone encodings of domain concepts in terms of language constructs. I believe that safer, more expressive programming languages based on the functional programming paradigm can tremendously improve the quality and timeliness of software. The OCaml language embodies some of my results in this area.

Type systems, module systems and related static analyses. Many common programming errors can be detected automatically by static analysis of the code. I have worked on type systems, module systems and static analyses to detect large classes of programming errors at compile-time, ensure safety properties in programs, and support software reuse and parameterization in a type-safe way.

Compilers. I have a long-standing fascination for compilers and more generally for all sorts of translations from one computer language to another. I have worked on the efficient compilation of the ML language, resulting in the OCaml system. These days, in the CompCert project, I'm having great fun proving semantic preservation for optimizing compilers using the Coq proof assistant.

Mechanized proofs. I am a firm believer in formal methods and enjoy using proof assistants such as Coq to specify and prove properties of programs, operational semantics, type systems, compilers and other tools that participate in the software development process.

The remainder of this page lists the main research projects I've been involved in, roughly in reverse chronological order.

Formal verification of compilers

Compilers sometimes contain bugs causing wrong machine code to be generated for a correct source program. This is a concern for critical software certified using formal methods: the guarantees so painfully obtained by formal verification of the source program are useless if the compiler generates wrong code. This problem can be avoided by formally verifying the compiler itself, proving that it preserves the semantics of source programs. I currently work on the Compcert project: the development and proof of correctness, using the Coq proof assistant, of a moderately-optimizing C compiler usable for critical embedded systems. This is a fascinating line of work, combining compilation, functional programming, operational semantics and program proof.

For more information:
The Compcert project
Formal verification of a realistic compiler.

OCaml

OCaml is a popular, typed functional programming language of the ML family. Besides leading the development effort for OCaml between 1995 and 2012, my main contributions were the design and implementation of its module system (a more syntactic variant of the Standard ML module system) and of its two compilers: a simple yet efficient bytecode compiler and an optimizing native-code compiler generating high-performance assembly code for a variety of platforms.

For more information: the OCaml Web site.

Mechanized semantics

The Coq proof assistant and other systems based on the propositions-as-types, proofs-as-programs paradigm provide a fascinating way to do mathematics with the assistance of the computer. Besides my work on the formal verification of compilers, for several years I have been using Coq to formalize and prove properties of programming language components such as bytecode verifiers, operational semantics and type systems. Recently, I re-discovered a coinductive approach to big-step operational semantics (natural semantics) that enables it to describe non-terminating computations quite nicely. I have also participated in the POPLmark challenge, a collective experiment in mechanizing the metatheory of type systems.

For more information:
Coinductive big-step operational semantics.
A locally nameless solution to the POPLmark challenge.

Language-based security

Circa 1995, Java applets promoted the idea of type-checking untrusted compiled bytecode before execution on a trusted device. While Java applets never really caught on, they popularized the idea that high-level programming languages and static type systems are good not only for program safety, but also for security as well -- an approach known today as language-based security. With my colleague François Rouaix, I tried to characterize the security implications of strong typing. We formalized and proved secure a few security patterns that rely on strong typing in an essential way. Later, I got interested in Java bytecode verification -- the static analysis that establishes the well-typedness of compiled Java bytecode. In the context of the Trusted Logic start-up company, I developed a variant of Java bytecode verification that runs in very low memory space and can therefore be embedded in Java smart cards.

For more information:
Security properties of typed applets.
Java bytecode verification: algorithms and formalizations.
Bytecode verification for Java smart card.

Module systems

Modular programming can be done in any language, with sufficient discipline from the programmers. However, it is facilitated if the programming language provides constructs to express some aspects of the modular structure and check them automatically. The Standard ML language introduced a powerful module systems featuring multiple views of modules and parameterized modules (called "functors"). Between 1993 and 1997, I designed, formalized and implemented within Objective Caml a variant of the Standard ML module system that is very close to an ordinary type system and uses only syntactic signatures. Later, my Ph.D. student Tom Hirschowitz and I worked on mixin modules, which extend ML modules with late binding and extensibility features reminiscent of class-based object-oriented programming.

For more information:
A syntactic theory of type generativity and sharing.
A modular module system.
Mixin modules in a call-by-value setting.

Types in compilation and static analysis

As evidenced by Fortran and C, static type checking was initially introduced in programming languages not to improve the safety of programs, but to enable the compiler to generate more efficient code: the static types of data determines their run-time representation, and the compiler-generated code for data accesses is based on these types. For quite a while, I was puzzled why such type-based compilation schemes were not used for advanced, statically-typed languages such as ML and Haskell. I eventually realized that the reason was parametric polymorphism and type abstraction, which lead to statically unknown (or partially-known) types. Between 1991 and 1994, I designed and implemented a type-directed data representation scheme for ML, which circumvents the problem with polymorphism and type abstraction through the use of type-directed coercions between multiple representations for the same data type. Experimental results were not uniformly positive, so I gave up on this idea. However, it sparked considerable interest in the type systems community, which discovered many other interesting interactions between type systems and the compilation process. (See the TIC & TLDI workshops.)

Later, my Ph.D. student François Pessaux and I worked on a type-based static analysis for uncaught exception in Caml programs. Initially, we planned to use untyped static analysis techniques such as k-CFA, but discovered that a better compromise between analysis speed and analysis precision could be obtained by using a type and effect system based on unification between row types.

For more information:
Introduction to types in compilation.
Unboxed objects and polymorphic typing.
Type-based analysis of uncaught exceptions.

Type systems

During my Master's and my Ph.D. (1988-1992), I worked on type systems and type inference, more specifically on dynamic types, on type abstraction, and on polymorphic typing of imperative features such as references, communication channels and first-class continuations.

For more information:
Polymorphic typing of an algorithmic language.
Dynamics in ML.
Abstract types and the dot notation.

Xavier Leroy

research interests

Research projects