1.1 Language-based Information Flow Analysis
A computer system generally handles considerable amount of data. It
may be directly stored in memory (e.g. a physical drive) or transit
through some network interface or interactive device. Thus, programs
running on the system potentially have access to this information, as
inputs---e.g. the program may read data stored in memory or
listen to a network interface---but also as outputs---e.g. the
program may write data to memory (appending new information to
existing one or replacing it) or emit some message on a network
interface. Then, they may violate the privacy or the
integrity of data in the system by releasing secret information or
corrupting sensitive one. That is the reason why it is mandatory in
many situations to control manipulations performed by a program in
order to ensure they fulfill some integrity or security policy.
A common solution is to provide an access control system. Roughly
speaking, this consists in attaching to every fragment of data some
access rights that specify who may read and/or write it; then, only
authorized programs are allowed to read or write sensitive
information. Such a mechanism is deployed by most operating systems,
including all UNIX variants. However, this addresses only a part of
the problem because it just controls accesses to information but does
not trace the security or integrity laws through computation: for
example, a program executed with privileged rights can read a secret
location and copy its contents to a public place. Thus, access
control mechanisms provide some protection but require the programs to
which access is granted to be trusted without any restriction.
Information flow analysis consists in statically analyzing the
source code of a program before its execution, in order to
ensure that all the operations it performs respect the security policy
of the system. In short, this requires to trace every information
flow performed by the program and to check it is legal. Such an
analysis may be formulated as a type system; this choice presents many
advantages: types may serve as a formal specification language and
offer automated verification of code---provided type inference is
available. Moreover, because the analysis may be performed entirely
at compile-time, it has no run-time cost.
Flow Caml is an extension of the Objective Caml language with a type
system tracing information flow. Its purpose is basically to allow
to write real programs and to automatically check that they
obey some security policy. In Flow Caml, usual ML types are annotated
with security levels chosen in a suitable lattice. Each
annotation gives an approximation of the information which the
expression that it describes may convey. Because it has full type
inference, the system verifies, without requiring source code
annotations, that every information flow performed by the analyzed
program is legal w.r.t. the security policy specified by the
programmer.
1.2 Relating Flow Caml to Objective Caml
Let us briefly discuss the relationship between Flow Caml and
Objective Caml [LDG+02b]. First of all, one may mention
that the Flow Caml system---including its type inference engine---is
entirely written in Objective Caml. Although some part of Flow Caml's
source code comes from that of Objective Caml, the system is
distributed as a standalone program---not just a patch on Objective
Caml---because its heart, the type inference engine, totally differs
from the original one.
Putting aside these implementation issues which do not really concern
the final user, the most important relationship between Flow Caml and
Objective Caml lies in the fact that the former handles a (large)
subset of the language of the latter. Roughly speaking, this means
that a Flow Caml program may also be read as an Objective Caml one.
However, this is not exactly true because Flow Caml type expressions
include security annotations. Flow Caml handles all the core
constructs of the Caml language, including imperative features
(references, mutable values), exceptions (with the slight difference
that exception names are no more first class values), datatypes and
pattern matching. It also features most of the module layer of the
language, including functors. However, Flow Caml does not support the
object-oriented features of Objective Caml, nor polymorphic variants
and labels. (In fact, the programming language of Flow Caml is
approximately the same as that of the now defunct Caml Special Light.)
For the reason explained above, a Flow Caml program is generally not a
valid input for the Objective Caml compiler. Nevertheless, the Flow
Caml compiler outputs legal Objective Caml code from Flow Caml code.
This allows to compile every program written in Flow Caml, using
the byte-code or the native compiler, and running it as for every
Objective Caml program. Moreover, it is possible to easily interface
a program written in Flow Caml with Objective Caml code and hence to
benefit from a large amount of existing libraries.
1.3 How to get the Flow Caml system ?
The Flow Caml system is freely available on the World Wide Web at the
following address:
The source distribution should compile on almost every UNIX machine
where recent versions of GNU Make and Objective Caml are installed
(including the Cygwin environment for Microsoft Windows). A binary
distribution for Microsoft Windows operating systems is also provided.
1.4 Theoretical background and related work
The type system implemented in the Flow Caml system for tracing
information flow has been developed by François Pottier and Vincent
Simonet and is fully presented
in [PS02, PS03]. These papers
give a formal presentation of the type algebra and typing rules for
the core of the language, that is Core ML (a l-calculus with
references, exceptions, primitives and let-polymorphism). They
also provide a correctness proof of the type system. That means that
the (non-interference) property that the system is supposed to enforce
has been formally stated and verified.
The design of a type inference engine for a system providing both
subtyping and polymorphism formed another part of the work. The form
of subtyping present in Flow Caml is generally said to be
structural. Dealing with subtyping constraints in an efficient
way requires quite subtle algorithms; they are presented (and proved
correct) in another article [Sim03]. In
fact, the type inference engine of Flow Caml has been implemented
independently of its final use and is also distributed as a separate
library [Sim02]. Hence, we hope it will be suitable for
a variety of applications.
The last step of the job consisted in integrating the information flow
analysis in the Caml language itself. This required to extend in some
way every programming construct provided by the language, including
datatype definitions, the module system, the implementation/interface
mechanism; in order to obtain a complete programming language.
To the best of our knowledge, the only other real size implementation
of a language-based information flow analysis is the Jif system
by Myers et al. [MNZZ01], based on the type system presented
in Myers' thesis [Mye99]. This prototype handles a large
subset of the Java language, which is roughly comparable to that of
Flow Caml. Sketching a comparison, one of the main differences
between Flow Caml and Jif is that, going up with the ML tradition, the
former features polymorphism and has a full type inference algorithm,
while the latter performs only local type reconstruction, in the Java
style. In particular, in Jif programs, methods arguments must be
annotated with their whole type, including the security annotations.
On the other hand, Jif provides an interesting mechanism of
dynamic labels which allows performing some checks at run-time.
This has, for the time being, no counterpart in Flow Caml.