A Tutorial

Chapter 2 A Tutorial

This chapter introduces various features of the Flow Caml language through examples, beginning with simple expressions, and finally touching upon more advanced points like the module system and the interaction with the outside world. Our goal is to provide an introduction to Flow Caml for someone who has some experience with programming in the Caml language (or possibly another functional language based on the ML type system, such as SML or Haskell). If the reader wishes to learn more about basic programming in Caml, we highly recommend the reading of the first chapter (``The core language'') of Objective Caml's tutorial [LDG⁺02a].

Excepted in the last section, we use for this tutorial the interactive toplevel, which can be started by running the flowcaml command from the shell. Under the interactive toplevel, the user types Flow Caml phrases, terminated by ;;, in response to the # prompt. The system type-checks them on the fly and prints the inferred type scheme.

We encourage the reader to run a Flow Caml toplevel while reading this tutorial in order to interactively discover the language and its type system, by testing the given examples and experimenting his/her own pieces of code. However, this tutorial is also self-contained, so it can be read off-line as well.

2.1 Security levels and data structures

In this section, we explain how ML (data)types are annotated in Flow Caml with security levels in order to describe information flow.

2.1.1 Simple types

Let us begin with this first definition:

let x = 1;;

x : 'a int

(In this tutorial, pieces of Flow Caml code are typeset in black typewriter. They are generally followed by the output produced by the toplevel in blue typewriter.) Our first example simply binds the identifier x to the integer constant 1. The toplevel answers that this constant has type 'aint. In Flow Caml, the type constructor int takes one argument, which is a security level belonging to an arbitrary lattice. These annotations allow the system to trace information flow. In the above example, the security level is a variable, 'a; as every variable appearing free in a type, it is implicitly universally quantified. Basically, this means that outside of any context, the constant 1 may have any security level.

The security level of such a constant may be specified thanks to a simple type constraint. Assume we receive three integers from different sources named Alice, Bob and Cecil (such sources are often called principals in the literature):

let x1 : !alice int = 42;;

val x1 : !alice int

let x2 : !bob int = 53;;

val x2 : !bob int

let x3 : !charlie int = 11;;

val x3 : !charlie int

In Flow Caml, each data source may be symbolized by a constant security level such as !alice, !bob or !charlie (Any alphanumeric identifier preceded by a ! is a suitable constant security level.) Initially, these security levels are incomparable points in the lattice: this means that the principals they represent cannot exchange any information. We will further on see how to allow some (see section 2.6).

The above bindings are global in the toplevel, hence you can use them in the next expressions you enter:

x1 + x1;;

- : !alice int

x1 + x2;;

- : [> !alice, !bob] int

x1 * x2 * x3;;

- : [> !alice, !bob, !charlie] int

The first expression contains information about only x1, so its security level is !alice. The sum x1 +x2 is liable to leak information about x1 and x2. Then, its security level must be greater than those of x1 and x2: [> !alice, !bob] stands for any level which is greater than or equal to !alice and !bob. This can be read as the ``symbolic union'' of these two principals. Similarly, the security level of the last expression must be greater than or equal to !alice, !bob and !charlie.

However, some programming experience in Flow Caml shows that using such explicit level constants is most of the time unnecessary: thanks to ML polymorphism, universally quantified type schemes are generally expressive enough to describe a piece of code (such as a function) w.r.t. information flow. In fact, the fundamental use of level constants appears in the interaction with external channels (e.g. file i/o or networking) and will be discussed in section 2.6 of the current tutorial. For the time being, we will only use them in a somewhat artificial way, just for guiding your intuition.

We now define a function which computes the successor of an integer:

let succ = function x -> x + 1;;

val succ : 'a int -> 'a int

Once again, the system automatically computes the most general typing for this definition, which is 'a int -> 'aint. This type means that the function succ takes as argument one integer of some level 'a and returns another integer whose security level is exactly the same: indeed the result of this function carries information about its input. Because its type is polymorphic w.r.t. the security level of the integer argument, you can apply succ on arguments of different levels:

succ x1;;

- : !alice int

succ x2;;

- : !bob int

This example is sufficient to illustrate that polymorphism on security levels is a prominent feature for type systems tracing information flow: here, in the absence of polymorphism, one have to write a specialized version of the function succ for every level it is used with.

It is worth noting that information flow is traced in a somewhat conservative way: in the underlying security model, there is an information flow from an input to an output as soon as knowing the latter reveals some information, even incomplete, about the former. Let us for instance consider the following function which computes the euclidean division of an integer by 2 thanks to a logical shift.

let half = function x -> x lsr 1;;

val half : 'a int -> 'a int

The inferred scheme for half is exactly the same as that of succ: it reflects that the result produced by half reveals some information about its input. However, this leak is only partial, because it is, for instance, not possible to completely retrieve x1 from the result of halfx1. In some situations, this may yield typings which are surprising at first sight:

let return_zero = function x -> x * 0;;

val return_zero : 'a int -> 'a int

In this example, the system detects a dependency between the input and the output of the function, although there is none: return_zero always returns zero! Roughly speaking, this is because for the system, the result of a product always leaks information about the two factors, whatever they are. Obviously, if you rewrite the function as follows:

let return_zero' = function x -> 0;;

val return_zero' : 'a int -> 'b int

you obtain a more precise statement: in the inferred type, the security level of the result of the function (denoted by the variable 'b) is not related to that of the input ('a), reflecting the absence of information flow from the latter to the former.

return_zero x1;;

- : !alice int

return_zero' x1;;

- : 'a int

In addition to integers, Flow Caml offers the usual basic datatypes: booleans, floating-point numbers and characters. Like int, the corresponding type constructors carry one security level:

let y0 = true;;

- : 'a bool

let z0 = 'a';;

val z0 : 'a char

These do not raise particular difficulties, since they behave exactly like integers w.r.t. information flow analysis. For instance, usual arithmetic functions for floating point numbers are available:

let pi = 3.14159265359;;

val pi : 'a float

let pi' = 4.0 *. atan 1.0;;

val pi' : 'a float

2.1.2 Strings

There are two flavors of character strings in Flow Caml: immutable strings (type string) mutable ones (type charray). One may wonder why we distinguish them, since in Objective Caml every string is mutable---even if it is not used as such---and everything works well. This design choice is motivated because, in type systems tracing information flow, mutable values require some particular care which increases the complexity of types (we will discuss this point in section 2.3) whereas, in many situations, strings are used without in place modification. Hence, providing a distinct type for such cases allows better typings.

Immutable string literals appear in source code between double quotes:

let s = "Flow Caml";;

val s : 'a string

As illustrated by the above example, the type constructor for immutable strings is string and it has one argument which is a security level. It naturally describes all information attached to the string. The module String provides functions for manipulating immutable strings.

2.1.3 Lists

The variables 'a, 'b, etc. we have encountered in type schemes up to now stand for levels of the security lattice. In Flow Caml, polymorphism applies naturally on whole types (as in Objective Caml) too. For instance, define the identity function:

let id x = x;;

val id : 'a -> 'a

In this scheme, the variable 'a stands for a type. Indeed, in Flow Caml, a variable appearing in a typing can be of different kinds: it may stand either for a level or for a type (in section 2.4, we will see it also may denote a row). What is more, kinds are not given explicitly by the system (and the programmer does not have to give them when he enters a type expression) because they always can be deduced from the context. The type scheme inferred for id does not involve any security annotation: it simply says that the function takes an argument of some type 'a and produces a result of the same type. For instance, if you specialize the identity function so that it applies only to integers, it will have the type 'b int -> 'bint.

In Flow Caml, the list type constructor has two arguments (while in Objective Caml it has only one). Thus, in the type ('a, 'b)list, 'a is a type variable which gives the type of the elements of the list; 'b is a level variable describing the information attached to the structure of the list. This corresponds for instance to the information leaked by testing whether the list is empty.

let l1 = [1; 2; 3; 4];;

val l1 : ('a int, 'b) list

let l2 = [x1; x2];;

val l2 : ([> !alice; !bob] int, 'b) list

As usual in ML, functions manipulating lists generally perform pattern-matching on their structure. Here is such a simple function testing whether a list is empty:

let is_empty = function [] -> true | _ :: _ -> false;;

val is_empty: ('a, 'b) list -> 'b bool

In this type, the security annotation of the boolean produced by the function, 'b does not depend on the type of the list's elements, 'a, but is the same as the level of the input list, because the function reveals information only about its structure of the list. Functions manipulating lists are often recursive, but this does not raise any particular difficulty concerning typing:

let rec length = function [] -> 0 | _ :: tl -> 1 + length tl;;

val length: ('a, 'b) list -> 'b int

The type scheme obtained for length is similar to that of is_empty: the length of the list contains some information about its structure, but not about its elements. On the contrary, a function testing whether the integer 0 appears in a list reveals information about both the structure of the list and its elements, hence its type:

let rec mem0 = function [] -> false | hd :: tl -> hd = 0 || mem0 tl;;

val mem0: ('a int, 'a) list -> 'a bool

The module List of the standard library provides usual functions operating on lists, including the following examples:

let rec rev_append l1 l2 = match l1 with [] -> l2 | hd :: tl -> rev_append tl (hd :: l2);;

val rev_append: ('a, 'b) list -> ('a, 'b) list -> ('a, 'b) list

let rev l = rev_append l [];;

val rev: ('a, 'b) list -> ('a, 'b) list

2.1.4 Options

In ML, an option is a value which may be of two different forms: either None (the empty option) or Somev, where v is another value, the content of the option. The type option behaves similarly to that of lists. It has two arguments too: in ('a, 'b)option, 'a is the type of the content of the option while 'b is the security level attached to the option itself, describing the information attached to the knowledge of its form. This is illustrated by the following functions:

let is_none = function None -> true | Some _ -> false;;

val is_none: ('a, 'b) option -> 'b bool

The function is_none tests whether an option is None, by a simple pattern matching. Thus, the security level of the obtained integer is exactly that of the option: the test is likely to leak information only about the form of the argument.

let default = function None -> 0 | Some x -> x;;

val default: ('a int, 'a) option -> 'a int

Similarly, default matches an integer option. If it is None, it returns the default value 0, and otherwise the content of the option itself. Thus, the result produced by an application of default carries information about both the form of the option and its content.

2.1.5 Tuples

Tuples of arbitrary length are also available in Flow Caml: if x1, ..., xn are values whose respective types are t1, ..., tn then (x1, ..., xn) is a tuple of type t1 * ... *tn. For instance:

let pair0 = (0, true);;

val pair0 : 'a int * 'a bool

let triple0 = (0, 1, 'a');;

val triple0 = 'a int * 'a int * 'a char

Product types carry no particular security annotation because---with the slight exception of observations made by the physical equality operator, see ?---all the information carried by a tuple is in fact carried by its components. However, each component of a tuple has its own security annotation, which may differ from those of the others. For instance, one may define a pair of integers whose first integer has level !alice and the second !bob :

let pair1 = x1, x2;;

val pair0 : !alice int * !bob int

2.2 Constrained type schemes

We will now show that Flow Caml features a constraint-based type system with subtyping. ML's type system (which is the basis of SML, Objective Caml or Haskell) relies on unification; which means that the only expressible relationship between (type) variables is equality. Unfortunately, as will be demonstrated by our next examples, this is not expressive enough to faithfully trace information flow in many cases, and then type schemes must include constraints between security levels such as inequalities.

2.2.1 Subtyping

Subtyping between security levels

Let us consider a first example of function whose type scheme comprises an inequality: f1 takes one integer x as argument and returns a pair formed of its successor and its sum with the global constant x1 defined above:

let f1 x = (x + 1, x + x1);;

val f1 : 'a int -> 'a int * 'b int with 'a < 'b and !alice < 'b

The type scheme returned by the system involves two level variables, 'a and 'b. The first one, 'a, is the security level of the function's argument. Naturally, it is also that of the first component of the pair returned by the function. The second integer returned by the function is labeled by the variable 'b. This security level is related to 'a by the first inequality appearing after the keyword with: 'a < 'b tells us that 'b must be greater than or equal to 'a (note that the character < output by your terminal stands, in Flow Caml, for the mathematical symbol £). In what concerns information flow, this inequality reflects the fact that the integer labeled by 'b depends on the one labeled 'a; in other words that there is a flow from the latter to the former. The other constraint, !alice < 'b requires 'b to be greater than or equal to the constant !alice. It says that there is a possible flow from data (namely x1) coming from the external source symbolized by the constant !alice (the principal Alice) to the second output of the function.

Now, we can apply this function to different integers:

f1 0;;

- : 'a int * !alice int

f1 x1;;

- : !alice int * !alice int

f1 x2;;

- : !bob int * [> !alice, !bob] int

From a type-theoretic point of view, the type scheme inferred for f1 means that every instance of 'a int -> 'a int * 'bint for some 'a and 'b which satisfy the inequalities 'a < 'b and !alice < 'b is a valid type for the function. This statement cannot be expressed as precisely in a unification-based type system. Indeed, in such a framework, every < must be read as =, i.e. the variables 'a and 'b must be unified with the constant !alice. Thus we would obtain the following judgment:

val f1 : !alice int -> !alice int * !alice int

which is much more restrictive than the previous one: here, applying f1 to the integer 0 would yield a result of type !aliceint (instead of 'aint), while the expression f1x2 would be ill-typed. The same observation can be made with the following function, f2, which takes three integer arguments and computes the sums of each pair of them:

let f2 x y z = (x + y, y + z, x + z);;

val f2 : 'a int -> 'b int -> 'c int -> 'd int * 'e int * 'f int with 'a < 'd, 'f and 'b < 'd, 'e and 'c < 'e, 'f

The obtained type scheme involves three constraints; each of them relates one argument of the function to two of its outputs. For instance, the constraint 'a < 'd, 'f (which is a shorthand for 'a < 'd and 'a < 'f) traces the information flow from the first argument, x to the first and third components of the result, x +y and x +z respectively. The next two constraints deal similarly with the second and third arguments of the function, respectively. Obviously, the system performs some arbitrary choice when it typesets a list of constraints. For instance, f2's scheme may equivalently be written:

val f2 : 'a int -> 'b int -> 'c int -> 'd int * 'e int * 'f int with 'a, 'b < 'd and 'b, 'c < 'e and 'a, 'c < 'f

When one applies the function to the three constants x1, x2 and x3, the constraints allow to compute the respective levels of the resulting integer:

f2 x1 x2 x3;;

- : [> !alice, !bob] int * [> !bob, !charlie] int * [> !alice, !charlie] int

Once again, if the system did not feature subtyping but only unification, f2 would have a much more restrictive typing

val f2 : 'a int -> 'a int -> 'a int -> 'a int * 'a int * 'a int

which tells only that each component of the returned tuple is likely to depend on all three arguments given to the function.

Subtyping between types

In the previous examples, inequalities involve only security levels. However, types are also ordered by the partial order <, which is said to be a subtyping order. In general terms, subtyping consists of a partial order on types and a subsumption rule that allows every expression which has a given type to be used with any greater type, i.e. if an expression e has some type t and t is a subtype of t' (t < t') then e also has type t'. In Flow Caml, subtyping is structural and defined by lifting the order between security levels throughout the structure of types: two comparable types must have the same ``structure'' and only their annotations may differ. For this purpose, every type constructor (such as int, list or ->) has a signature which gives the variance (and the kind) of each of its argument. A variance is either + (covariant), - (contravariant) or = (invariant). The signature of a type constructor can be displayed in the toplevel thanks to the directive #lookup_type:

#lookup_type "int";;

type (#'a:level) int

This tells that the only argument ('a) of int is a level and is covariant. (The # symbol is a distinguished form of +, whose role will be explained in section 2.2.2. For the time being, you can simply read it as if it were +.) This defines the subtyping order on integer types: given two security levels 'a and 'b, 'a int < 'bint holds if and only if 'a < 'b. Similarly, the two arguments of list are also covariant:

#lookup_type "list";;

type (+'a:type, #'b:level) list = ...

Then, ('a1, 'b1) list < ('a2, 'b2)list is equivalent to 'a1 < 'a2 and 'b1 < 'b2. As a result, subtyping constraints involving two type structures can be decomposed recursively: for instance ('a1 int, 'b1) list < ('a2 int, 'b2)list produces 'a1 int < 'a2int and 'b1 < 'b2 and then 'a1 < 'a2 and 'b1 < 'b2.

The function f3 takes three arguments and build three lists of two elements each:

let f3 x y z = ([x; y], [y; z], [x; z]);;

val f3 : 'a -> 'b -> 'c -> ('d, 'e) list * ('f, 'g) list * ('h, 'i) list with 'a < 'd, 'h and 'b < 'd, 'f and 'c < 'f, 'h

The typing inferred by the system is similar to that of f2. Each constraint relates the type of one input to those of the result: thus, the type of the first argument, 'a is ``injected'' in those of the elements of the first and third lists, reflecting the dependency. However, it is worth noting that here, the variables 'a, 'b, 'c, 'd, 'e and 'f are types, not levels.

The arrow type constructor -> we have encountered in the previous examples has the following signature:

type (-'a:type) -> (+'b:type)

As usual in the presence of subtyping, the result type is a covariant parameter while the argument is a contravariant one. This means that the inequality 'a1 -> 'b1 £ 'a2 -> 'b2 holds if and only if 'b1 £ 'b2 and 'a2 £ 'a1.

Simplification of type schemes

Flow Caml automatically performs some simplifications before it outputs a scheme in order to make the printing as concise as possible. Indeed, because of the presence of subtyping, the same type scheme can be written in different but equivalent forms. To illustrate this, let us consider the integer sum operator, +. In the Flow Caml standard library, it is declared with the following scheme:

val ( + ) : 'a int -> 'a int -> 'a int

This type scheme apparently constrains its two arguments to have the same security level. However, it is still possible to compute the sum of two integers which have different security levels, as in the following example:

x1 + x2;;

- : [> !alice, !bob] int

The integer x1 has the type !aliceint. By subsumption, it can be freely used with any greater type, e.g. [> !alice, !bob]int. (The system is able to perform the coercion itself when needed, no explicit annotation is therefore required.) Similarly, x2 has type !bobint but it can also be used as a value of type [> !alice, !bob]int. It follows that the expression x1 +x2 is well-typed and produces a value of type [> !alice, !bob]int. Generalizing this process, one may naturally propose another type scheme for ( +), which explicitly includes the subsumption mechanism:

val ( + ) : 'a₁ int -> 'a₂ int -> 'a₃ int with 'a₁, 'a₂ < 'a₃

Nevertheless Flow Caml tries to output every type scheme in a form that is as concise as possible (for efficiency reasons, its simplification algorithm is however incomplete). In order to help you in reading types, each occurrence of a variable is printed with a color which indicates its polarity: negative occurrences appear in green while positive ones are in red. Then, when interpreting a type, any negative (resp. positive) occurrence of a variable 'a can be replaced by a fresh variable 'b with the constraint 'b < 'a (resp. 'a < 'b). Applying this principle to the scheme

val ( + ) : 'a int -> 'a int -> 'a int

one obtains

val ( + ) : 'a₁ int -> 'a₂ int -> 'a₃ int with 'a₁ < 'a and 'a₂ < 'a and 'a < 'a₃

which becomes, by transitivity of <,

val ( + ) : 'a₁ int -> 'a₂ int -> 'a₃ int with 'a₁ < 'a₃ and 'a₂ < 'a₃

2.2.2 `level` constraints

Flow Caml provides the conditional construct if ... then ... else ..., which has the same semantics as that of Objective Caml, as well as polymorphic comparison primitives. As explained above, the type of boolean values carries one security level:

let y1 : !alice bool = false;;

val y1 : !alice bool

let y2 : !bob bool = false;;

val y2 : !bob bool

When it encounters a conditional construct, the execution of a program evaluates the condition, and depending on the result, continues with one of the two branches. Thus, the result produced by the whole expression is that of one of the two sub-expressions appearing after then and else. Hence, the type of the former must be a super-type of the latter. For instance, in the simple case where a conditional produces integers, this means that the security level of the whole expression must be the union of those of the two branches:

if y0 then x1 else x2;;

- : [> !alice, !bob] int

In this example, x1 has type !aliceint, x2 has type !bobint, so the whole expression has type [> !alice, !bob]int. The value produced by a conditional also carries information about the result of the test. Hence, in order to take in account this possible information flow, the security level of the latter must guard the type of the former, this means that its security level(s) must be greater than or equal to that of the condition:

if y1 then 1 else 0;;

- : !alice int

Here, the condition, y1, has level !alice; hence the result of the whole expression must have this level too. Similarly, if a conditional evaluates to a tuple, the type of each of its components must be guarded by the level attached to the test:

if y1 then (x1, (true, 'a')) else (x2, (false, 'b'));;

- : [> !alice, !bob] int * (!alice bool * !alice char)

We now introduce some functions whose result depends on some test(s) performed on their argument(s). The function int_of_bool simply converts a boolean into an integer:

let int_of_bool x = if x then 1 else 0;;

val int_of_bool : 'a bool -> 'a int

However, because of polymorphism, it is possible for the type produced by a conditional construct not to be known, for instance if it depends on that of some argument in an abstraction:

let choose y1 y0 x = if x then y1 else y0;;

val choose : 'a -> 'a -> 'b bool -> 'a with 'b < level('a)

The result produced by choose clearly depends on the value of the boolean x, and hence must be guarded by its security level, 'b. However, it has the type of the two other arguments of the function, y0 and y1, but this may be arbitrary. That is the reason why this example involves a new form of constraint, 'b < level('a). (In [PS03], this is written 'b

'a.) Let us first remark that in this constraint 'b and 'a are variables of different kinds: 'b stands for a security level while 'a is a type. This constraint can be viewed as an inequality delayed until the structure of the type 'a is known: roughly speaking, it means that the topmost security level(s) of the type 'a must be greater than or equal to 'b. For instance, if you instantiate 'a by 'a1int, the constraint is simply decomposed as 'b < 'a1, but, if you instantiate 'a by 'a1 int * 'a2bool, it produces 'b < 'a1 and 'b < 'a2. To illustrate how this decomposition mechanism works, one can consider some partial applications of the function:

let choose1 y = choose 1 0 y;;

val choose : 'a bool -> 'a int

let choose2 y = choose (1, 1) (0, 0) y;;

val choose : 'a bool -> 'a int * 'a int

In the first example, 'a is instantiated by 'a1int and the constraint 'b < level('a1 int) is decomposed as 'b < 'a1. This yields the scheme

'b bool -> 'a1 int with 'b < 'a1

which is simplified into 'a bool -> 'aint. In the second example, 'a is instantiated by 'a1 int * 'a2int. The constraint becomes successively

'b < level('a1 int * 'a2 int)'b < level('a1 int) and 'b < level('a2 int)'b < 'a1 and 'b < 'a2

The obtained scheme can be simplified into 'a bool -> 'a int * 'aint. Similarly, we can also consider lists:

let choose3 y = choose [] [1;2] y;;

val choose3 : 'a bool -> ('b, 'a) list

Here, the type variable 'a is instantiated by ('a1, 'a2)list, which yields the constraint 'b < level(('a1, 'a2) list) that is decomposed into 'b < 'a2.

A question naturally arises: how does the type-checker determine on which arguments of each type constructor the destructor level must be decomposed? This information is retrieved from signatures: the arguments on which level applies are those which are marked as ``guarded'' by a sharp symbol (#):

#lookup_type "int";;

type (#'a:level) int

#lookup_type "list";;

type (+'a:type, #'b:level) list = ...

This means for instance that level applies on the single argument of int while it considers only the second one of list. It is worth noting that # is a distinguished form of +, that means that guarded arguments are always covariant.

2.2.3 `content` constraints

Flow Caml supports the polymorphic comparison primitives of Objective Caml, such as = or <=. These operators can be used to compare data structures of any type, so, in Objective Caml, they have the following type

'a -> 'a -> bool

which means they expect two arguments of the same type and returns a boolean. In Flow Caml, the type of the boolean result must carry a security annotation, say the level variable 'b. Moreover, because, the result produced by the operator is liable to carry information about the two compared values, 'b must be related to the security levels which describe them, i.e. those that appear within the type 'a. For instance, specialized versions of the equality for integers, pairs of integers, lists of integers and lists of pairs of integers should have the following type schemes:

val eq_int : 'a int -> 'a int -> 'b bool with 'a < 'bval eq_int_pair : ('a1 int * 'a2 int) -> ('a1 int * 'a2 int) -> 'b bool with 'a1, 'a2 < 'bval eq_int_list : ('a1 int, 'a2) list -> 'b bool with 'a1, 'a2 < 'bval eq_int_pair_list : ('a1 int * 'a2 int, 'a3) list -> 'b bool with 'a1, 'a2, 'a3 < 'b

Indeed, comparing two pairs can leak information about each member of each pair while comparing two lists gives some knowledge about the structure of the lists (e.g. their length) and/or their elements. Similarly, an equality operator which applies to integer references has the following type:

val eq_int_ref : ('a1 int, 'a2) ref -> ('a1 int, 'a2) ref -> 'b with 'a1, 'a2 < 'b

Comparing two references reveal information about their respective addresses---hence 'a2 < 'b---and their contents, hence 'a1 < 'b.

We observe that in all cases, the security level 'b that labels the boolean produced by the comparison must be greater than or equal to every security level that appears in the type of the arguments. This reflects how comparison applies recursively on data-structures. Thus, in order to give a principal type to these polymorphic operators, we need an additional form of constraint, content('a) < 'b where 'a is a type and 'b is a level. (In [PS03], this is written 'b

'a.) This constraint requires every security annotation of the type 'a to be less than or equal to the security level 'b. For instance, content('a1 int * 'a2 int) < 'b is equivalent to 'a1 < 'b and 'a2 < 'b while content('a1 ref, 'a2) < 'b stands for 'a1 < 'b and 'a2 < 'b. This definition mimics the behavior of generic comparison operators which traverse data structures recursively. Then, in Flow Caml, = and <= have the following type:

val ( = ) : 'a -> 'a -> 'b bool when content('a) < 'b

val ( <= ) : 'a -> 'a -> 'b bool when content('a) < 'b

They can be used to implement a polymorphic function mem which searches whether an element is amember of a list:

let rec mem x = function [] -> false | hd :: tl -> (x = hd) || mem x tl;;

val mem : 'a -> ('a, 'b) list -> 'b bool with content('a) < 'b

or an insertion sort on lists:

let rec insert x = function [] -> [x] | hd :: tl -> (min hd x) :: insert (max hd x) tl;;

val insert : 'a -> ('a, 'b) list -> ('c, 'b) list with 'a < 'c and content('a) < level('c)

let rec sort = function [] -> [] | hd :: tl -> insert hd (sort tl);;

val sort : ('a, 'b) list -> ('c, 'b) list with 'a < 'c and content('a) < level('c)

In Objective Caml, it is even possible to apply polymorphic comparison primitives to functional values: for instance, if f1 and f2 are two functions, (f1 = f2) either returns true (if the two functions have the same memory address) or raises an exception in all other cases. Such an expression seems to have a very limited interest and is not really used because it largely depends on the implementation: for instance let f = fun x -> x in (f = f) returns true while (fun x -> x) = (fun x -> x) raises an exception. However, the Caml type system has no way to prevent such calls from arising. The SML [MTHM97] dialect of ML addresses these issues by introducing ``eq'' types, and hence refuses at compile time any application of a comparison primitive to values which (are likely to) contain closures. The same approach is followed in Flow Caml, where non-eq types are marked by the keywork noneq in their definition, and the constraint content('a) < 'b cannot be satisfied if 'a is a non-eq type. Hence, the following piece of code yields a type error:

(fun x -> x) = (fun x -> x);;

Magic generic primitives cannot be applied on expressionsof type ~a -> ~a

2.2.4 Same-skeleton constraints

In addition to inequalities, Flow Caml type schemes may involve another sort of constraints, which are referred to as same-skeleton constraints. Let us consider the following function:

let skel x y = if x = y then (); x;;

val skel : 'a -> 'b -> 'a with 'a ~ 'b

skel xy tests whether x equals y, then returns x. In the case where the test succeeds, the function skel does nothing particular, but it should for instance be possible to replace () by an expression which performs side-effects, as we will do in section 2.3.2. However, the current function is sufficient to illustrate the need of same-skeleton constraints.

In Objective Caml, the two arguments of skel, x and y, will be required to be of the same type, in order to allow comparing them: skel's principal type scheme would be 'a -> 'a -> 'a. However, in Flow Caml, thanks to subtyping, it is no longer necessary to require them to have exactly the same type: indeed, they may have different security annotations, e.g. be two integers of different security levels. Formally, if x has type 'a and y has type 'b, it is sufficient to require the existence of a super-type 'c of 'a and 'b (i.e. such that 'a < 'c and 'b < 'c). This is what expresses the ~ constraint. Indeed, the above type scheme is equivalent to:

val skel : 'a -> 'b -> 'a with 'a < 'c and 'b < 'c

where 'c is an extra type variable. It is easy to check that such a 'c exists if and only if 'a and 'b are two types of the same shape or skeleton i.e. differ only by their non-invariant security annotations.

It is worth noting that the ~ predicate is transitive and associative (it is indeed the symmetric, transitive closure of <), so that same-skeleton constraints which involve a common variable can be merged, as in the following example:

let skel3 x y z = if x = y or y = z then (); x;;

val skel3 : 'a -> 'b -> 'c -> 'a with 'a ~ 'b ~ 'c

2.2.5 Functions as values

Flow Caml is a functional language: functions are first class citizens and hence can be manipulated as regular values. Thus, one may define a function whose result itself is a function:

let pred x = x + 1;;

val pred : 'a int -> 'a int

let pred_or_succ y = if y then pred else succ;;

val pred_or_succ : 'a bool -> 'b int -{|| 'a}-> 'b int

(To help comprehension, the inferred type scheme may be parenthesized as follows:

val pred_or_succ : 'a bool -> ('b int -{|| 'a}-> 'b int)

However, this is naturally unnecessary because the arrow type constructor is right associative.) Knowing which function among pred or succ an application of pred_or_succ returns naturally leaks information about the boolean given as argument. In order to reflect this information flow, the type assigned to the function returned by pred_or_succ comprises an additional security level, 'a, printed inside the arrow symbol: it intends to describe how much information is attached to the knowledge of the function. For instance, an application of pred_or_succ with a boolean of level !alice yields a function whose identity has level !alice too:

pred_or_succ y1;;

- : 'a int -{|| !alice}-> 'a int

The language naturally allows to observe the ``identity'' of a function by watching the result produced by some application of it. For instance, when one applies (pred_or_succ y1) to some integer, the result must be guarded by the level !alice, because it allows determining whether the function was pred or succ and hence the boolean y1.

(pred_or_succ y1) 0;;

- : !alice int

(pred_or_succ y1) x2;;

- : [> !alice, !bob] int

In fact, arrows in Flow Caml involve three security annotations; then, the general form of a function type is

'a -{'b | 'c | 'd}-> 'e

where 'a and 'e are the types of the argument expected by the function and the result it produces, respectively. Furthermore, 'b and 'd are levels. The former is a lower bound on the side effects performed by the function (it will be introduced in section 2.3) while the latter represents information about the function's identity, as explained above. Lastly, 'c is a row describing the exceptions the function may raise (we will detail its usage in section 2.4). However, in order to improve readability, Flow Caml does not print annotations on arrows that carry no information, i.e. that are universally quantified and unconstrained type variables. For instance 'a -> 'b is a shorthand for 'a -{'c | 'd | 'e}-> 'b (where 'c, 'd and 'e are fresh variables), while 'a -{|| 'b}-> 'c stands for 'a -{'d | 'e | 'b}-> 'c (where 'd and 'e are fresh).

2.2.6 Interlude: the graphical output of type schemes

Flow Caml's toplevel is able to give a graphical representation of type schemes, in addition to the standard textual one. The graphical output is enabled when the toplevel is launched with the -graph option, or---at any time---by entering the following directive in the toplevel:

#open_graph;;

The graphical representation of a type scheme may be easier to interpret than its textual counterpart, because it gives a visual description of information flow. Let us explain how to read such representations on some examples.

let succ x = x + 1;;

val succ : 'a int -> 'a int

The graphical representation of a type scheme is made of two parts: at the bottom appears the skeleton of the scheme, which consists in the type expression where security annotations have been replaced with bullets ·. Roughly speaking, ignoring these bullets, this can be read as a Caml type. A color code is adopted for drawing security annotations: contravariant annotations (which stand for inputs) appear in green while covariant ones (outputs) are drawn in red. Invariant annotations are in orange. Then subtyping constraints are represented in the top part of the drawing by arrows. In succ's type scheme, the dashed arrow from the green bullet to the red one symbolizes an inequality whose left- (resp. right-) hand-side is the security annotation symbolized by the green (resp. red) bullet. Then, the drawing must be read as the following scheme

val succ : 'a int -> 'b int with 'a < 'b

which is equivalent to 'a int -> 'aint. Let us now show how type variables are graphically represented.

let skel x y = if x = y then (); x;;

val skel : 'a -> 'b -> 'a with 'a ~ 'b

In the body of the graphical representation of skel's type scheme, the boxes labeled ~a stand for a skeleton class: each occurrence of ~a must be read as a different type variable a₁, a₂, ..., a_n, with the constraint a₁ ~ a₂ ~ ... ~ a_n. For instance, ~a -> ~a -> ~a represents

'a₁ -> 'a₂ -> 'a₃with 'a₁ ~ 'a₂ ~ 'a₃

The same color code is followed for boxes as for bullets. Subtyping constraints between type variables are represented by black arrows: in skel's type scheme, the arrow from the first box to the third one stands for the constraint 'a1 < 'a3.

Arrows heads and tails can be shared, as in the following examples:

let sum x y = x + y;;

val sum : 'a int -> 'a int -> 'a int

let make_pair x = (x, x);;

val make_pair : 'a -> 'a -> 'a

It remains to show how special forms of inequalities, i.e. constraints such as content('a) < 'b, 'a < level('b) and level('a) < content('b), are drawn. All of them are represented by a dashed arrow from (the box/bullet which stands for) 'a to (the box/bullet which stands for) 'b. No confusion can arise thanks to kinding as illustrated by the following table:

kind of 'a kind of 'b meaning of a dashed arrow from 'a to 'b

level level 'a < 'b

level type 'a < level('b)

type level content('a) < 'b

type type content('a) < level('b)

This is illustrated by the two following examples:

let choose y1 y0 x = if x then y1 else y0;;

val choose : 'a -> 'a -> 'b bool -> 'a with 'b < level('a)

( = );;

val ( = ) : 'a -> 'a -> 'b bool with content('a) < 'b

In the type scheme of choose, the dashed arrow symbolizes the constraint 'b < level('a), while in that of ( =) it stands for content('a) < 'b.

2.3 Imperative features

Though all the examples given so far in this tutorial are in a ``purely functional'' style, Flow Caml also provides full imperatives features. This includes mutable data structures such as references and arrays, as well as usual while and for loops.

2.3.1 Direct and indirect information flows

Unfortunately, in a programming language equipped with side effects, it is possible to leak information in indirect ways. Let us consider the following pieces of code:

r := not yr := if y then false else trueif y then r := false else r := truer := true; if y then r := false

All of them are semantically equivalent: they update the content of the reference r, storing in it the negation of the boolean y. Hence, this produces some information flow from y to r. However, depending on the cases, it is of a different nature. In the two first examples, the flow is said to be direct: a value depending from y is computed and then stored in r; this is very similar to what we have encountered up to now. On the contrary, in the last two expressions, the value in every right-hand-side of the := operator does not involve y: it is even given explicitly in the source code. However, the reference's update is performed in a branch of the program whose execution is conditioned by the value of y. In this situation, we say there is an indirect flow form y to r. The last example calls for an additional comment: in the case where the boolean y is false, the reference r is never updated in a context conditioned by y. However, the information flow from the latter to the former still exists: it is indeed possible to leak information through the absence of a certain effect. (This last example shows that it would be very difficult to detect information flow at run time.)

2.3.2 References

In Flow Caml, the type constructor for references, ref, has two arguments:

#lookup_type "ref";;

type (='a:type, +'b:level) ref = ...

The first one is the type of the value stored in the reference. Because the content of a reference is accessible in reading and writing, it must be at the same time covariant and invariant, i.e. it is invariant. The second argument of ref is a security level, which is guarded and covariant. It describes how much information is attached to the identity of the reference, in other words its memory address.

Let us now illustrate how information flow with mutable structures is traced by some examples of Flow Caml code. We first define two references r1 and r2 whose contents are declared to be booleans of levels !alice and !bob, respectively.

let r1 : (!alice bool, 'a) ref = ref true;;

val r1 : (!alice bool, 'a) ref

let r2 : (!bob bool, 'a) ref = ref true;;

val r2 : (!bob bool, 'a) ref

In the above example, the content of reference r1 has type !alicebool. This means it may receive any boolean whose security level is less than or equal to !alice, i.e. a boolean Alice is allowed to read. The boolean y1 (defined in section 2.1) has level !alice. Hence it can legally be stored in r1:

r1 := y1;;

- : unit

This expression only produces a side-effect, so it has type unit. Because there is only one value of this type, the constant (), the value of a unit expression yields no information. At a result, the unit type constructor does not carry any security annotation. On the contrary, the boolean y2 has been declared with the level !bob. Because information flow from !bob to !alice is not allowed (see section 2.6), assigning it to r1 raises a typing error:

r1 := y2;;

This expression generates the following information flow: from !bob to !alicewhich is not legal.

Similarly, the reference r1 can be updated in a context whose execution depends on y1 but not y2:

if y1 then r1 := false else r1 := true;;

- : unit

if y2 then r1 := false else r1 := true;;

This expression generates the following information flow: from !bob to !alicewhich is not legal.

Lastly, reading the content of r1 naturally yields a boolean of level !alice:

!r1;;

- : !alice bool

Let us explain in a few words how the type system is able to trace indirect information flow in the above examples. Flow Caml associates to every context of an expression (i.e. every point of the program) a security level telling how much information the given sub-expression gains when it is executed. (In the literature, this security level is generally written pc, in reference to program counter.) Basically, each time a conditional construct is traversed, this level is augmented with the annotation of the condition, as illustrated in this example:

if y1 (* y1 has type !alice bool *) then ... (* this branch is typechecked at level !alice *)else if y2 (* y2 has type !bob bool *) then ... (* this branch is typechecked at level [> !alice, !bob] *) else ... (* this branch is typechecked at level [> !alice, !bob] *)

Moreover, when some data is written in a reference, the system constrains the level of its content to be greater than or equal to the security level attached to the context, reflecting the fact that, because of the update, the content of the reference is liable to carry information about all the tests traversed to reach this point of the program.

The purpose of the second argument of the type constructor ref appears when a reference is used as first class value, e.g. if it is the result of some function. For instance, let us define a version of the function choose specialized for references by a type constraint:

let choose_ref y r1 r0 : (_, _) ref = if y then r1 else r0;;

val choose_ref : 'a bool -> ('b, 'a) ref -> ('b, 'a) ref -> ('b, 'a) ref

The security level of the reference returned by this function must be greater than or equal to that of the boolean given as argument. Indeed, revealing which reference is returned by the function may leak information about the value of the condition, y. Such an observation can be performed, for instance, by updating its content.

Some additional difficulty arises when one defines a function performing side-effects: because the body of a function is executed at the point of the program where it is applied ---and not the one where it is defined--- it must be type-checked at the level of the former rather than the latter. This is the purpose of the first security annotation appearing in function types (see section 2.2.5). For instance, consider a function which sets the content of r1 to false:

let reset_r1 () = r1 := false;;

val reset_r1 : unit -{!alice ||}-> unit

This function can only be executed in a context whose level is less than or equal to !alice. This is reflected by the annotation !alice printed ``inside'' the arrow symbol of the above type: this security level is a lower bound on the effects performed by the function and an upper bound on the contexts where it can be applied. In many cases, it is a variable related to (parts of) the type of the function's argument:

let reset r = r := false;;

val reset : ('a bool, 'a) ref -{'a ||}-> unit

The function reset takes a reference as argument and sets its content to false. The type system constrains the level of the content of the reference to be equal to or greater than (1) the level attached to the reference's identity and (2) the level attached to the context where the function is applied.

The identity of the reference returned by this function carries information about those of the references given as argument, but also about the boolean y. This is reflected in the inferred scheme by the fact that all of them are annotated by the same security level, 'a.

We now re-implement the function calculating the length of a list, length, in imperative style:

let length' list = let counter = ref 0 in let rec loop = function [] -> () | _ :: tl -> incr counter; loop tl in loop list; !counter;;

val length' : ('a, 'b) list -{'b ||}-> 'b int

The obtained scheme appears more restrictive than length's type:

val length: ('a, 'b) list -> 'b int

Indeed, with length', the result's security level must be greater than or equal to the function's pc parameter. However, the difference is only superficial; it can be checked that both types in fact have the same expressive power.

2.3.3 Arrays, strings and loops

We conclude this section by a few words about arrays and (mutable) strings. In Flow Caml, the type constructor for arrays (array) carries two arguments:

[|0; 1; 2|];;

- : ('a int, 'b) array

The respective roles of these arguments are similar to those of ref: the former is the type of the content of the cells of the array, and the latter is a security level, related to the array identity. A slight novelty is that this comprises information attached to the length of the array. Indeed, the function returning the length of an array has the following type:

Array.length;;

- : ('a, 'b) array -> 'b int

(which is similar to that of the function calculating the length of a list.)

In the Caml language, the only difference between a mutable string and an array of characters concerns its representation in the runtime system. Hence, the type of mutable strings is isomorphic to that of an array of characters:

[|'a'; 'b'; 'c'|];;

- : ('a char, 'b) array

"abc";;

- : ('a, 'b) charray

Indeed, the type constructor charray expects two security levels as arguments. The first one describes information attached to the characters stored in the string while the second one is related to the identity of the string (including its length).

2.4 Dealing with exceptions

In this section, we explain how Flow Caml deals with exceptions. For the programmer, exceptions are a powerful mechanism for signaling and handling exceptional conditions. As in Objective Caml, exceptions names are declared with the exception construct and signaled with the raise operator:

exception X;;

exception X

exception Y;;

exception Y

raise X;;

- : 'a

However, the exception machinery provided by Flow Caml is slightly restricted in comparison with that of Objective Caml, mostly because exceptions are not first class values. Basically, an exception name (such as X in the above example) is not a value, and hence cannot be bound to a variable or passed as argument to a function (while in Objective Caml, it is a legal value of type exn). Similarly, in Objective Caml, raise is a regular function which accepts an arbitrary argument (of type exn), but, in Flow Caml, it is a built-in construct which requires the name of the raised exception to be statically specified. For instance, the following Objective Caml piece of code cannot be written in Flow Caml:

let f x = raise (if x then X else Y);;

but, in this particular case, it may be rewritten into:

let f x = if x then raise X else raise Y;;

Although it should theoretically be possible to deal with exceptions as first class citizens in Flow Caml [PS02], we believe our design choice to be a good balance between expressiveness and simplicity: having first class exceptions would generate complex typings (which involve conditional constraints), whereas, according to our experience, the use of exceptions as values in real programs seems to be rather limited. To mitigate the loss in expressiveness and provide alternatives for the most common usages of exceptions as first class values made in Caml programs, Flow Caml provides two additional constructs for handling exceptions: try ...finally and try ...propagate (see section 2.4.3).

2.4.1 Rows

The exceptions that an expression is likely to raise are traced in Flow Caml's type system using a row. A row is a mapping from exception names to security levels: for every exception name, it tells how much information is leaked if the related expression effectively raises an exception of this name. Because the set of exception names is open (in the sense that the programmer can incrementally define an arbitrary number of them), rows must range over all potentially definable exceptions names; hence they are infinite objects. So, in order to allow denoting them in a finite concrete syntax, Flow Caml uses row variables and adopts Rémy's row syntax. For instance, the (row) expression X: 'a; Y: 'b; 'c stands for the row which maps the exception name X to 'a, Y to 'b and whose other entries are given by 'c. Here, 'a and 'b are levels while 'c is a row variable of domain X,Y: it stands for a row ranging over all exception names except X and Y. The order in which fields appear is not significant: the above row is equal to Y: 'b; X: 'a; 'c. Row variables can appear in constraints: the subtyping order is extended point-wise to rows. Indeed, if 'c1 and 'c2 are two row variables of the same co-domain, the constraint 'c1 < 'c2 means that every entry of 'c1 must be less than or equal to the corresponding one in 'c2. Hence, constraints involving expanded row terms may be decomposed: X: 'a1; Y: 'b1; 'c1 < X: 'a2; Y: 'b2; 'c2 is equivalent to 'a1 < 'a2 and 'b1 < 'b2 and 'c1 < 'c2. Lastly, for the sake of conciseness, when it prints a type scheme, Flow Caml omits unconstrained universally quantified row variables: for instance, A: 'a; Y: 'b stands for A: 'a; Y: 'b; 'c where 'c is a fresh row variable.

Because exceptions constitute an observable form of result for functions, they must be taken in account in their types. Let us for instance define a simple function which raises the exception X:

let raise_X () = raise X;;

val raise_X : unit -{'a | X: 'a |}-> 'b

The second annotation appearing on the arrow is a row describing the exceptions that the function is likely to raise when it is called. Here, X: 'a, tells that the given function may raise an exception of name X: catching this exception leaks information about the context where the function is called, so the security level associated to X is constrained to be at least that of the context where the function is applied (which appears as usual in first place in the arrow). In the following example,

let raise_X' y = if y then raise X;;

val raise_X' : 'a bool -{'a | X: 'a |}-> unit

catching the exception X gives information about both the context where raise_X' has been applied and the boolean argument given to the function. Thus, the annotation associated to the entry X in the row of this function must be greater than or equal to the security levels of both.

When a function is likely to raise exceptions of different names, its row comprises one entry for each of them:

let raise_X_or_Y x y = if x then raise X; if y then raise Y;;

val raise_X_or_y : 'a bool -> 'b bool -{'a | X: 'a; Y: 'b |}-> unit with 'a < 'b

The type scheme inferred by the system distinguishes one security level for each exception name: handling X yields information only about the first argument, x; while handling Y about both.

Let us now define a function which takes an integer as argument, raises X if it is zero and returns false otherwise:

let test_zero x = if x = 0 then raise X; false;;

val test_zero: 'a int -> {'a | X: 'a |}-> 'b bool

The inferred type schemes states that the boolean returned by the function does not depend on its argument. Indeed, if the function effectively produces a value, it is invariably false. However, this function can reveal information about its argument through its effect. This is reflected by the security level associated to the exception X in its type: it must be greater than or equal to the levels of the context where the function is applied and the integer argument.

Exceptions can be trapped with the try ...with construct.

try test_zero x1with X -> true;;

- : !alice bool

In this example, test_zerox1 is liable to raise an exception X with the level !alice, which will be catched by the handler try ... with X->. Thus, the value produced by the whole construct must be guarded by the level of the handled exception, i.e. !alice. Let us embed this piece of code in a function:

let f5 x = try test_zero x with X -> true;;

- : 'a int -{'a ||}-> 'a bool

The type scheme output by the system reflects that the output of f carries information about its argument, but also about the context where the function is called although it does not. However, we witness the same phenomenon as for side effects: once again this is only a superficial difference with the typing obtained for the function written in a direct style:

let f6 x = if x = 0 then true else false;;

val f6 : 'a int -> 'a bool

The with part is actually a kind of pattern-matching on exception names (however, this is not a regular pattern matching since exceptions are not values). In particular, one try ..with construct can catch several exceptions names (or even all of them using the _ pattern), as illustrated by the following example:

let f7 x y = try raise_X_or_Y x y; 0 with X -> 1 | Y -> 2;;

val f7 : 'a bool -> 'a bool -{'a ||}-> 'a int

Many functions in the standard library raise an exception when they cannot complete normally. For instance, the integer division yields Division_by_zero when its second argument is zero, as reflected by its type:

val ( / ) : 'a int -> 'b int -{'c | Division_by_zero: 'c |}-> 'a int with 'b < 'c, 'a

It is noticeable that there is no relationship between the level of the first argument and that of the exception Division_by_zero. This reflects that this operator does not need to match its first argument before raising the exception.

For the purpose of obtaining a precise information flow analysis, the evaluation order of expressions must be specified. As a result, the right-to-left evaluation order of the current implementation of the Objective Caml language is made part of the specification of the Flow Caml core language. For instance, Flow Caml type system takes into account that the arguments passed to a function are evaluated from left to right:

let f8 x y = (if x = 0 then raise X else x) + (if y = 0 then raise Y else y);;

val f8 : 'a int -> 'b int -{'c | X: 'd; Y: 'e |}-> 'f int with 'b < 'd, 'e, 'f and 'c < 'd, 'e and 'a < 'd, 'f

The inferred type scheme reflects that the condition y =0 is considered before x =0, so the exception X carries information about x and y while Y only about y. Assuming a left-to-right evaluation order, one would obtain the following scheme, where the roles of the variables 'a and 'b are exchanged:

'a int -> 'b int -{'c | X: 'd; Y: 'e |}-> 'f intwith 'b < 'd, 'fand 'c < 'd, 'eand 'a < 'd, 'e, 'f

Lastly, if the evaluation order were not specified, the type system would have to consider every possible strategy. In this case, the type scheme for f8 would be much less informative about the function:

'a int -> 'a int -{'b | X: 'b; Y: 'b |}-> 'a intwith 'a < 'b

Indeed, the security levels associated to the two exceptions are no longer differenced and must both be greater than or equal to those of the two integer arguments.

2.4.2 Exceptions and side-effects

When, in a toplevel phrase, an exception is raised outside of any handler, it will not be trapped and hence the program must terminate. As a consequence, if they gain control, the next phrases of the program observe that the exception was not raised; and this must be taken in account in the type system.

test_zero x1;;

- : 'a boolCurrent evaluation context has level !alice

In this example, if x1 is zero, evaluating this toplevel phrase causes the program to terminate. Hence, if this does not happen and execution continues, the remaining expressions receive some information about x1 when they are evaluated. Therefore, they must be type-checked in a context augmented with the security level of the information carried by x1, i.e. !alice. This point is expressed by the second line output by the system. Thus, all side-effects performed afterward by the program must affect data of levels greater than or equal to !alice. For instance, the reference r1 (whose content has level !alice) can be updated while r2 (whose content has level !bob) cannot:

r1 := false;;

- : unit

r2 := false;;

This expression is executed in a context of level !alicebut has an effect at level !bob.This yields the following information flow: from !alice to !bobwhich is not legal.

Similarly, if one feeds test_zero with x2, the level of the toplevel context is increased by !bob:

test_zero x2;;

- : 'a boolCurrent evaluation context has level !bob, !alice

and now, the reference r1 can no longer be updated, because information flow from !alice to !bob is not allowed (i.e. !alice is not inferior to !bob).

In a regular program, any increase of the security level associated with the topmost evaluation context is naturally irremediable. However, in the interactive toplevel, for the convenience of the user, it is possible to reset it to its initial value---as if a new program was started---by the simple directive:

#reset_context;;

Level of evaluation context reset

The previous examples illustrate how the security level attached to the evaluation context increases through a sequence of toplevel phrases. Sequencing of statements is also possible inside expressions, thanks in particular to the ; operator. Let us for instance consider the following function:

let f9 x r = if x = 0 then raise X; r := false;;

val f9 : 'a int -> ('b bool, 'b) ref -{'a | X: 'a |}-> unit with 'a < 'b

f9 takes two arguments: an integer x and a boolean reference r. If the integer is 0 then the exception X is raised, and the following statement is not performed. Otherwise, execution continues and the reference r is set to false. We now explain the typing inferred by the system, which reflects the two possible observable effects of the function. First of all, it may raise the exception X. The security level attached to this effect, 'a, must be greater than or equal to that attached to the context where the function is applied and that of the integer argument, because the exception raising is conditioned by a test on x. The second effect that the function is liable to have is the update of the reference r. Observing it gives information naturally about the context where f has been called, but also reveals whether the exception X has been raised, and, as a consequence, about the integer x. That is the reason why, the security level of the content of the reference, 'b, must be greater than the levels of the context where the expression is applied and the first argument, as reflected by the constraint 'a < 'b.

Similarly, every side-effect performed in an exception handler must have a level greater than or equal to that of the trapped exception. If one rewrites f9 into the following, the inferred type scheme is similar:

let f10 x r = try if x <> 0 then raise X; () with X -> r := false;;

val f10 : 'a int -> ('a bool, 'a) ref -{'a ||}-> unit

Indeed the reference update, r :=false is increased by the security level associated to X in the expression between try and with.

2.4.3 The special constructs: `try ... finally` and `try ... propagate`

In addition to the traditional try ...with, Flow Caml features two other ways of handling expressions. Two reasons have motivated their introduction: firstly, they partially counterbalance the loss of expressiveness resulting of our decision to make exception names second-class citizens; secondly, they allow a more precise typing (w.r.t. information flow) of common idioms.

Each clause appearing in the with part of a try ...with may be terminated by the keyword propagate. In this case, the exception trapped by the handler is re-raised at the end of its execution. For instance:

try ewith X | Y -> e'; propagate

evaluates e. If it raises X or Y then e' is executed and, then, the trapped exception, X or Y, is raised again. In Objective Caml, this can be implemented by binding the exception to an identifier:

try ewith X | Y as x -> e'; raise x

In addition to the fact that this is not possible with the second-class exceptions of Flow Caml, providing a dedicated idiom for this idiom allows a fine-grained typing: the exception catched by the handler is propagated with exactly the same level than it was in e. In particular, two different levels may be associated to X and Y.

The try ...finally construct of Flow Caml is a translation of the Java's construct for the Caml language. Indeed

try e1finally e2

first, evaluates e1, which yields either a regular value or an exception. In both cases, e2 is executed, and the result produced by e1 is returned. Once again, this can be encoded in regular Caml (without even using exception values):

try let r = e1 in e2; rwith exn -> e2; raise exn

(assuming e2 does not raise any exception). However, using the dedicated construct try ... finally allows better typings (w.r.t. information flow): this makes explicit that the expression e2 is always executed (whether e1 raises an exception or not). Thus, the type system is able to take this in account and type-checks e2 in a context whose level is not altered by those of the exceptions possibly raised by e1, whereas, in the proposed encoding, it is. For instance the following piece of code is accepted by the type system:

try if y1 then raise Xfinally r2 := false;;

while its expansion is rejected

try if y1 then raise X; r2 := false;with _ -> r2 := false; propagate;;

2.4.4 Parameterized exception names

In Objective Caml, it is possible to declare an exceptions name which takes some argument when it is raised. The type of the argument is given when the exception is defined, this may be for instance an integer corresponding to some error code:

exception Error of int;;

In Flow Caml, such a definition is also possible. However, the type of integers is parameterized by a security level which must in consequence appear in the declaration of the exception. For instance, one may introduce the exception ErrorAlice which is parameterized by an integer of level !alice:

exception ErrorAlice of !alice int;;

and then this exception can be raised with different integers of level less than or equal to !alice as argument:

raise (ErrorAlice 0);;

raise (ErrorAlice x1);;

However, this has two limitations: first, it is not possible to raise the exception ErrorAlice with an argument whose level is, for instance, !bob. Obviously, a workaround may consist in defining another exception name:

raise (ErrorAlice x2);;

This expression generates the following information flow: from !bob to !alicewhich is not legal.

exception ErrorBob of !bob int;;

raise (ErrorBob x2);;

However, this does not seem practical. That is the reason why, in Flow Caml, the type of exceptions arguments may be parameterized by some security level:

exception Error : 'a of 'a int;;

To explain how the security level of the argument given to an exception is traced, let us define the following function:

let error code = raise (Error code);;

val error: 'a int -{'a | Error: 'a |}-> 'b

The security level associated to the exception Error in the row of this function combines two pieces of information: first, the security level of the context where the exception is raised and, that of the integer argument. Merging these two annotations into a single one is relatively ad hoc; however, this allows keeping concise typings, and works well with most common usage of exceptions with arguments. It should be possible to provide a more flexible mechanism for parameterizing types of exceptions arguments, for instance by allowing several security levels as arguments, which will also appear in rows. However, this would increase the complexity of the system, as well as the verbosity of function types.

To conclude this section about exceptions, let us mention that some of the built-in exceptions of Objective Caml are not defined in the Flow Caml library. This is the case for instance of Out_of_memory and Stack_overflow which are respectively raised by the garbage collector when there is insufficient memory to complete the computation and the bytecode interpreter when the evaluation stack reaches its maximal size. Indeed, analyzing them with Flow Caml would be of little sense, because, in absence of sophisticated memory and stack analyzes, one must assume them to be possibly raised at almost every point of the program. That is the reason why they are not provided in Flow Caml library: thus, they cannot be catched by programs and become fatal errors.

Before reading further this tutorial, please reset the security level of the evaluation context in your toplevel by entering:

#reset_context;;

2.5 Defining new types

In Flow Caml, the programmer can introduce types thanks to the type declaration. First and foremost, this allows to define new data structures using records and variants. The mechanism used to define types in Flow Caml is similar to that of Objective Caml. However, type declarations involve additional information, in order to deal with the extra features of the type system related to the security analysis.

2.5.1 Variants

We begin this introduction with some examples of variant datatypes. Let us first define a type whose values are the four cardinal points:

type 'a cardinal = North | West | South | East # 'a;;

type (#'a:level) cardinal = North | West | South | East # 'a

In the Flow Caml type system, information carried by a value of type cardinal (which is one of the four symbolic constants listed in the declaration) is described by one security level, similarly to the built-in enumerated types, such as integers or characters. Indeed, the type constructor cardinal has one argument which is a security level. In the above definition, this argument, 'a, is declared to be the information level related to the sum by the clause # 'a.

The answer produced by the system when a type is defined gives its signature, which is automatically inferred. This gives the kind and the variance of every argument: #'a:level means that 'a is a parameter of kind level, is covariant and must be guarded.

let p0 = North;;

val p0 : 'a cardinal

let p1 : !alice cardinal = North;;

val p1 : !alice cardinal

let p2 = if y2 then North else South;;

val p2 : !bob cardinal

We now define the function rotate, which takes as argument a cardinal point and returns its successor in the clockwise order:

let rotate = function North -> East | West -> North | South -> West | East -> South;;

val rotate: 'a cardinal -> 'a cardinal

As reflected by the inferred type scheme, this function takes a cardinal point of any level as argument and returns another cardinal point of the same level as a result.

In sections 2.1.3 and 2.1.4, we have encountered two examples of variant datatypes, which are predefined in the system. Obviously, there is nothing magic with them and, now, we can give their regular definition:

type ('a, 'b) option = None | Some of 'a # 'b;;

type (+'a:type, #'b:level) option = None | Some of 'a # 'b

The definition lists all the possible forms of a value of type ('a, 'b)option: it is either the constant None or the constructor Some with some argument of type 'a. The fourth line of the declaration, # 'b tells that 'b is the security level attached to the knowledge of the form of the option, i.e. whether it is None or Some. (Let us recall that, in the second case, information carried by Some's argument is reflected by the security levels appearing in the type 'a itself.)

The output produced by the system after the definition of this type gives the signature of the type option: +'a:type means that the first argument is covariant and is a type; while #'b:level means that the second argument is a level, and is covariant and guarded.

The definition of the type list is naturally recursive; but this has no particular consequence and the declaration is therefore similar to the previous one:

type ('a, 'b) list = [] | :: of 'a * ('a, 'b) list # 'b;;

type (+'a:type, #'b:level) list = [] | (::) of 'a * ('a, 'b) list # 'b

Following the same principle, one can also define binary trees as follows:

type ('a, 'b) tree = Leaf | Node of ('a, 'b) tree * 'a * ('a, 'b) tree # 'b;;

type (+'a:type, #'b:level) tree = Leaf | Node of ('a, 'b) tree * 'a * ('a, 'b) tree # 'b

and define the function which computes the height of a tree:

let rec height = function Leaf -> 0 | Node (tl, _, tr) -> max (height tl) (height tr);;

val height: ('a, 'b) tree -> 'b int

As for the function which calculates the length of a list, the inferred type scheme reflects that the height of a tree depends on its structure (i.e. the embedding of constructors Leaf and Node) but not on the values stored inside the nodes.

Equipping a type definition taken from an Objective Caml program in order to obtain a Flow Caml one which includes security annotations, generally requires to make some design choices, related to the security properties one wants to enforce. For instance, let us consider the definition, in Objective Caml, of binary trees whose nodes are labeled by integers:

type int_tree = ILeaf | INode of int_tree * int * int_tree;;

There are at least two relevant ways to ``decorate'' this definition in order to obtain a Flow Caml type. A first solution consists in specializing the definition of the type tree given above:

type ('a, 'b) int_tree = ILeaf | INode of ('a, 'b) int_tree * 'a int * ('a, 'b) int_tree # 'b;;

type (+'a:level, #'b:level) int_tree = ILeaf | INode of ('a, 'b) int_tree * 'a int * ('a, 'b) int_tree # 'b;;

In this case, the type of a integer tree carries two security annotations, 'a and 'b: the former describes information attached to the integers stored in the tree while is related to the structure of the tree. In fact, the type ('a, 'b)int_tree is isomorphic to ('a int, 'b)tree. This allows distinguishing the knowledge of the structure of a tree from that of its labels. To illustrate this point, let us define two functions: size which calculates the number of nodes of a tree and sum which calculates the sum of its labels:

let rec size = function ILeaf -> 0 | INode (tl, x, tr) -> size tl + 1 + size tr;;

val size : ('a, 'b) int_tree -> 'b int

let rec sum = function ILeaf -> 0 | INode (tl, x, tr) -> sum tl + x + sum tr;;

val sum : ('a, 'a) int_tree -> 'a int

The result of the first function depends only on the structure of the tree, and not on its content. As a result its security level is related only to the second annotation of the tree given as argument. On the other hand, the sum of the labels of a tree carries information about the structure of the tree and about its labels, so in the type of sum, the security level of the returned integer must be greater than or equal to the two ones of the tree.

A possible alternative consists in equipping the type of trees labeled by integers with only one security level:

type 'a int_tree1 = ILeaf1 | INode1 of 'a int_tree1 * 'a int * 'a int_tree1 # 'a;;

type (#'a:level) int_tree1 = ILeaf1 | INode1 of 'a int_tree1 * 'a int * 'a int_tree1 # 'a;;

Information carried by the labels is no longer distinguished from the knowledge of the structure of the tree. This choice allows more concise and simpler but less precise typings. For instance, with this definition, both functions computing respectively the size of a tree and the sum of its labels have the same types:

let rec size1 = function ILeaf1 -> 0 | INode1 (tl, x, tr) -> size1 tl + 1 + size1 tr;;

val size1 : 'a int_tree1 -> 'a int

let rec sum1 = function ILeaf1 -> 0 | INode1 (tl, x, tr) -> sum1 tl + x + sum1 tr;;

val sum1 : 'a int_tree1 -> 'a int

The type obtained for size1 gives a less precise description of the behavior of the function w.r.t. information flow than that of size: it does not reflect that the size of a tree does not depend on the value of its labels, as reflected by these computations:

size (INode (ILeaf, x1, ILeaf));;

- : 'a int

size1 (INode1 (ILeaf1, x1, ILeaf1));;

- : !alice int

2.5.2 Records

Records as tuples

Records are an improved form of tuples, with named fields. Here, we declare a record type to represent points or vectors in a two-dimensional space.

type 'a vector = { x: 'a int; y: 'a int };;

type (#'a:level) vector = { x: 'a int; y: 'a int }

As reflected by the output of the toplevel, the type constructor vector has one argument which is the common level of the two integers it is made of. This argument is covariant and guarded. As for tuples, there is no particular security level attached to the record structure itself, since it is not really observable in the language.

So, when a vector is built from two integers, its security level is nothing but the union of those of the given integers:

let v = { x = x1; y = x2 };;

val v : [> !alice, !bob] vector

One can also define some of the classical functions operating on vectors:

let add_vector v1 v2 = { x = v1.x + v2.x; y = v1.y + v2.y };;

val add_vector: 'a vector -> 'a vector -> 'a vector

let rot_vector v = { x = - v.y; y = v.x };;

val rot_vector: 'a vector -> 'a vector

The manner we equip the type vector with security annotations is somehow arbitrary. Indeed, it is also possible to distinguish the information carried by each of its components and hence have two security levels:

type ('a, 'b) vector2 = { x2: 'a int; y2: 'b int };;

type (#'a:level, #'b:level) vector = { x2: 'a int; y2: 'b int }

Such a declaration allows in some cases more precise, but also more verbose, typings.

let add_vector2 v1 v2 = { x2 = v1.x2 + v2.x2; y2 = v1.y2 + v2.y2 };;

val add_vector2: ('a, 'b) vector -> ('a, 'b) vector -> ('a, 'b) vector

let rot_vector2 v = { x2 = - v.y2; y2 = v.x2 };;

val rot_vector2: ('a, 'b) vector2 -> ('b, 'a) vector2

In particular, the type obtained for the function rot_vector2 clearly shows that the function performs some permutation of the two components of the vector.

Mutable records

As in Objective Caml, records can also have mutable fields whose content may be modified in place. They are declared with the mutable keyword:

type ('a, 'b) mvector = { mutable mx: 'a int; mutable my: 'a int } # 'b;;

type (='a:level, #'b:level) mvector = { mutable mx : 'a int; mutable my : 'a int;} # 'b

This defines a type for mutable vectors. This declaration calls for two comments. Firstly, a mutable field is at the same time an input and an output channel of a value: it contents can be written and read. Because it is described by an only type, it must be invariant. Hence, in the above definition, the parameter 'a is invariant, as reflected in the signature by the =. Secondly, a record involving some mutable field is no longer a simple tuple: the information it carries is not entirely contained by its components because its identity (i.e. its address in memory) can be observed in the language. Hence, its type must carry an additional security level which tells how much information is attached to the knowledge of its identity. In our example, this role is played by the argument 'b, which is specified by the clause # 'b at the end of the definition. To illustrate the use of such a datatype, let us define the function rot_mvector which rotates in place a vector:

let rot_mvector v = let x = v.mx in v.mx <- v.my; v.my <- x;;

val rot_mvector : ('a, 'a) mvector -{'a ||}-> unit

Because of their update, the integers of the vector given as argument contain information about both the context where the function is applied and the identity of the record given as argument.

In section 2.3, we have introduced references. However, they are only a particular case of mutable records and can be defined as follows:

type ('a, 'b) ref = { mutable contents: 'a } # 'b;;

type (='a:type, #'b:level) ref = { mutable contents: 'a } # 'b

What is more, the three primitives operations ref, := and ! on references are regular functions which can be implemented from the record representation of references:

let ref x = { contents = x };;

val ref : 'a -> ('a, _) ref

let (:=) r x = r.contents <- x;;

val ( := ) : ('a, 'b) ref -> 'a -{'b ||}-> unit with 'b < level('a)

let ( ! ) r = r.contents;;

val ( ! ) : ('a, 'b) ref -> 'c with 'b < level('c) and 'a < 'c

2.6 Interacting with the outside world

A whole Flow Caml program may be viewed as a process that receives information from one or several sources, performs some computation and sends its results to one or several receivers. Then, the final purpose of the Flow Caml type system is to check that every information flow from a source to a receiver generated by the execution is legal w.r.t. the security policy of the system. In this section, we describe how such external entities are modeled in Flow Caml and how the desired security policy may be specified by the programmer.

In the literature, holders of information are generally referred to as principals (from the program's viewpoint, each of them can be a source, a receiver, or both). Depending on the context, principals may stand for a variety of concepts: (groups of) human beings, security classes (e.g. public or secret), subsets of the system's memory, communication channels through some peripheral or network interface. However Flow Caml is not concerned with the real existence of such entities, and provides a general and uniform manner to deal with them: in its type system, principals are represented by constant security levels. In the beginning of this tutorial, Alice, Bob and Charlie were examples of principals and represented by the security levels !alice, !bob and !charlie, respectively. However, they remained relatively abstract, because we just declared a series of values to have these levels---thanks to some type constraint---but we did not say how a program can really interact with them.

2.6.1 The example of the standard input and output

More concrete examples of external communication channels for a program consist in its standard input and output. Both can be viewed as principals and we therefore decide to represent them by the two security levels !stdin and !stdout, respectively. A program can interact with them using the usual functions of the standard library. For instance, print_int outputs an integer on the standard output:

print_int;;

- : !stdout int -{!stdout ||}-> unit

Because the integer provided as argument is sent to the standard output, its security level must be less than or equal to !stdout. To print the integer 1, one writes:

print_int 1;;

- : unit

The literal constant 1 has type 'aint for every 'a; hence one can instantiate 'a < !stdout and the call to the function is possible. However, printing the integer x1 (which comes from the principal Alice and hence has the security level !alice) is not, in the default security policy, legal:

print_int x1;;

This expression generates the following information flow: from !alice to !stdoutwhich is not legal.

Indeed, this piece of code generates a flow from Alice to the standard output and hence requires the inequality !alice < !stdout. This is not satisfied in the default security policy which is the empty one: it never allows any communication from one principal to another. It can be refined using declarations introduced by the keyword flow:

flow !alice < !stdout;;

This makes the security level !alice less than or equal to !stdout. In other words, this allows information flow from the principal represented by !alice (Alice) to that of !stdout (the standard output). These declarations are naturally ``transitive''. For instance, if one declares:

flow !bob < !alice;;

then Bob is allowed to send information to Alice, but also, by transitivity, to the standard output:

print_int x2;;

- : unit

It is worth noting that the constant security levels are global as well as the declarations that relate them. This is natural because the principals and the security policy they represent are so. However, for convenience, the interactive toplevel allows the programmer to refine the security policy incrementally. This is always safe because a piece of code that is legal in some security policy is still allowed in another one where more information flow is possible.

Similarly, the security level !stdin intends to represent the standard input in the type system. For instance, the function read_line has the following type:

read_line;;

- : unit -{[< !stdout, !stdin] | End_of_file: !stdin |}-> !stdin string

Quoting the documentation of the standard library, read_line ``flushes standard output, then reads characters from standard input until a newline character is encountered [and] returns the string of all characters read, without the newline character at the end''. Thus, invoking read_line affects both the standard input and output, which explains the first annotation in the arrow of its type. Furthermore, if the user sent the ``end-of-file'' sequence (e.g. by typing ^D), the function raises the exception End_of_file, hence the second annotation on the arrow. Lastly, a string obtained by reading on the standard input must have the level !stdin:

let s1 = read_line ();;

val s1 : !stdin stringCurrent evaluation context has level !stdin

Let us note that, if one wants to print on the standard output a string read on the standard input, the security policy of the program must allow information flow from the latter to the former. This is declared by the following statement:

flow !stdin < !stdout;;

and then, it is possible to write a function echo which ``pipes'' the standard input to the standard output:

let echo () = try while true do let s = read_line () in print_string s done with End_of_file -> ();;

val echo : unit -{[< !stdout, !stdin] ||}-> unit

2.6.2 Modeling principals

Real programs are liable to communicate with external entities through other channels than the simple standard input and output, e.g. the file system, network interfaces or display devices. However, the Flow Caml library does not provide functions allowing such communications: analyzing these low-level operations with its type system would not yield any relevant information about their behavior w.r.t. the security policy, because fine-grained considerations are in general mandatory to prove they are safe. Then, the interaction with external entities must be modeled in Flow Caml at a higher level.

That is the reason why a program written and verified with the Flow Caml system must generally be divided in two parts. The purpose of the first one is to provide a high level model of the external principals considered by the program. This should consist in a series of functions for interacting with them, which are implemented in one or several regular Caml modules, using for instance the standard i/o interface, the Unix library or some graphical toolkit. This part of the code cannot be verified by the Flow Caml system: the programmer must supply itself an interface for these ``high level'' functions which specifies their behavior w.r.t. the security policy. The second part consists in the body of the program, which interacts with the outside world only with the model of principals provided by the previous modules. This part can be written and type-checked in Flow Caml, which automates its verification. In section 2.8, we will give more details about this programming scheme: because this requires dividing the program at hand into several compilation units. Because they are roughly a particular case of top-level modules, we first say a few words about the module layer of the Flow Caml language.

2.7 The module language

The broad outline of the Flow Caml module language is the same as that of Objective Caml: it provides structures, which are sequences of definitions, and signatures which are interfaces for structures. Besides functors are ``functions'' from structures to structures, which allow expressing parameterized structures.

Although, there is no major novelty compared with Objective Caml, this section describes how its module language has been extended to handle the type system of Flow Caml and its information flow analysis; and illustrates with some examples its basic usage.

2.7.1 Structures and signatures

A structure consists in an arbitrary sequence of definitions, which are packaged together. It is introduced by the struct ...end construct, and is usually given a name with the module binding. For instance, one may define a structure implementing sets of integers (with binary trees):

module IntSet = struct type 'a t = Empty | Node of 'a t * 'a int * 'a t # 'a let empty = Empty let rec add x = function Empty -> Node (Empty, x, Empty) | Node (l, y, r) -> if x < y then Node (add x l, y, r) else Node (l, y, add x r) let rec mem x = function Empty -> false | Node (l, y, r) -> (x = y) || mem x (if x < y then l else r)end;;

module IntSet : sig type (#'a:level) t = Empty | Node of 'a t * 'a int * 'a t # 'a val empty : 'a t val add : 'a int -> 'a t -> 'a t val mem : 'a int -> 'a t -> 'a boolend

This structure comprises one type definition (the type of sets of integers, t), and three values: the empty set, empty, and two functions operating on sets, add (to add an integer to a set) and mem (to test whether an integer belongs to a set). The system outputs the signature of the structure, which is a list of its components with their declaration. Outside the structure, its components can be referred to using the ``dot notation'', that is, identifiers qualified by a structure name. For instance, IntSet.add refers to the function add of this structure.

IntSet.add x1 (IntSet.add x2 IntSet.empty);;

- : [> !alice, !bob] IntSet.t

Signatures allows to abstract some characteristics of the implementation of a structure by hiding some components or exporting them with a restricted declaration or type. For instance, one may hide the concrete representation of integer sets:

module type INTSET = sig type (#'a:level) t val empty: 'a t val add: 'a int -> 'a t -> 'a t val mem: 'a int -> 'a t -> 'a boolend;;

module AbstractIntSet = (IntSet : INTSET);;

module AbstractIntSet : INTSET

It is worth noting that parameters in declarations of abstract types must be annotated with their kind and variance in signatures: because the representation of the type is not given, they can no longer be inferred by the system. However, they are required to properly type-check the rest of the program.

2.7.2 Functors

Functors are ``functions'' from structures to structures. They are used to express parameterized structures. A common example is a definition of a generic set library, parameterized by a structure giving the type of the elements of the set and a function compare defining a total order between them:

module type ORDERED_TYPE = sig type (#'a:level) t val compare : 'a t -> 'a t -> 'a intend;;

(As usual in Caml, compare xy is expected to return 0 if x is equal to y, a negative integer if x is less than y and a positive integer otherwise.) In this signature, the type of the elements, t, is parameterized by one security level which describes all the information leaked by a comparison (as reflected by the type of compare). However, this does not prevent to instantiate it with more complex data types, which are originally parameterized by several security levels:

module IntList : ORDERED_TYPE = struct type 'a t = ('a int, 'a) list let rec compare l1 l2 = match l1, l2 with [], [] -> 0 | [], _ :: _ -> -1 | _ :: _, [] -> 1 | hd1 :: tl1, hd2 :: tl2 -> let c = Pervasives.compare hd1 hd2 in if c = 0 then compare tl1 tl2 else cend;;

module IntList : sig type (#'a:level) t val compare : 'a t -> 'a t -> 'a intend

We now define the functor implementing sets of arbitrary type. This functor takes the structure Elt as argument which must have the signature ORDERED_TYPE:

module Set (Elt: ORDERED_TYPE) = struct type 'a element = 'a Elt.t type 'a t = Empty | Node of 'a t * 'a element * 'a t # 'a let empty = Empty let rec add x = function Empty -> Node (Empty, x, Empty) | Node (l, y, r) -> if Elt.compare x y < 0 then Node (add x l, y, r) else Node (l, y, add x r) let rec mem x = function Empty -> false | Node (l, y, r) -> let c = Elt.compare x y in (c = 0) || mem x (if c < 0 then l else r)end;;

module Set : functor (Elt : ORDERED_TYPE) -> sig type (#'a:level) element = 'a Elt.t type (#'a:level) t = Empty | Node of 'a t * 'a Elt.t * 'a t # 'a val empty : 'a t val add : 'a Elt.t -> 'a t -> 'a t val mem : 'a Elt.t -> 'a t -> 'a boolend

As in the IntSet example, it would be good style to hide the actual implementation of the type of sets. This can be achieved by restricting Set by a suitable functor signature. Firstly, let us define the type of a module implementing a set structure:

module type SET = sig type (#'a:level) element type (#'a:level) t val empty: 'a t val add: 'a element -> 'a t -> 'a t val mem: 'a element -> 'a t -> 'a boolend;;

This signature lists the type of elements and sets as well as the functions operating on them. Then, the Set functor takes a structure of signature ORDERED_TYPE and returns one of signature SET, so it may be declared with the following type:

module Set (Elt: ORDERED_TYPE) : (SET with type 'a element = 'a Elt.t) = struct ...end

The type constraint with type 'a element = 'a Elt.t has the same purpose as in Objective Caml: it points out the fact that the sets contain elements of type Elt.t, i.e. that the functions add and mem can be applied with arguments of this type. To conclude with this example, one can retrieve our first implementation of integer sets, the module IntSet, as an instance of the functor Set:

module IntSet' = Set (struct type 'a t = 'a int let compare = Pervasives.compareend);;

module IntSet' : sig type 'a element = 'a int type 'a t val empty : 'a t val add : 'a element -> 'a t -> 'a t val mem : 'a element -> 'a t -> 'a boolend

In the previous examples, the interaction between the module language and the security analysis featured by the type system of Flow Caml remains relatively basic: roughly speaking, it simply consists in including the relevant security annotations in value and type declarations in signatures. However, it is sometimes necessary to parameterize a signature with some security level. For instance, one may define a module type for a structure implementing some input channel:

module type IN = sig level Data level Prompt val read : unit -{[< Prompt] ||}-> Data stringend;;

This signature involves two abstract levels: Data is the security level of data read on the input channel; and Prompt represents the information leaked on the channel when one starts listening on it. At the time being, nothing is known about these levels, so they remain ``abstract''. The function read is intended to read one line on the underlying channel. It naturally produces a string whose level is Data (let us remark that, in this model, reading can never fail). An implementation of this signature using the standard input would be as follows:

module Stdin = struct level Data = !stdin level Prompt less than !stdin, !stdout let read () = try read_line () with End_of_file -> ``end;;

module Stdin : sig level Data = !stdin level Prompt less than !stdout, !stdin val read : unit -{[< !stdout, !stdin] ||}-> !stdin stringend

Strings read on the standard input have level !stdin, so Data is declared to equal to it in Stdin. Invoking read_line affects the standard input and the standard output (because it is flushed), so Prompt must be less than or equal to !stdin and !stdout. Then, the module Stdin implements the signature IN, which may be immediately verified by a type constraint:

module AbstractStdin = (Stdin : IN);;

module AbstractStdin : IN

Similarly, we declare the module type for output channels, and implement an instance of it devoted to the standard output:

module type OUT = sig level Data val print : Data string -{Data ||}-> unitend;;

In this case, we only need one security level Data which represents the information which may be sent on the channel. (We do not consider the possibility of receiving information from an output channel, for instance because of a buffer overflow.) The module Stdout implements this signature for the standard output:

module Stdout = struct level Data = !stdout let print = print_endlineend;;

module Stdout : sig level Data = !stdout val print : !stdout string -{!stdout ||}-> unitend;;

Now, we aim at writing a functor which takes as argument two structures, defining an input channel and an output channel, respectively. The body of the functor will define a function copy whose purpose is simply to read one line on the input channel and print it on the output channel. However, it is not enough to require the two structures parameterizing the functor to have the respective signature INPUT and OUTPUT: indeed, the function copy implemented by the functor generates an information flow from the channel represented by the former to that of the latter. Hence the security level Data of the input channel, must be declared to be less than or equal to that of the output channel.

module Copier (I : IN) (O : OUT with level Data greater than I.Data) = struct let copy () = O.print (I.read ())end;;

module Copier : functor (I : IN) -> functor (O : sig level Data greater than I.Data val print : Data string -{Data ||}-> unit end) -> sig val copy : unit -{[< O.Data, I.Prompt] ||}-> unit end

That is the purpose of the constraint withlevel appearing in the type of the second argument of the function. Its semantics is similar to that of withtype or withmodule in Objective Caml: it refines the definition of the level Data in the signature of the module O. The clause greater than I.Data declares that this security level must be that of a principal allowed to ``receive'' information from the channel implemented by the structure I whose level is I.Data.

It is worth noting that the order in which parameters appear in the functor definition generates some asymmetry, because the constraint is applied on the second structure. Obviously, it is also possible to permute the two arguments:

module Copier' (O : OUT) (I : IN with level Data less than O.Data) = struct let copy () = O.print (I.read ())end;;

module Copier' : functor (O : OUT) -> functor (I : sig level Data less than O.Data level Prompt val read : unit -{Prompt ||}-> Data string end) -> sig val copy : unit -{[< I.Prompt, O.Data] ||}-> unit end

The constraint now states that the principal represented by the level source in I must be allowed to send information to O.Data.

To conclude with this example, let us build an instance of Copier dedicated to the standard input and output. This requires to allow information flow from the former to the latter, which can be done by the toplevel declaration:

flow !stdin < !stdout;;

Now, the security level Stdin.Data is less than or equal to Stdout.Data, so Stdin and Stdout are legal arguments for Copier:

module StdCopier = Copier (Stdin) (Stdout);;

module StdCopier : sig val copy : unit -{[< Stdout.Data, Stdin.Prompt] ||}-> unitend

2.7.3 Side-effects, exceptions and the module language

Let us briefly explain how the type system of Flow Caml traces information flow due to side-effects and exceptions throughout the evaluation of module expressions. Evaluating a structure consists in considering successively each of its definitions; computation really arises only at let definitions and their evaluation may have side-effects or raise exceptions. It is worth noting that an exception which escapes the scope of a top-level let definition cannot be trapped further, so it terminates the program.

In fact, Flow Caml's type system associates not only a type to the body of a let definition but also two lists of security levels, or bounds written from ... to... where each ... stands for a list of security levels. The lower bound (appearing after the from keyword) describes the side-effects performed by the evaluation of the definition, roughly speaking it includes the security level(s) of data structures the definition may affect. The upper bound (appearing after the to keyword) tells how much information is attached to the exceptions the definition may raise. This process is generalized to the whole module language by associating to every definition and module expression a pair of bounds. Because they have no computational content, the bounds of external, type, level, exception, moduletype, open and include definitions are always empty. The bounds of a module definition are obtained by considering recursively the module expression which appears in the right-hand-side.

For instance, the evaluation of a structure struct def₁ ... def_nend consists in evaluating successively each of the definitions, then the bounds associated to the whole module expression are naturally obtained by merging those of the definitions def₁ to def_n. Moreover, while considering this sequence of definitions, def_i is evaluated if and only if none of def₁ to def_i-1 raised an exception. As a result, to prevent any illegal information flow, the upper bounds of the former must be less than or equal to the lower bound of the latter.

However, the toplevel system does not output the bounds of the definitions the user enters. Instead, at every prompt, it considers all definitions entered so far as members of a single structure, whose upper bound corresponds to the evaluation context's security level introduced in section 2.4.2. When one enters a new definition, the system checks that its lower bound is less than or equal to that of the current evaluation context's security level, and then it is extended with the upper bound of the new definition. A message giving the new evaluation context's security level is output in the case it is different from the previous one.

A functor definition does not perform any computation, so it has empty bounds. Indeed, the body of the functor is executed only when the functor is applied. In order to trace bounds from functors definitions to functors applications, the arrow symbols of functors types are annotated---when necessary---by bounds:

module type S = sig val x : 'a intend;;module F (X: S) = struct let _ = r1 := X.x; if X.x = x2 then raise Exitend;;

module F : functor (X : S) -{!alice | !bob}-> sig end

This module type means that any application of the functor F may generate a side-effect on cells of level !alice and may raise an exception at level !bob. Then, an application of F inserts !bob in the evaluation context's security level:

module F0 = F (struct let x = 1 end);;

Current evaluation context has level !bob

Moreover, it is only possible to instantiate F if the evaluation context's security level is less than or equal to !alice, which is no longer the case after a first application of F:

module F1 = F (struct let x = 1 end);;

This expression is executed in a context of level !bobbut has an effect at level !alice.This yields the following information flow: from !bob to !alicewhich is not legal.

2.8 Standalone programs

All examples given so far were entered under the interactive system. However, Flow Caml code can also be written in separate files: the batch ``compiler'' flowcamlc allows to type-check them, and also translates them into regular Objective Caml source code files, so that they can be compiled using the compilers ocamlc or ocamlopt, yielding a standard executable.

In this section, we aim at explaining how it is possible to write such programs in Flow Caml. To illustrate our discussion, we consider as example the complete program whose source code is given here. Let us briefly introduce the operation this example program performs: on Unix systems with shadow passwords, information about user accounts is stored in two files. The file /etc/passwd registers the list of logins, with, for each of them, a password and some administrative information such as a numeric id, the user's home directory and shell. Besides, the file /etc/shadow associates to every login a password stored in a encrypted form (with some optional aging information), which is used in place of that in /etc/passwd. Our program aims at synchronizing these two files, i.e. generating an entry in /etc/shadow for every account which is listed only in /etc/passwd. In the forthcoming subsections, we will explain step by step how the source code is organized, how it is verified and compiled thanks to the Flow Caml system. The type system will allow us to check that running the program cannot reveal to the user which invokes the command any information about the passwords stored in the two files.

2.8.1 Compilation units and batch compilation

The source code of a program is generally split into several files, called compilation units, that can be compiled separately. In Flow Caml, a compilation unit A comprises one or two files, among:

the implementation file a.fml, which contains a sequence of definitions, analogous to the inside of a struct...end construct;
the interface file a.fmli, which contains a sequence of specifications, analogous to the inside of a sig...end construct.

(In addition to definitions and specifications, the implementation and the interface may include two particular ``headers'', made of a flow declaration and optional affects and raises statements, whose respective purposes will be explained in the sections 2.8.2 and 2.8.3.) Both files define a structure named A (same name as the base name of the two files, with the first letter capitalized), as if the following definition was entered at top-level:

module A : sig (* specifications of file a.fmli *) end = struct (* definitions of file a.fml *) end;;

The files defining a set of compilation units can be handled separately using flowcamlc, following for each unit A one of the three above schemes:

The compilation procedure of a unit A defined in files a.fmli and a.fml is described in figure 2.1. First, the Flow Caml interface a.fmli is fed to flowcamlc, which checks its well-formedness and produces a compiled version of it, a.fcmi. It also translates the interface file into a regular Objective Caml one, namely a.mli. Second, the implementation a.fml can be type-checked by flowcamlc. The compiler computes the most general interface for the implementation, and checks it fulfills the declared one (i.e. that stored in a.fcmi). Furthermore, the source code of the unit in a.fml is translated into a Objective Caml implementation file, a.ml. Then, a.mli and a.ml can be compiled with ocamlc to produce a compiled interface a.cmi and a bytecode object file a.cmo.

Figure 2.1: Compilation scheme of a unit defined in a.fmli and a.fml
However, as in Objective Caml, it is possible to build a unit A by providing only an implementation file a.fml but no interface file. This yields the compilation scheme of figure 2.2: the implementation a.fml can be directly passed through flowcamlc and the interface computed by type inference is stored itself in a.fcmi.

Figure 2.2: Compilation scheme of a unit defined in a.fml
Lastly, as we have explained in section 2.6.2, some pieces of code cannot be satisfyingly typed with Flow Caml's system. Such definitions may be provided by a compilation unit with a Flow Caml interface a.fmli but only an Objective Caml implementation a.ml, as illustrated by the compilation scheme of figure 2.3. Then, the system will not check that the code in a.ml fulfills the interface a.fmli w.r.t. the security policy---this is left to the programmer's responsibility---but the definitions exported by the unit (and registered in a.fcmi) will be available to the rest of the program.

Figure 2.3: Compilation scheme of a unit defined in a.fmli and a.ml

Our example program is made of four compilation units. Passwd and Shadow are low-level modules which implement functions for accessing the /etc/passwd and /etc/shadows files: their implementations are directly written in Objective Caml (files passwd.ml and shadow.ml), and only interfaces are provided in Flow Caml (files passwd.fmli and shadow.fmli). These interfaces assign security levels to the information manipulated by the units: data stored in /etc/passwd has the level !passwd_file, except the passwords, which have level !password. Similarly, information from /etc/shadow receives the level !shadow_file and !shadow_password. The unit Verbose provides a verbose mode: if the user runs the program with the -v option, then the execution is traced on the standard output. The body of the program is in Main. These last two units are fully implemented in Flow Caml: implementation (verbose.fml and main.fml) and interface (verbose.fmli and main.fmli) files are provided for each of them.

2.8.2 `flow` declarations in implementations and interfaces

In section 2.6.1, we explained how flow declarations allow specifying the security policy by setting inequalities between principals. We have seen that the toplevel system allows the programmer to refine the security policy incrementally, by entering new flow declarations which remain valid until the end of the interactive session.

In programs written for the batch compiler flowcamlc, every compilation unit must come with its own security policy, i.e. a flow declaration which specifies sufficient assumptions on principals for its source code to be well-typed. This declaration must be provided at the beginning of the implementation and interface files. For instance, the compilation unit Verbose of our example begins with the following declaration:

flow !arg < !stderr, !stdout

By convention, the principals !stderr and !stdout represent the standard error and the standard output of the program, respectively. !arg is the security level of the command-line arguments. The declaration is a shorthand for

flow !arg < !stderrand !arg < !stdout

It means that the unit is well-typed under every security policy which enforces at least the two inequalities !arg < !stdout and !arg < !stderr. When a compilation unit includes no flow declaration---as Passwd and Shadow in the example---this simply means it is well-typed in every security policy.

The security policy under which a program made of several compilation units can be considered to be verified is the union of those declared in the units, i.e. the least one which satisfies the assumptions made by every unit. This is safe because if a piece of code is well-typed in a given policy, it remains so under a refinement of it, i.e. a lattice which declares more information flow. However, the possibility to provide a different flow declaration in every compilation unit of a program is of main importance for modularity of programming and re-usability of code, in the context of separate compilation. Indeed, this allows for instance having libraries (such as the standard one) used in programs which have different security policies. Otherwise, one would have to write or compile a specialized version of these libraries for each program which expects a different policy.

The Flow Caml system provides a tool, flowcamlpol, to compute the (minimal) security policy under which a program is (checked to be) safe. The usage of flowcamlpol is---to some extent---similar to that of a linker of object files: it expects as argument the name of the compiled interfaces of the program's units, in the same order as the corresponding object files will be linked. For our example, one must run the command:

flowcamlpol passwd.fcmi shadow.fcmi debug.fcmi main.fcmi

which leads the system to sum up all information flow the example program is liable to perform:

!shadow_file < !shadow_password!shadow_file < !stdout!passwd_file < !shadow_file!password < !shadow_password!arg < !stderr!arg < !stdout

The graphical representation can be obtained if the command is run with the -graph option. It shows some interesting properties of the information flow graph of the program, which have been established automatically by the type system. For instance, the standard output is not related to sensitive data stored in the /etc/passwd and /etc/shadow files, i.e. the user passwords.

Lastly, you may wonder why the minimal assumptions necessary for a compilation unit are not inferred by the system while it type-checks the source code. Indeed, doing so is possible for expressions of the core language. For instance, the following piece of code

print_string Sys.argv.(0);;

is well-typed if and only if the inequality !arg < !stdout is enforced. However, the existence of a ``minimal'' solution to this problem is no longer ensured when considering the module language. Indeed, typing module expressions requires comparing type schemes, i.e. verifying that a given scheme is an instance of another one, which cannot yield principal flow declarations. For instance, the comparison of [< !alice, !bob]int with !charlieint requires the least upper bound of the levels !alice and !bob to be less than or equal to !charlie, which is not expressible in a flow statement.

2.8.3 `affects` and `raises` statements in interfaces

The execution of a program made of one or several compilation units consists in evaluating successively each top-level structure, in the order specified at linking. For instance, our example program is linked with the following command:

ocamlc -o passwd2shadow passwd.cmo shadow.cmo verbose.cmo main.cmo

then, when running the compiled executable passwd2shadow, the definitions of Passwd, Shadow, Verbose and Main are successively evaluated, until an uncaught exception is raised or the end is reached. Then, when gaining control, each unit acquires the information that the previous ones normally terminated. So, in order to trace this possible information flow, one has to consider the bounds of the underlying structures, as if the program where defined in a single unit such as:

module Passwd = struct ...endmodule Shadow = struct ...endmodule Verbose = struct ...endmodule Main = struct ...end

For this purpose, every interface file can mention the lower and upper bounds of the underlying structure thanks to the affects and raises statements, respectively , which appears between the flow declaration and the signature (as usual, omitted bounds are supposed to be empty). For instance, the interface of Verbose declares the following bounds

affects !argraises !arg

which means that the unit Verbose has side-effect of level !arg and can raise an exception at this level. This statement appears only in the interface: when type-checking the implementation, the bounds are inferred from the source code and compared to those provided in the interface.

Then, when linking a series of compilation units Unit₁, ..., Unit_n to produce an executable, one has to check that, for every i, the upper bounds of Unit₁ to Unit_i-1 are less than or equal to the lower bound of Unit_i, which is in fact achieved by the flowcamlpol command, at the same time it computes the security policy:

flowcamlpol passwd.fcmi shadow.fcmi debug.fcmi main.fcmi

Chapter 2 A Tutorial

2.1 Security levels and data structures

2.1.1 Simple types

2.1.2 Strings

2.1.3 Lists

2.1.4 Options

2.1.5 Tuples

2.2 Constrained type schemes

2.2.1 Subtyping

Subtyping between security levels

Subtyping between types

Simplification of type schemes

2.2.2 level constraints

2.2.3 content constraints

2.2.4 Same-skeleton constraints

2.2.5 Functions as values

2.2.6 Interlude: the graphical output of type schemes

2.3 Imperative features

2.3.1 Direct and indirect information flows

2.3.2 References

2.3.3 Arrays, strings and loops

2.4 Dealing with exceptions

2.4.1 Rows

2.4.2 Exceptions and side-effects

2.4.3 The special constructs: try ... finally and try ... propagate

2.4.4 Parameterized exception names

2.5 Defining new types

2.5.1 Variants

2.5.2 Records

Records as tuples

Mutable records

2.6 Interacting with the outside world

2.6.1 The example of the standard input and output

2.6.2 Modeling principals

2.7 The module language

2.7.1 Structures and signatures

2.7.2 Functors

2.7.3 Side-effects, exceptions and the module language

2.8 Standalone programs

2.8.1 Compilation units and batch compilation

2.8.2 flow declarations in implementations and interfaces

2.8.3 affects and raises statements in interfaces

2.2.2 `level` constraints

2.2.3 `content` constraints

2.4.3 The special constructs: `try ... finally` and `try ... propagate`

2.8.2 `flow` declarations in implementations and interfaces

2.8.3 `affects` and `raises` statements in interfaces