The mirage of statistical methods

Statistical methods are a useful refinement of finite-state methods. They should certainly be used to put introspection-based methodologies into perspective, and they should be trained on the actual intended corpus. The pitfall is the difficulty of building wide-coverage treebanks: on the one hand, they are costly to set up and to evaluate (hours of perusal by professional linguists); on the other hand, it is not understood how to make them independent of the parsing engines that will use them for training.

Whatever use is made of statistics, statistics alone will not discover the meaning of natural-language utterances, just as algebraico-logical methods will not succeed by themselves, as 50 years of very slow progress have shown.

The parable of the blind and the paralytic.

© Gérard Huet 2006