BT Labs ECCp-97 32-bit Unix Clients
-----------------------------------
This README.src relates to Unix 32-bit clients Version 0.94 and later.
Distribution
------------
You should have the following files:
README.src this file
README.bin the README with the binary distributions
LICENCE conditions of use for this software
timerp97a.c timer and initial setup application source
eccp97a.c the elliptic curve client source
prodmod97.s GNU assembler code for x86 productMod, squareMod & decode
The eccp97 source currently consists of two C files, eccp97a.c and
timerp97a.c, and an assembler file, prodmod97.s. eccp97a.c is the main client,
whilst timer97p.c is a variant which does two jobs. It creates the ini file
needed by the main client, and also runs the elliptic curve arithmetic
for a known short time to report expected client speed.
On Linux x86 prodmod97.s contains assembler code for productMod,
squareMod and decode, for a significant speedup.
Source Code
-----------
There are functions to perform the basic maths, support functions to
do timing and result submission, and the main iteration loop which
does all the work.
The following functions do basic mathematical operations:
equal - returns true if two 97-bit numbers are equal
addMod - add two 97-bit numbers modulo a given modulus
subMod - subtract two 97-bit numbers modulo a given modulus
doubleMod - double a 97-bit number modulo a given modulus
productMod - multiply two Montgomery residues modulo 2^97
squareMod - square a Montgomery residue modulo 2^97
inverseMod - calculate the inverse of a Montgomery residue modulo 2^97
encode - convert a number modulo the curve modulus to Montgomery
residue form
decode - convert a Montgomery residue to normal form
These make use of several more elementary operations coded as #defines
and a 32-bit x 32-bit -> 64-bit routine, pmul, which performs the
most elementary multiply operation required.
The following functions perform derived mathematical operations:
ellipticSum - perform the elliptic curve add operation on two points
ellipticDouble - elliptic curve add a point to itself
ellipticProduct - elliptic curve multiply a point by a positive integer.
The following functions in eccp97a.c perform utility operations:
term_sig - called when a SIGTERM signal is received
load_ini - load setup information from eccp97a.ini
sendit - sends and logs the result message
gettime - get process time used to date
The following functions in timerp97a.c perform utility operations:
load_ini - load setup information from eccp97a.ini
ask_ini - do the user dialogue for setup & write the ini file
gettime - get process time used to date
Arithmetic is done using big numbers in Montgomery residue form [1].
This makes modular multiplies much quicker as the modularisation
operation on the result involves only multiplication by a pre-computed
inverse.
A fundamental operation which is really slow is inverseMod. This
is in general at least ten times slower than productMod. inverseMod is
called once each in ellipticAdd and ellipticDouble. However the main
iteration loop mitigates the speed penalty of inverseMod by a generalisation
of the simple fact that a/x and b/y can be calculated by computing
t = 1/(x*y) so a/x = a*t*y and b/y = b*t*x. Thus two inversions and two
products have been traded for one inversion and five products. In general
n inversions can be replaced by one inversion and 3n-3 products.
The key routines that affect the speed are productMod and squareMod.
The chosen parallelisation factor of 32 makes inverseMod of secondary
concern. However, on some architectures it may give a marginal improvement
to define the symbol INVTAB (-DINVTAB). This adds a table lookup
method to compute shift values in inverseMod. This is of no use on
Pentiums but does work on Sparcs. YMMV.
The speedups come in two main ways. First, a slow C 32x32->64
multiply routine is replaced by a faster assembler insert which
computes the same function using native instructions not fully available
to C programmers. The next major speedup comes by replacing
productMod and squareMod (and decode, which is a no-brainer once
the first two have been written) by tight assembler routines.
Have *you* got any better ideas??
Compile with something like
gcc -O6 -o eccp97a -DCLIENT=\"yourclientname\" [-DINVTAB] eccp97a.c
On Linux x86 use
gcc -O6 -o eccp97a -DASMPRSQ eccp97a.c prodmod97.s
yourclientname should be something descriptive - LNX86PC means
Linux x86 (LNX86) using Pentium gcc compiler (P) and C productmod (C). When we
add the assembler productMod etc this will change to LNX86PA.
Solaris 2.5 - SOL25GC - gcc & C product
Sunos 4.1.x - SU41GC
You get the idea. Tell us what you use.
John Sager
Security Research Group
BT Laboratories
February 1998
jcs@zoo.bt.co.uk
[1] P. L. Montgomery, Modular multiplication without trial division,
Math. Comp. 44 (1985), 519-521.