Re: Enquiring minds want to know

From: Robert Harley (Robert.Harley@inria.fr)
Date: Wed Mar 01 2000 - 17:06:24 MET

    Vincent Goffin (vjg@research.att.com) wrote:
    >Why would a long running program generate more interesting
    >points than one that just started from a random seed?

    Well, we are all computing billions of arbitrary points by iterating
    from one to the next and eventually there will be a collision when two
    people find the same one. There are far too many points to collect
    them all so only distinguished points are collected. The collision
    will happen at an arbitrary point but will be detected (shortly
    afterwards) at a distinguished point.

    When you compute some points, those points can't possibly participate
    in a collision until you find a distinguished point downstream of
    them. Naturally, the value of a distinguished point depends on how
    many arbitrary points are upstream of it. The more there are, the
    more likely it is to participate in a collision.
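    The mechanism can be sketched in a toy setting. Everything below (the
    search-space size, a SHA-256-based step function, the low-bits-zero
    test) is a made-up stand-in for the real curve arithmetic, but the
    collide-then-detect-at-a-distinguished-point structure is the same:

```python
# Toy sketch of parallel collision search with distinguished points.
# The state space, step function and "distinguished" test are made up
# for illustration; the real computation iterates over curve points.
import hashlib

M = 1 << 20                        # toy search-space size

def step(x):
    # pseudo-random walk on 0..M-1 (SHA-256 as a stand-in for the
    # real pseudo-random iteration)
    h = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(h[:4], "big") % M

def distinguished(x):
    # rare, cheaply-checked property: low 8 bits are all zero
    return x & 0xFF == 0

def walk(seed, seen, max_steps=8192):
    """Iterate from `seed` until the first distinguished point and
    report it to the shared table `seen`.  If another trail already
    reached the same distinguished point, the two trails must have
    collided at some arbitrary point upstream: return the evidence."""
    x = seed
    for _ in range(max_steps):
        x = step(x)
        if distinguished(x):
            if x in seen and seen[x] != seed:
                return x, seen[x], seed
            seen[x] = seed
            return None
    return None                    # abandon an overly long trail

seen = {}
collision = None
for seed in range(1, 10_000):
    collision = walk(seed, seen)
    if collision is not None:
        break
print("trails from seeds %d and %d meet at distinguished point %d"
      % (collision[1], collision[2], collision[0]))
```

    In the real attack the table of distinguished points is the shared
    server-side collection, and a match between two trails yields the
    discrete logarithm.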

    When you compute some arbitrary points, they stay latent in the saved
    state until you find a distinguished point downstream of them. If you
    start two chains of iterations, one from a random point and one from a
    point taken from a saved state, then they are equally likely to find
    a distinguished point. But the latter chain will find a more valuable
    distinguished point since a bunch of latent points are known to be
    upstream of it, whereas for the random starting point it is highly
    likely that very few points are upstream of it at all.

    This effect is amplified by the fact that 32 chains of
    iterations are running in parallel. Suppose you start afresh and do
    320 million iterations and then find a first distinguished point.
    That point has only 10 million points upstream of it. The other 310
    million iterations are all still latent in the saved state!
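    The arithmetic of that example, spelled out:

```python
PARAL = 32               # parallel chains in the saved state
total = 320_000_000      # iterations done when the first
                         # distinguished point turns up
per_chain = total // PARAL
print(per_chain)         # 10_000_000 points upstream of that
                         # first distinguished point
print(total - per_chain) # 310_000_000 iterations still latent
```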

    Doing the parallel chains speeds up the program a lot so it is
    worthwhile, but it means the saved state is quite valuable. If the
    number of parallel chains is PARAL, then the value of the saved state
    rises from nil to about PARAL/2 distinguished points.

    A trade-off is required between the gain in speed and the risk of
    losing latent work. One way to lose it is by accidentally deleting a
    long-running saved state. But the main way is this: when the solution
    is found, we will all have some latent work in our saved states which
    didn't contribute to the solution! Choosing PARAL to be something
    like 20 or 30 seems to be a good trade-off in practice.

    >The code also converts the points it generates to a special basis
    >before counting the 0's and 1's. That looks like an expensive step.
    >If it is expensive, why is it needed?

    Yes, it is expensive.

    In fact the set of g = 324518553658426701487448656461467
    elliptic-curve points is partitioned into equivalence classes of 218
    points each (and one containing the point at infinity). The
    pseudo-random iteration is chosen so that when applied to two
    equivalent points, it gives two new points which are also equivalent.
    Furthermore, the distinguishing characteristic is chosen so that when
    a point is distinguished, all equivalent points are distinguished too.
    (These equivalence classes are called "clusters" in the FAQ).

    This scheme allows the search space to be reduced by a factor of 218,
    which reduces the number of iterations by a factor of sqrt(218) ~= 15.
    But it also imposes draconian conditions on the pseudo-random
    iteration and the distinguishing characteristic which slow down the
    iterations.
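    One standard way to satisfy the second condition -- and plausibly
    what the conversion to a special basis mentioned in the quoted
    question is for -- is to work in a normal basis of GF(2^109): there,
    squaring an element is just a cyclic shift of its coordinate vector,
    so any property defined from the count of 0s and 1s is unchanged by
    the Frobenius. A toy sketch of that invariance (the weight threshold
    here is made up):

```python
# In a normal basis of GF(2^m), squaring an element is a cyclic
# rotation of its coordinate vector, so any property based only on
# the count of 1 bits is invariant under the Frobenius.
import random

m = 109

def frobenius_normal_basis(bits):
    # squaring in a normal basis = rotate the coordinate vector by one
    return bits[-1:] + bits[:-1]

def distinguished(bits, threshold=30):
    # hypothetical distinguishing property defined from the bit count
    return sum(bits) <= threshold

random.seed(1)
v = [random.randint(0, 1) for _ in range(m)]
w = frobenius_normal_basis(v)
assert sum(v) == sum(w)                      # weight is invariant
assert distinguished(v) == distinguished(w)  # so is the property
```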

    For many elliptic curves one can design an algorithm using equivalence
    classes: points are considered equivalent if you can get from one to
    the other by multiplying by a root of unity of small order. But this
    is usually not useful as the iterations get slowed down way too much.

    For this challenge Certicom chose a curve with a special form: all the
    coefficients in the equation of the curve are 0 or 1. In such a case
    one can use the Frobenius, a fancy name for the function that squares
    the x and y coordinates of a point. The Frobenius applied to a point
    on the curve gives another point on the curve, because in
    characteristic 2 it is true that (a+b)^2 = a^2 + b^2.

    Applying the Frobenius is very fast. And it is the same as
    multiplying by (1-sqrt(-7))/2 which is 138423345589698157369693034392981
    modulo the group order g. Crucially, this is a 109-th root of unity!
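    Both claims are easy to check directly with modular arithmetic,
    using the two values quoted above:

```python
g   = 324518553658426701487448656461467    # group order
lam = 138423345589698157369693034392981    # Frobenius eigenvalue mod g

# lam should be a 109-th root of unity modulo g...
assert pow(lam, 109, g) == 1
# ...of order exactly 109 (109 is prime, so lam != 1 suffices)
assert lam != 1

# lam = (1 - sqrt(-7))/2, i.e. (2*lam - 1)^2 = -7 modulo g
assert pow(2 * lam - 1, 2, g) == g - 7
```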

    Another root of unity that can always be used is -1. Together these
    two give equivalence classes of 2*109 = 218 points. With a
    well-designed algorithm, the time per iteration increases by only 50%
    or so compared to not using equivalence classes. Since the number of
    iterations is 15 times smaller, this leads to a 10-fold gain overall!
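    The back-of-the-envelope arithmetic behind that figure:

```python
import math

classes = 2 * 109                    # size of each equivalence class
iter_reduction = math.sqrt(classes)  # fewer iterations needed
cost_factor = 1.5                    # each iteration ~50% slower

print(round(iter_reduction, 2))                # ~14.76x fewer iterations
print(round(iter_reduction / cost_factor, 1))  # net gain ~9.8x, i.e.
                                               # roughly 10-fold
```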

    Historical note:
      When Certicom designed the challenge they didn't know about this
      method and their estimates of difficulty were way off. The first
      working version I know of was described in email from a guy called Ari
      Singer at Pitney Bowes in February '98, just using -1. I generalised
      it to use any roots of unity and implemented it. It was first used to
      solve ECC2K-95.

    Bye,
      Rob.



    This archive was generated by hypermail 2b29 : Wed Mar 01 2000 - 19:48:07 MET