Bill Daly on Fri, 09 Apr 1999 17:39:43 -0400

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

 Re: ECM code in PARI

```Gerhard Niklasch wrote:
>...
>> Since in general an addition chain is slightly shorter
>> than a Lucas chain, it would make sense to switch to an addition chain
>> approach.
>
>Yep, but do we want to go to great lengths to either store precomputed
>addition chains for thousands of multipliers, or re-establish them on
>the fly?  PRAC already does rather better than the old binary shift
>algorithm, and as far as I can see there are at most a few % to be
>gained (and only for the B1 phase).

I have done a bit of testing to verify that my proposed modified
addition chain (MAC) algorithm is on average better than PRAC for ECM
purposes. Comparing the performance for all primes less than 1000, there
are no cases where PRAC is better than MAC, and on average MAC is a
little over 9% better than PRAC (1859 total elladd's and elldouble's for
MAC, 2045 total for PRAC). There is one case, p = 769, where MAC is 4
steps better than PRAC, and many others where it is 1 to 3 steps better.
(I can send you the results and the PARI/GP code I used, but it is a bit
too long for a pari-dev message.)

MAC does not require precomputing an addition chain. It is essentially
the same as the binary shift algorithm, with the following exception. If
the next part of the bit string to be processed is of the form 1^a 0 1^b
0 1^c ... 0 [0 | 10], where each of a, b, c etc. is greater than 1, then
MAC does an extra doubling step and subtracts once for each embedded 0
in the string, then subtracts once again at the end.

>> Note that there is no advantage in this case to using ellmult to
>> multiply a point by a prime. For addition chains, as opposed to Lucas
>> chains, factorization is not all that helpful. It would make more sense
>> to multiply by the product of several small primes.
>
>But we're multiplying by something which _arises_ as a product of powers
>of primes - we're not factoring the multipliers.

My point was that in certain cases, it is better to use the binary
method on a composite number than to concatenate chains for its prime
factors. For example, the binary method for 129 = 3*43 is 2 steps better
than the sum of the best chains for 3 and 43.

Still, it's true that it is difficult to see a useful pattern. I
compared MAC(3*p) against MAC(p) + MAC(3) for all primes less than 1000,
and there isn't much difference on average. On the other hand, MAC(7*p)
is about 2% better than MAC(p) + MAC(7) on average, so maybe 7 should be
combined with some other prime when possible. I would imagine that the
primes q for which MAC(q*p) is better than MAC(p) + MAC(q) are the ones
which have relatively poor addition chains. For q = 17, which has a
fairly good addition chain, MAC(17*p) is worse by about 2.5%. I'll keep
looking into this.

>> 2. Avoiding invmod
>
>I don't think this is much of an issue, since the current code calls
>invmod() very rarely indeed.  When N is at all large, we're working on
>64 curves simultaneously, and one invmod() will handle up to 128 (!)
>individual additions.  During the B2 phase, only the giant steps involve
>calls to invmod().  The running time is entirely dominated by multipli-
>cations mod N.

Well, you went to the trouble of coding elladd2() to avoid a single
invmod, so I thought you might be interested in other ways to avoid
invmod.

Regards,

Bill
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom