Karim Belabas on Fri, 24 Aug 2012 15:57:44 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: How to do fast integer division |
* Dirk Laurie [2012-08-24 13:31]: > I have an application that does lots of integer divisions with > large operands but small quotients. As a model for that, > I handcoded the extended Euclidean algorithm as follows: > > {mbezout(a,b)= my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1); > while(X[j,3]>0, X[i,] -= truncate(X[i,3]/X[j,3])*X[j,]; > i=j; j=3-i); > X[i,] > } > > This code is pretty standard Even implemented properly (see below), this would be quadratic, whereas PARI's bezout is softly-linear. So the built-in function is bound to be faster. >, so I was astounded to see how slow it is. Timing on 10000-digit numbers: > > ? b1=bezout(a1,a2); > time = 16 ms. > > ? b3=mybezout(a1,a2); > time = 38,622 ms. > > That's a factor of over 2000 slower, not to be explained > by overhead. > > Suspecting that "truncate(X[i,3]/X[j,3])" hides a multitude > of sins (starting with reduction to lowest terms) Indeed, reduction to lowest terms induces an extra gcd at each iteration. For n-bit numbers the classical gcd algorithm runs in time O(n^2), because of a "miracle" with a telescoping series ( the naive cost analysis yields O(n) loops * O(n^2) divisions... ). Your variant does not benefit from this and should run in time O(n^3); it achieves slightly better than cubic time because our integer divisions are not quadratic :-) > I changed it to "truncate(X[i,3]*1./X[j,3])" and the time dropped down > to 200ms. There is a tiny chance that this is not exact, so I did > "divrem(X[i,3],X[j,3])[1]" instead and got 221ms. > > Is that the closest one can get to a fast integer quotient in Pari/GP? X[i,3] \ X[j,3] would be better. In principle, an even better (still quadratic) implementation would be BEZOUT(A, B) = { my (a = A, b = B); while(b, my (q, r, v = divrem(a,b)); q = v[1]; a = b; b = v[2]; r = u; u = v; v = u - q*v; ); [u, (a - A*u) \ B] } It would save - half the matrix updates in exchange for the final (a - A*u) \ B - one further multiplication in each iteration; since a - q*b = r, which has just been computed as part of the Euclidean division. In practice, this is not much faster than the corrected version mbezout(a,b)= { my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1); while(X[j,3]>0, X[i,] -= (X[i,3] \ X[j,3]) *X[j,]; i=j; j=3-i); X[i,] } Cheers, K.B. P.S.: In the 'testing' branch, one can simultaneously assign to more than one variable using the [v1, v2, ..., vn] = 'some_vector of length >= n' construct; as in [q, r] = divrem(a, b) or GCD(a, b) = while(b, [a, b] = [b, a % b]); a; This alllows to simplify a little the BEZOUT implementation. -- Karim Belabas, IMB (UMR 5251) Tel: (+33) (0)5 40 00 26 17 Universite Bordeaux 1 Fax: (+33) (0)5 40 00 69 50 351, cours de la Liberation http://www.math.u-bordeaux1.fr/~belabas/ F-33405 Talence (France) http://pari.math.u-bordeaux1.fr/ [PARI/GP] `