Jérôme Raulin on Wed, 10 Jan 2018 21:01:32 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

sqrtnint algorithm improvement for huge arguments (plus memory leak issue)


Dear all,

sqrtnint(n, k) seems to have "strange" behavior for huge arguments.

Reading GPRC: /home/jerome/.gprc ...Done.

                             GP/PARI CALCULATOR Version 2.10.0 (development
21652-dae10e0)
                              amd64 running linux (x86-64/GMP-6.1.2 kernel)
64-bit version
                         compiled: Jan  9 2018, gcc version 7.2.0 (Ubuntu
7.2.0-1ubuntu1~16.04)
                                               threading engine: pthread
                                     (readline v6.3 enabled, extended help
enabled)

                                         Copyright (C) 2000-2017 The PARI
Group

PARI/GP is free software, covered by the GNU General Public License, and
comes WITHOUT ANY WARRANTY WHATSOEVER.

Type ? for help, \q to quit.
Type ?15 for how to get moral (and possibly technical) support.

parisizemax = 1000001536, primelimit = 500000, nbthreads = 8
(20:23) gp > a=10^1000000; \\ 1 million decimal digits !
time = 27 ms.
(20:24) gp > sqrtint(a);
time = 39 ms.
(20:24) gp > sqrtnint(a,12);
time = 208 ms.
(20:24) gp > sqrtnint(a,123);
time = 127 ms.
(20:24) gp > sqrtnint(a,1234);
time = 172 ms.
(20:24) gp > sqrtnint(a,12345);
  *** sqrtnint: Warning: increasing stack size to 80000000.
  *** sqrtnint: Warning: increasing stack size to 160000000.
  *** sqrtnint: Warning: increasing stack size to 320000000.
  *** sqrtnint: Warning: increasing stack size to 640000000.
time = 10,236 ms.
(20:24) gp > sqrtnint(a,123456);
  *** sqrtnint: Warning: increasing stack size to 80000000.
  *** sqrtnint: Warning: increasing stack size to 160000000.
  *** sqrtnint: Warning: increasing stack size to 320000000.
  *** sqrtnint: Warning: increasing stack size to 640000000.
  *** sqrtnint: Warning: increasing stack size to 1000001536.
  ***   at top-level: sqrtnint(a,123456)
  ***                 ^------------------
  *** sqrtnint: the PARI stack overflows !
  current stack size: 1000001536 (953.676 Mbytes)
  [hint] you can increase 'parisizemax' using default()

  ***   Break loop: type 'break' to go back to GP prompt
break>

I first ran sqrt to have an idea of the expected runtime of the more complex
sqrtnint algorithm.
It seems that starting for some value of the 2nd argument k :
  * sqrtnint leaks memory somewhat proportionally to k.
  * the runtime also starts to increase (maybe some correlation with the
memory leak).

I had a look at sqrtnint algorithm and from my understanding :
  * it first does some kind or argument reduction (I was not able to
understand why),
  * find an approximation to he kth root with largest power of 2 larger than
the expected kth root,
  * find the result applying standard Newton Raphson iterations.

The performance issue is linked to the fact that for large k, the
approximation is too far from the result
And Newton Raphson algorithm needs too many iterations, i.e. most iterations
barely increase precision
result until last ones which converge very fast to the result. For
sqrtnint(a,12345); more than 7700
iterations are needed while only 50 are needed for sqrtnint(a, 123); (while
of course the result
is far smaller).

In a recent "Project Euler" problem solution thread, "philiplu" user
discussed about a similar issue
and drastically improve the running time of his solution by first findind by
dichotomy the highest 12
bits before falling back to Newton Raphson.

Translated in pari it looks like :
iroot(x:int, n:int) =
{
    local(xbl:int, mask:int, y:int, ynew:int, test:int, nm1:int);
    xbl = logint(x, 2);
    if (n >= xbl, return(1));
    mask = 1 << ((xbl - 1) \ n);
    y = mask;
    if (y^n == x, return(y));
    for (i = 0, 10,
        mask >>= 1;
        if (mask == 0, return(y));
        ynew = y + mask;
        test = ynew^n;
        if (test <= x,
            if (test == x, return(ynew));
            y = ynew;
        );
    );
    y += mask;
    nm1 = n - 1;
    while (1 == 1,
        ynew = (nm1 * y + x \ (y^nm1)) \ n;
        if (ynew >= y, break);
        y = ynew;
    );
    return(y);
}

(20:44) gp > iroot(a,12345)
time = 386 ms.
%11 =
1010311380446684911802268122009368691535646607849018888903290947605488591556
607742
(20:44) gp > iroot(a,123456)
time = 535 ms.
%12 = 125907569

Of course choosing 12 bits is arbitrary and by playing with other values
there is an optimal
value for each n, k. Hoping to find more information I looked at gmp source
code but the rootrem
(in gmp/mpz/generic/rootrem.c) source code is extremely complex and there is
no comment with a
"magic" formula. Still gmp documentation seems to agree with this discussion
:
  The initial approximation a1 is generated bitwise by successively powering
a trial root with or
  without new 1 bits, aiming to be just above the true root. The iteration
converges quadratically
  when started from a good approximation. When n is large more initial bits
are needed to get
  good convergence. The current implementation is not particularly well
optimized.

iroot above could be used as a starting point for a faster implementation
(or maybe gmp rootrem
could be directly called when using gmp kernel (despite the comment, it is
very fast)).

Best regards

Jerome