hermann on Sat, 15 Jul 2023 00:03:06 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: Why is "lift(Mod(qnr, n)^(n\4))" 16% slower than C libgmp "powm(r, qnr, n/4, n)" ? |
On 2023-07-14 19:13, Bill Allombert wrote:
The short answer is that the GNU MP library does not provide a function mpn_powm that PARI could use. mpz_powm use a lot of internal mpn functions for fast modularreduction which are very efficient but not public.Now, I could add a wrapper for mpz_powm for large entries but 16% slower is notthat bad and we need fast modular reduction in more general setting.Could you compare p=(2^95369 + 1)/3; Mod(2,p)^((p-1)/5) with mpz_powm and PARI ?
Sure.Performance problem was related to the base, exponent and modulus I used. For your base, exponent and modulus gp outperforms C++ with mpz_powm() by 27% !
That is very good to know. OS is Ubuntu 22.04 Desktop. gp: 19.6s (5.440 GHz) g++: 26.9s (5.320 GHz) hermann@7600x:~$ cat 95369.gp p=(2^95369 + 1)/3; x=lift(Mod(2, p)^((p-1)\5)); ## print(x); hermann@7600x:~$ perf stat -e cycles,task-clock gp < 95369.gp Reading GPRC: /etc/gprc GPRC Done. GP/PARI CALCULATOR Version 2.15.3 (released) amd64 running linux (x86-64/GMP-6.2.1 kernel) 64-bit versioncompiled: Jun 30 2023, gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)
threading engine: single (readline not compiled in, extended help enabled) Copyright (C) 2000-2022 The PARI GroupPARI/GP is free software, covered by the GNU General Public License, and comes
WITHOUT ANY WARRANTY WHATSOEVER. Type ? for help, \q to quit. Type ?18 for how to get moral (and possibly technical) support. parisizemax = 2000003072, primelimit = 500000 *** last result computed in 19,604 ms. 1 Goodbye! Performance counter stats for 'gp': 107,245,113,513 cycles # 5.440 GHz19,715.98 msec task-clock # 1.000 CPUs utilized
19.717168440 seconds time elapsed 19.605913000 seconds user 0.110405000 seconds sys hermann@7600x:~$hermann@7600x:~$ g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra -o 95369
hermann@7600x:~$ perf stat -e cycles,task-clock ./95369 28709-digit prime p (95368 bits) 26.8999s 1 done Performance counter stats for './95369': 143,114,482,703 cycles # 5.320 GHz26,901.18 msec task-clock # 1.000 CPUs utilized
26.902467744 seconds time elapsed 26.901533000 seconds user 0.000000000 seconds sys hermann@7600x:~$ cat 95369.cc // g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra -o 95369 // (cpplinted and cppchecked) // #include <time.h> #include <math.h> #include <gmpxx.h> #include <assert.h> #include <iostream> int main() { mpz_class a, b, c, p; char *buf; int bits; float dt; buf = new char[32000000]; assert(buf); mpz_ui_pow_ui(a.get_mpz_t(), 2, 95369); a += 1; a /= mpz_class("3"); p = a; c = (p-1) / 5; std::cerr<< strlen(mpz_get_str(buf, 10, p.get_mpz_t())) << "-digit prime p (" << (bits = strlen(mpz_get_str(buf, 2, p.get_mpz_t()))) << " bits)\n";
b = 2; clock_t start = clock();mpz_powm(a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t(), p.get_mpz_t());
dt = static_cast<float>(clock() - start) / CLOCKS_PER_SEC; std::cerr << dt << "s\n"; std::cout << a << "\n"; std::cerr << "done\n"; delete buf; return 0; } hermann@7600x:~$ Regards, Hermann.