Ilya Zakharevich on Fri, 30 Aug 2024 08:14:03 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

[pre-announcement of a PATCH 2.16.1-alpha] 90x speedup of forprime() above 2^64


The people who paid attention to my earlier messages about PARI’s
forprime() may already know that (due to a combination of unfortunate
events?) PARI’s developers had (IMO unfathomable) ideas that sieving
is slower than running nextprime() for numbers as low as 10^12.  (See
various files Changes*; the first wrong decision I know about is the
change 109 in v2.6.)

Currently I’m integrating a patch to 2.16.1 (since 2.16.2 has another
[unfathomable???] BACKWARDS change) supporting sieving up to about
10^290 (of course, this is not feasible “on contemporary computers”
since this requires about 10^145 bytes memory for the prime table).
With this patch, I can test where is the REAL crossover (since,
INDEED, eventually nextprime() IS going to become quicker!).

  ANSWER: on my computer, I can allocate about 12GB for a prime table,
  	  and (with primediff) this supports prime table up to 320
	  billions, so one can sieve up to 10^23.

	  The current implementation is not fully optimized, but still
	  it performs about 18 times quicker than the stock forprime()
	  even at 10^23.  (With 128-bit numbers closer to 2^64 it
	  performs up to >90 times quicker!)

Moreover, the dependence of speed on the size of numbers “follows a
simple law”.  Extrapolating this law suggests that the crossover¹⁾
happens about 0.9·10^26.

  CONCLUSION: it makes sense to support prime tables up to 10^13.

    ¹⁽ This is the crossover with the CURRENT IMPLEMENTATION of
       forprime()-via-nextprime().  As I explained earlier
         http://megrez.math.u-bordeaux.fr/archives/pari-dev-2401/msg00049.html
       I expect that this may be sped up about 5 times (see the line
       in the middle of the attached image).

       If such an optimization is possible, the crossover happens at ∼2.5·10^24.

This may require up to 320 GB memory (with the footnote’s
optimization, 57 GB memory — so fully feasible!) to store diffptr for
the prime table.

Hope this helps,
Ilya

  P.S.  I attach the plot of the time (in seconds) of sieving an
  	interval with 2 billion numbers (depending on the size of the
  	number).  Since it seems that the list does not support
  	images, here is the duplicate:
	  http://ilyaz.org/software/tmp/sieve_vs_nextprime_on_2words-optimized.png
	  

Attachment: sieve_vs_nextprime_on_2words-optimized.png
Description: PNG image