Re: [PATCH 2.16-2-beta]: about 20x speedup of forprime() in 10^12 .. 2^64

Ilya Zakharevich on Thu, 12 Sep 2024 16:04:49 +0200

To: Bill Allombert <Bill.Allombert@math.u-bordeaux.fr>
Subject: Re: [PATCH 2.16-2-beta]: about 20x speedup of forprime() in 10^12 .. 2^64
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Date: Thu, 12 Sep 2024 07:04:45 -0700
Cc: pari-dev@pari.math.u-bordeaux.fr
Delivery-date: Thu, 12 Sep 2024 16:04:49 +0200
In-reply-to: <ZuLxSc1DGX_euGPH@debian.attlocal.net>
References: <ZsH0fuX68GJ4oDyO@debian.attlocal.net> <Zt2lpTkZ868E-X8g@seventeen> <ZuLxSc1DGX_euGPH@debian.attlocal.net>

On Thu, Sep 12, 2024 at 06:48:57AM -0700, Ilya Zakharevich wrote:
> > Could explain what rem_half does ?
> 
> Half-width divll() — less storing the result of division.  (I did not
> check — maybe the “whole ½-width analogue” of divll() would be
> compiled into the same assembler code — by optimizing away the whole
> bruhaha with hiremainder…  Just to be safe, I modified the version
> from x86 assembler to return only the remainder.)

Forgot to add: on several processors at hand, it gives a speedup of up
to 1.8 times compared to (ulong x) / (ulong y).
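
To make "half-width division that keeps only the remainder" concrete, here is a minimal sketch. It is not the rem_half() from the patch: the name rem_half_sketch, its signature, and the precondition (x >> 32) < d are assumptions made for illustration. On x86 with GCC-style inline assembler it uses the 32-bit DIV instruction; elsewhere it falls back to ordinary 64-bit division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, NOT the patch's rem_half(): remainder of a 64-bit
 * dividend by a 32-bit divisor.
 * Precondition (an assumption of this sketch): (x >> 32) < d, so that the
 * quotient fits in 32 bits; otherwise the x86 DIV instruction would fault. */
uint32_t rem_half_sketch(uint64_t x, uint32_t d)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    assert(d != 0 && hi < d);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t q, r;
    /* divl divides EDX:EAX by the operand: quotient -> EAX, remainder -> EDX */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "0"(lo), "1"(hi), "rm"(d)
            : "cc");
    (void)q;                    /* the quotient is deliberately discarded */
    return r;
#else
    return (uint32_t)(x % d);   /* portable fallback: full-width division */
#endif
}
```

For example, rem_half_sketch(1000000000007ULL, 1000003U) should agree with 1000000000007ULL % 1000003U. Whether this beats the compiler's code for a plain 64-bit division depends on the processor, so it is worth timing on the target machine.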

> > If I understand correctly, 
> > you are suggesting to change is optimal_chunk:
> > ulong chunk = 0x80000UL;
> > to
> > ulong chunk = 0x800000UL;
> > 
> > Is it correct ?
> 
> No.  The main bug was in “what are the units of the ‘chunk’ variable”.
> The code in one place thinks that it counts “numbers-to-sieve”, in the
> other that it counts “bytes-in-the-arena”.  (These differ 16 times, which

Got carried away and forgot to focus on the content — sorry!

OK, so there are three changes:

  • a possible speedup of up to 1.8 times due to the 64:32 division;
  • the corrected default arena allocation.

(The effects of these two are in the table in my original post: compare
the two left columns with the “no␣options” column on the right.)

  • Customizable arena size.

To see the effect of this, compare the “no␣options” column with the two
columns to the right of it.
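
The units mix-up quoted above ("numbers-to-sieve" versus "bytes-in-the-arena", a factor of 16) can be illustrated with a small sketch. The helper names are hypothetical and do not come from the PARI sources. The factor of 16 is from the quoted text; the explanation in the comment (one bit per odd number, eight bits per byte) is my assumption about where it comes from.

```c
#include <stdint.h>

/* Assumption: each arena byte holds 8 bits, one per odd number,
 * so one byte covers a span of 16 consecutive integers. */
enum { NUMBERS_PER_BYTE = 16 };

/* How many consecutive integers an arena of the given size covers. */
uint64_t arena_bytes_to_numbers(uint64_t bytes)
{
    return bytes * NUMBERS_PER_BYTE;
}

/* How many arena bytes are needed to cover `count` consecutive integers
 * (rounded up to a whole byte). */
uint64_t numbers_to_arena_bytes(uint64_t count)
{
    return (count + NUMBERS_PER_BYTE - 1) / NUMBERS_PER_BYTE;
}
```

With these helpers, a value of 0x80000 read as bytes covers 0x800000 numbers, while the same value read as a count of numbers needs only 0x8000 bytes. Keeping the conversion in named functions makes it harder for two pieces of code to read the same variable in different units.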

Hope this helps,
Ilya
