Re: ABI of “return objects on the bottom of the stack, continuously” (Re

Ilya Zakharevich on Mon, 24 Jun 2024 01:00:52 +0200

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: ABI of “return objects on the bottom of the stack, continuously” (Re: Two stacks at once)

To: Pari Developers Mailing List <pari-dev@pari.math.u-bordeaux.fr>
Subject: Re: ABI of “return objects on the bottom of the stack, continuously” (Re: Two stacks at once)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Date: Sun, 23 Jun 2024 16:00:45 -0700
References: <20041216064018.GB25285@math.berkeley.edu> <20041216132127.GH23145@seventeen> <20041216215251.GA26645@math.berkeley.edu> <20041216224919.GJ23145@seventeen> <20041217044224.GA26828@math.berkeley.edu> <ZT7l8EgQBslM4G8V@login.math.berkeley.edu>

On Sun, Oct 29, 2023 at 04:08:32PM -0700, Ilya Zakharevich wrote:
> The quoted message was written ∼19 years ago. — And then I found a
> solution “quite soon” — but could not find toits to write this down.
> 
> The old discussion is on
>   https://pari.math.u-bordeaux.fr/archives/pari-dev-0412/msg00016.html

The quoted message is on
    https://pari.math.u-bordeaux.fr/archives/pari-dev-2310/msg00018.html
It discusses how to avoid gerepile()ing (cheaply!).  (A copy at end.)

⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ New development

I found
   https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html

Very short summary:

  With contemporary memory architectures, the price of ONE memory access
  to “a region of random access memory of size S” is not O(1), but
  O(√S).  (With pretty good precision!)

What is changed by this:

  My initial thought:
  
    The price of gerepile()ing is O(S) where S is the size of the
    object.  To create this object one needs ≥O(S) time, so optimizing
    away gerepile() may only improve a constant, and only when “inside”
    a linear-time algorithm.

      (Of course, the improvement of this constant may be rather big
      if one needs a long change of gerepile()s.)

  My current thought:

   The price of gerepile() may be O(S√S) if the object is very “mixed
   up” in its memory representation.  On the other hand, if CREATION
   of this object was “much more memory-local”, then the time to
   create this object might still have been O(S).

   CONCLUSION: without further investigation, one cannot exclude a
   possibility of cases where removing gerepile() would make a
   super-linear-time algorithm into a linear-time.

Hope this helps,
Ilya
 
> > On Thu, Dec 16, 2004 at 11:49:19PM +0100, Bill Allombert wrote:
> > > That said, the idea of several stacks has a lot of merit, but I don't
> > > think we can retrofit it at this stage without major though.
> 
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ grepile() is considered harmful
> 
> I remind that the idea was to (almost?) completely eliminate
> grepile() — which is the most confusing — and thoroughly
> unneeded and wasteful — part of the implementation of PARI.
> 
> The **need** for grepile() comes from the PARI’s ABI, whose idea is
> that
> 
>  [[[  objects returned from a subroutine call are allocated on stack  ]]]
> 
> Moreover, these returned objects should better (for the purpose of
> efficiency of the stack usage) be allocated continuously.  Likewise,
> they should better be at the bottom of the stack.
> 
> However, the components of recursively built objects are often
> constructed using OTHER ancillary subroutine calls.  After such a
> call to construct (say) the 3rd component, the stack looks like
> 
>    BEFORE … COMPONENT1 COMPONENT2  RESULT-OF-CALL
> 
> Now we may need to post-process the RESULT-OF-CALL, and when
> COMPONENT3 is finally returned by yet another ancillary call, the
> stack becomes
> 
>    BEFORE … COMPONENT1 COMPONENT2  RESULT-OF-CALL COMPONENT3
> 
>  — and this means that our 3 components are now not adjacent.
> 
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ The proposed solution of 2004
> 
> Use two stacks alternately: when our caller expects our result to be
> on STACKα, we call our ancillary routines when being switched to
> STACKβ, and only the FINAL ancillary call during the calculation of
> COMPONENT3 is made switching back to STACKα.
> 
> Of course, there is a disadvantage: this may IN SOME CASES result in a
> waste of memory: it may happen that we need a lot of memory on
> STACKβ — while STACKα has a lot of unused memory.
> 
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ WORKAROUND — of about 2005 ;—(
> 
> Have both stacks use the same chunk of memory!  To avoid collisions,
> one of them should grow-down-from-top, the other one
> grow-up-from-bottom.
> 
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ IMPLEMENTATION
> 
>  • Have a new variable stack_dir.
>  • Make the stack allocation routines pay attention to stack_dir.
>  • (As a stopgap measure) make gerepile() (and friends) pay attention
>    to stack_dir.
>  • Make the starting value for stack_dir to be settable at the start
>    of runtime.
> 
> After this, check that the current code works with the “non-standard”
> value of stack_dir.  If this ACTUALLY turns out to be feasible:
> 
>  • Add a new function flip_stack_dir().
>  • Use this function in new code (as described above).
>  • (Leisurely,) insert flip_stack_dir() in suitable places in old code
>    and remove gerepile() code. 
> 
> The bookkeeping data for the stack is currently
>      stack_bot    avma   stack_top
> (if I did not mix up the names bot/top!).  avma grows when one
> allocates objects on stack.  The free area is between avma and
> stack_top.
> 
> The proposed data for the stack is
>     stack_min   avma_min  avma_max   stack_max
> The free area is between avma_min and avma_max.  Depending on
> stack_dir, avma is aliased to one of avma_min and avma_max, and
> stack_top is aliased to the other one of these two.
> 
>   (The code taking into account stack_bot needs to be rewritten.)
> 
>  (One way to do this is that flip_stack_dir() flips stack_dir, and
>   exchanges avma and stack_top.)
> 
> Hope this helps,
> Ilya

Prev by Date: Preprint: Efficient probabilistic factorization of F(x,y)=g(x,y) f(x,y) modulo composite integers n assuming the solution is unique.
Next by Date: Misprint in the documentation of Vec
Previous by thread: Preprint: Efficient probabilistic factorization of F(x,y)=g(x,y) f(x,y) modulo composite integers n assuming the solution is unique.
Next by thread: Misprint in the documentation of Vec
Index(es):
- Date
- Thread