Ilya Zakharevich on Mon, 24 Jun 2024 01:00:52 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: ABI of “return objects on the bottom of the stack, continuously” (Re: Two stacks at once) |
On Sun, Oct 29, 2023 at 04:08:32PM -0700, Ilya Zakharevich wrote: > The quoted message was written ∼19 years ago. — And then I found a > solution “quite soon” — but could not find toits to write this down. > > The old discussion is on > https://pari.math.u-bordeaux.fr/archives/pari-dev-0412/msg00016.html The quoted message is on https://pari.math.u-bordeaux.fr/archives/pari-dev-2310/msg00018.html It discusses how to avoid gerepile()ing (cheaply!). (A copy at end.) ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ New development I found https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html Very short summary: With contemporary memory architectures, the price of ONE memory access to “a region of random access memory of size S” is not O(1), but O(√S). (With pretty good precision!) What is changed by this: My initial thought: The price of gerepile()ing is O(S) where S is the size of the object. To create this object one needs ≥O(S) time, so optimizing away gerepile() may only improve a constant, and only when “inside” a linear-time algorithm. (Of course, the improvement of this constant may be rather big if one needs a long change of gerepile()s.) My current thought: The price of gerepile() may be O(S√S) if the object is very “mixed up” in its memory representation. On the other hand, if CREATION of this object was “much more memory-local”, then the time to create this object might still have been O(S). CONCLUSION: without further investigation, one cannot exclude a possibility of cases where removing gerepile() would make a super-linear-time algorithm into a linear-time. Hope this helps, Ilya > > On Thu, Dec 16, 2004 at 11:49:19PM +0100, Bill Allombert wrote: > > > That said, the idea of several stacks has a lot of merit, but I don't > > > think we can retrofit it at this stage without major though. > > ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ grepile() is considered harmful > > I remind that the idea was to (almost?) completely eliminate > grepile() — which is the most confusing — and thoroughly > unneeded and wasteful — part of the implementation of PARI. > > The **need** for grepile() comes from the PARI’s ABI, whose idea is > that > > [[[ objects returned from a subroutine call are allocated on stack ]]] > > Moreover, these returned objects should better (for the purpose of > efficiency of the stack usage) be allocated continuously. Likewise, > they should better be at the bottom of the stack. > > However, the components of recursively built objects are often > constructed using OTHER ancillary subroutine calls. After such a > call to construct (say) the 3rd component, the stack looks like > > BEFORE … COMPONENT1 COMPONENT2 RESULT-OF-CALL > > Now we may need to post-process the RESULT-OF-CALL, and when > COMPONENT3 is finally returned by yet another ancillary call, the > stack becomes > > BEFORE … COMPONENT1 COMPONENT2 RESULT-OF-CALL COMPONENT3 > > — and this means that our 3 components are now not adjacent. > > ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ The proposed solution of 2004 > > Use two stacks alternately: when our caller expects our result to be > on STACKα, we call our ancillary routines when being switched to > STACKβ, and only the FINAL ancillary call during the calculation of > COMPONENT3 is made switching back to STACKα. > > Of course, there is a disadvantage: this may IN SOME CASES result in a > waste of memory: it may happen that we need a lot of memory on > STACKβ — while STACKα has a lot of unused memory. > > ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ WORKAROUND — of about 2005 ;—( > > Have both stacks use the same chunk of memory! To avoid collisions, > one of them should grow-down-from-top, the other one > grow-up-from-bottom. > > ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ IMPLEMENTATION > > • Have a new variable stack_dir. > • Make the stack allocation routines pay attention to stack_dir. > • (As a stopgap measure) make gerepile() (and friends) pay attention > to stack_dir. > • Make the starting value for stack_dir to be settable at the start > of runtime. > > After this, check that the current code works with the “non-standard” > value of stack_dir. If this ACTUALLY turns out to be feasible: > > • Add a new function flip_stack_dir(). > • Use this function in new code (as described above). > • (Leisurely,) insert flip_stack_dir() in suitable places in old code > and remove gerepile() code. > > The bookkeeping data for the stack is currently > stack_bot avma stack_top > (if I did not mix up the names bot/top!). avma grows when one > allocates objects on stack. The free area is between avma and > stack_top. > > The proposed data for the stack is > stack_min avma_min avma_max stack_max > The free area is between avma_min and avma_max. Depending on > stack_dir, avma is aliased to one of avma_min and avma_max, and > stack_top is aliased to the other one of these two. > > (The code taking into account stack_bot needs to be rewritten.) > > (One way to do this is that flip_stack_dir() flips stack_dir, and > exchanges avma and stack_top.) > > Hope this helps, > Ilya