Bill Allombert on Mon, 16 Jun 2003 22:56:47 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: memory management


On Mon, Jun 16, 2003 at 01:18:57PM -0400, Igor Schein wrote:
> Hi,
> 
> I am trying to understand the general concept of memory management in
> PARI.  I am aware that pari uses both stack and heap memory, but I'd
> like to hear what are strengths and weaknesses of each type of
> allocation, and what are the considerations whenever one is chosen
> over the other.  My lame perspective is the following:
> 
> - Stack memory helps you adopt to the size of physical memory of
> machine.  A huge drawback is that you may have a job that has been
> running for a long period of time, and when you  run out of stack,
> that time has been completely wasted.  To safeguard against that, I
> usually start gp (32bit) with the stack size of about 90% of physical memory
> of the machine or 2GB, whichever is smaller.
>  
> - Heap memory has more flexibility, but can potentially bring a
> machine to a halt on large memory jobs by eating up all virtual
> memory.  

The above is just secondary effect on how things are implemented,
for example you can implement stack to grow on demand, and you
can limit the amount of heap memory to a maximum.

So we need more background on how things are implemented:
1) The PARI stack is not the system stack, so it is in fact a part
of the heap.

2) We use malloc(3) to allocate the PARI stack on the heap. In fact 
we do not have much choice, since PARI/GP is a library. Allocating
memory without malloc would forbid programs linked with libpari to use
malloc() which is not acceptable. This would be a real pain for GP
already.

3) malloc(3) does not permit to extend a memory block in place,
which is what is required to expend the PARI stack.

Now what is the purpose of the PARI stack:
1) The PARI stack is very efficient way to allocate a lot of small 
memory block. This is very important if you have recursive object.
Here the basic code to allocate memory on the PARI stack, without
casts:

GEN new_chunk(long x)
{
  const GEN z =  avma - x;
  if (x > avma-bot) err(errpile);
  avma = z; return z;
}
This is essentially 2 substractions and 1 comparison.

A lot of system use a stack. 

2) By contrast, malloc() is usually very inefficient for
allocating/freeing lots of small block, but is much better for a few
large blocks. So, to avoid overflowing the stack, when we know we have a
few large block to allocate/free we use the heap.  

3) The main drawback of the stack is that garbage collecting change addresses
of object in the stack. To work-around this problem, we use clone which are
temporary copy of GEN in the heap that we free later. This may make
stack management easier in hard case.

4) GP use the heap to store contents of GP variable and history, so GP
programs may sometimes use less stack that their C counterpart (but more
heap).

There are several ways to avoid stack overflow, but due to the point you
raised, we never felt they were worth implementing. Also computer time
is very cheap compared to computer RAM.

1) The good: (this is more or less what you do)
We suppose the OS use copy on write for allocate memory. This allows us
to start with a really large stack without in fact using that much
memory. Only dirty pages really take up system resource.
We let GP free()/malloc() the stack when it return, to refresh the pages.

The drawback curently is that PARI will make no effort to not fill the
stack, and thus will use more memory than required. This is easy to fix
and I can eventually do it if I receive enough push :). This solution is
transparent for system with poor VM engine.

2) The bad: mmap(2)ing /dev/zero with MAP_FIXED to allocate the stack 
at some specified address, so that we can reallocate it at the same 
place. This look clever, but not only it is not portable but also
MAP_FIXED is not waranted to work at all.

3) The ugly: When the stack overflow we allocate a new stack twice
as large as the current one and we copy the old one a the start of the
new one. We keep the old one around until GP return, and make function
that matter about stack connexity to promote operand in the old stack
to the new one. This is slow and memory inefficient:
for example with a 1Gb of stack, PARI will use 3Gb of stack after
doubling once and 7Gb after doubling twice (to a 4Gb stack).

Cheers,
Bill.