Bill Allombert on Thu, 20 Nov 2014 23:29:31 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Growing ordered set


On Thu, Nov 20, 2014 at 01:41:36PM -0500, Charles Greathouse wrote:
> Fairly often I have a problem of the type 'generate a number by a given
> process, and repeat until a duplicate is found'. I also have the related
> problem 'generate a number by a given process, and repeat until some limit,
> recording the number of distinct elements at each step'.
> 
> What's a good way to go about this in GP?
> 
> I can think of several ways, none satisfactory:
> * concat a vector, search linearly. O(n) to add, O(n) to search, O(n)
> overall.
> * listput a list, search linearly. O(1) to add, O(n) to search, O(n)
> overall.
> * listput then listsort, setsearch. O(n log n) to add, O(log n) to search,
> O(n log n) overall.
> * setsearch(,,1) and listinsert. O(n) to add, O(log n) to search, O(n)
> overall.

I am not quite sure what complexity mode you choose for your O(), but
the last one is not that bad, compared to the cost induced by the GP memory
manager...

Anyway, yes you can do better, using binary search tree!
See the attached (rather crude) script.

binary search tree:

? setrand(1);test(10^11)[1]
%1 = 537777
? ##
  ***   last result computed in 9,325 ms.

listinsert:

? setrand(1);test2(10^11)[1]
%2 = 537777
? ##
  ***   last result computed in 43,584 ms.

My script does not implement self balancing, thus it is only good on random
input (à la quicksort). But it is a proof of concept that t_LIST are sufficient
to implement fast data structures.
So as a corrolary it should be possible to build an efficient C implementation
of binary search tree using t_LIST as the underlying data structure for
compatibility with GP memory manager.

(my script also use local() variables to perfom "call by name").

Cheers,
Bill.
init(z)=L=List([[z,0,0]]);

add(x,i=1)=
{ 
  my([v,l,r]=L[i]);
  if(x==v,i
    ,x < v,
     if(l,
       add(x,l)
      ,
       listput(L,[x,0,0]);
       L[i][2]=#L;0)
    ,x > v,
     if(r,
       add(x,r),
       listput(L,[x,0,0]);
       L[i][3]=#L;0));
}

search(x,i=1)=
{ 
  my([v,l,r]=L[i]);
  if(x==v,i
    ,x < v,
     if(l, search(x,l))
    ,x > v,
     if(r, search(x,r)))
}

bal(i,j)=
{
  if (i<=j,my(u=(i+j)\2);add(M[u]);bal(i,u-1);bal(u+1,j));
}

balance()=
{
  local(M=vecsort(vector(#L,i,L[i][1])));
  init(M[(1+#M)\2]);
  bal(1,#M)
}

test(n,b=0)=
{
  local(L);
  my(c,u);
  init(random(n)); c++; 
  until(add(u),u=random(n);c++;
    if(b&&c==2^logint(c,2),gettime();balance();print(#L,":",gettime()));
  );[c,L]
}

test2(n)=
{
  local(L=List());
  my(c,u,p);
  while(1,
   u=random(n);c++;
   p=setsearch(L,u,1);
   if(!p,break);
   listinsert(L,u,p));
  [c,L]
}