Charles Greathouse on Thu, 20 Nov 2014 19:42:19 +0100

 Growing ordered set

Fairly often I have a problem of the type 'generate a number by a given process, and repeat until a duplicate is found'. I also have the related problem 'generate a number by a given process, and repeat until some limit, recording the number of distinct elements at each step'.

I can think of several ways, none satisfactory:
* concat a vector, search linearly. O(n) to add, O(n) to search, O(n) overall.
* listput a list, search linearly. O(1) to add, O(n) to search, O(n) overall.
* listput then listsort, setsearch. O(n log n) to add, O(log n) to search, O(n log n) overall.
* setsearch(,,1) and listinsert. O(n) to add, O(log n) to search, O(n) overall.
* (inspired by Martin-Shankar arXiv:1305.1984) create a scratch vector of length k, inserting in order, then concat and Set to a growing sorted list. O(n/k * log n) amortized time to insert, O(k + log n) time to search. With k = O(sqrt(n log n)) the overall time is O(sqrt(n log n)).

An alternate approach would be to allocate a bitfield (in gp you'd have to hack this together with vectorsmall) and do all operations in O(1), but this takes too much space to be practical in most cases.

I don't imagine there's a HashSet lurking somewhere in ?9...

My particular inspiration in this case is computing A247140 in the OEIS, but this sort of problem is not uncommon. Actually, come to think of it, PARI has rho (pollardbrent), which solves a similar problem; I wonder how it handles it?

Charles Greathouse
Analyst/Programmer
Case Western Reserve University