Igor Schein on Mon, 17 May 2004 16:58:37 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: round4 performance


On Wed, May 05, 2004 at 08:35:40PM +0200, Xavier-François Roblot wrote:
> On Wed, 2004-05-05 at 17:36, Bill Allombert wrote:
> > On Tue, May 04, 2004 at 04:17:06PM +0200, Xavier-François Roblot wrote:
> > > Hi developers,
> > > 
> > > On Wed, 2004-04-28 at 04:36, Igor Schein wrote:
> > > > Hi,
> > > > 
> > > > here's a polynomial on which round2 heavily outperforms round4:
> > > > 
> > > > x^64 + 528*x^60 + 422640*x^56 + 154189440*x^52 + 46085461920*x^48 + 86643136
> > > > 30464*x^44 + 1067417121457152*x^40 + 76273480007101440*x^36 + 29307987576360
> > > > 23040*x^32 + 31785192406564024320*x^28 + 729662629421496287232*x^24 - 866453
> > > > 9928316858220544*x^20 + 114361270118934673858560*x^16 - 51757447983683584779
> > > > 87840*x^12 + 47264303406943521096007680*x^8 - 774599638688652156020195328*x^
> > > > 4 + 7585622913396815983510880256
> > > 
> > > I have committed a big patch that attempts to make round4 perform better
> > > with this kind of polynomial. There are some improvements but still it
> > > is significantly slower than round2... Also, I added some garbage
> > > collecting so at least the stack necessary should now be reasonable. In
> > > any case, the changes I have made still require a lot of tunings (and
> > > also probably debugging!), so any feedback is welcome.
> > 
> > Apparently, now round4 is faster than round2 by 10% on this example.
> > (maybe not if we look only at the prime 3, though).
> > 
> > I wonder if fastvalpos() could not be changed to compute 
> > newton sums iteratively and stop as soon as the criterium apply
> > and return 0, instead of precomputing a fixed set of sums.
> > 
> > Cheers,
> > Bill.
> 
> Well, I have modified update_alpha (after Karim pointed out a strange
> behavior in this function) and that kind of miraculously speed up
> dramatically that example!... As you will see, the computing time is now
> very reasonable and it runs with a small stack too (I hope the result is
> still correct though, I haven't checked yet). Igor, Karim and I still
> have some ideas for improvements for nilord but you need some new bad
> polynomials to test them. Please send me your worst examples!
> 
> Take care,
> 
> Xavier

As of current CVS, I have one:

x^64 + 144*x^62 + 9552*x^60 + 390432*x^58 + 11080200*x^56 + 232989696*x^54 +
 3780238752*x^52 + 48636265248*x^50 + 505878824736*x^48 + 4313989216800*x^46
 + 30476092609440*x^44 + 179725400591616*x^42 + 889696224175824*x^40 + 37113
75959364288*x^38 + 13078302651873216*x^36 + 38977344315307584*x^34 + 9825210
8786134728*x^32 + 209260046783039040*x^30 + 375757773758107200*x^28 + 566964
010597622400*x^26 + 715492120542918048*x^24 + 750523839570713088*x^22 + 6491
30912300207232*x^20 + 458125942466369664*x^18 + 260295367984115328*x^16 + 11
6982277577092224*x^14 + 40621591866960000*x^12 + 10554853128818688*x^10 + 19
60600165904448*x^8 + 242910928408320*x^6 + 17820025360128*x^4 + 592019290368
*x^2 + 1536953616

It did behave decently on 2.2.7, but slowed down considerably after
all latest changes. 

Thanks

Igor