Gonzalo Tornaria on Tue, 20 May 2003 11:21:26 -0500
Sparse matrix multiplication // charpoly
Idea: if we have a "sparse" matrix, and we are going to do a lot of
multiplications on the left (right), we "study" it beforehand, replacing
it by something else that can be used to multiply matrices on the left
(right). E.g.:

-------------------------------------------
? sl = matrix_sparseleft(m);
time = 3,810 ms.
? sparseleft_mul(sl, m);
time = 2,990 ms.
? m * m == %
time = 11,740 ms.
%22 = 1
-------------------------------------------

This is typical for 1000x1000 sparse matrices: it takes less time to
compute the product this way, even including the "study" time.

The "study" to prepare multiplication on the left consists of replacing
each row by a vector of pairs [col, entry]; multiplication by another
matrix then amounts to taking linear combinations of that matrix's rows
according to these vectors of pairs.

In my naive GP implementation (far from optimal), it seems to work
faster for multiplication on the right than on the left (which makes
sense, since matrices are organized by columns). The same is true for
the PARI multiplication.

To compute characteristic polynomials, we "study" the matrix once, and
then use the studied matrix to multiply on the right. This gives good
results for big matrices. I'll add the programs in the next message.

----------------------------------------------

In what follows, a "SPARSE" n by n matrix has about 4*n non-zero entries
(random integers from 1 to 10000), whereas a "DENSE" matrix has all
entries non-zero. A "RIGHT" is a sparse matrix studied for
multiplication on the right, and a "LEFT" is a sparse matrix studied for
multiplication on the left.

CP(DENSE) is computing the charpoly of a dense matrix, and CP(SPARSE) of
a sparse matrix; the GP program studies the matrix for the right once,
and then takes advantage of the fast multiplication. CP(SPARSE1) is the
charpoly of a sparse matrix whose entries are non-negative integers that
add up to 4 by columns (like the examples of the previous message).
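Since the GP programs only appear in the next message, here is a minimal
sketch in Python of the "study for the left" idea described above. The
helper names (study_left, sparseleft_mul) mirror the GP session but are
my own illustration, not the actual implementation:

```python
# Hypothetical sketch of the "study" idea, not the author's GP code.

def study_left(a):
    """Study a matrix (given as a list of rows) for multiplication on
    the left: replace each row by its list of (col, entry) pairs,
    keeping only the non-zero entries."""
    return [[(j, x) for j, x in enumerate(row) if x != 0] for row in a]

def sparseleft_mul(sl, b):
    """Multiply the studied matrix sl by b on the left.  Each row of
    the product is a linear combination of the rows of b, taken
    according to the stored (col, entry) pairs."""
    ncols = len(b[0])
    out = []
    for pairs in sl:
        row = [0] * ncols
        for j, x in pairs:
            brow = b[j]
            for k in range(ncols):
                row[k] += x * brow[k]   # add x times row j of b
        out.append(row)
    return out

# Example: a sparse 3x3 matrix times a dense one.
a = [[2, 0, 0],
     [0, 0, 3],
     [0, 1, 0]]
b = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
sl = study_left(a)
print(sparseleft_mul(sl, b))  # [[2, 4, 6], [21, 24, 27], [4, 5, 6]]
```

The cost of the multiplication is proportional to the number of non-zero
entries of the studied matrix (times n), which is why it beats the dense
product for matrices with about 4*n non-zero entries.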
The "generic" GP program runs almost as fast as the "special purpose" GP
program (for 4-regular graphs). There isn't an appreciable difference
between multiplication by SPARSE and SPARSE1. Why is there a difference
for computing the charpoly? I think it is because the coefficients of
the intermediate matrices we compute explode, and that's of course much,
much worse when the coefficients are up to 10000 than when they are
really small. In fact, I couldn't compute CP(SPARSE) for 500x500 (it
runs out of memory), but I can compute CP(SPARSE1). The timings are with
the patched PARI (P4 2.0GHz).

---------------------------------- 100x100 ---------
PARI: SPARSE * SPARSE :    17.3 msec
PARI: DENSE  * SPARSE :    20.3 msec
PARI: SPARSE * DENSE  :    94.0 msec
PARI: DENSE  * DENSE  :   160.0 msec
GP: Study for right   :    38.7 msec
GP: SPARSE * RIGHT    :    18.4 msec
GP: DENSE  * RIGHT    :    23.2 msec
GP: Study for left    :    38.7 msec
GP: LEFT * SPARSE     :    19.5 msec
GP: LEFT * DENSE      :    26.3 msec
PARI: CP(DENSE)       :   80.64 sec
PARI: CP(SPARSE)      :    3.69 sec
GP: CP(SPARSE)        :    3.13 sec
PARI: CP(SPARSE1)     :    1.99 sec
GP: CP(SPARSE1)       :    1.20 sec
---------------------------------- 200x200 ---------
PARI: SPARSE * SPARSE :   119.8 msec
PARI: DENSE  * SPARSE :   130.2 msec
PARI: SPARSE * DENSE  :   854.0 msec
PARI: DENSE  * DENSE  :  1530.0 msec
GP: Study for right   :   150.2 msec
GP: SPARSE * RIGHT    :    70.0 msec
GP: DENSE  * RIGHT    :    89.8 msec
GP: Study for left    :   152.0 msec
GP: LEFT * SPARSE     :    85.4 msec
GP: LEFT * DENSE      :   120.4 msec
PARI: CP(SPARSE)      :   57.50 sec
GP: CP(SPARSE)        :   43.79 sec
PARI: CP(SPARSE1)     :   26.49 sec
GP: CP(SPARSE1)       :    9.61 sec
---------------------------------- 500x500 ---------
PARI: SPARSE * SPARSE :  1773.0 msec
PARI: DENSE  * SPARSE :  1679.0 msec
PARI: SPARSE * DENSE  : 16110.0 msec
PARI: DENSE  * DENSE  : 26880.0 msec
GP: Study for right   :   919.0 msec
GP: SPARSE * RIGHT    :   427.0 msec
GP: DENSE  * RIGHT    :   547.0 msec
GP: Study for left    :   943.0 msec
GP: LEFT * SPARSE     :   669.0 msec
GP: LEFT * DENSE      :   892.0 msec
PARI: CP(SPARSE1)     :  887.21 sec
GP: CP(SPARSE1)       :  175.12 sec
---------------------------------- 1000x1000 ---------
PARI: SPARSE * SPARSE :    12.1 sec
PARI: DENSE  * SPARSE :    12.7 sec
PARI: SPARSE * DENSE  :   162.8 sec
PARI: DENSE  * DENSE  :   251.1 sec
GP: Study for right   :     3.6 sec
GP: SPARSE * RIGHT    :     1.6 sec
GP: DENSE  * RIGHT    :     2.2 sec
GP: Study for left    :     3.6 sec
GP: LEFT * SPARSE     :     3.0 sec
GP: LEFT * DENSE      :     3.9 sec
PARI: CP(SPARSE1)     : ????.?? sec
GP: CP(SPARSE1)       : 2021.34 sec
----------------------------------

NOTES:

0) Take all the numbers with a grain of salt, because this is not a
"serious" statistic, just sample timings.

0') The GP program is certainly not optimal, and I have not even
compiled it. A hand-made implementation in C should be even better.

0'') The PARI multiplication seems "to learn" (!): the first
multiplication takes longer. E.g. for 500x500 my data shows that
SPARSE*SPARSE takes more time than DENSE*SPARSE, but that seems to be
because I use the same matrix, 10 times for each, and the first time
takes twice as long (!). So, put more salt on my numbers :-)

1) The current PARI implementation of multiplication takes good
advantage of sparseness if the *second* matrix is sparse (independent of
the first one), but not nearly as much if only the *first* one is
sparse. I guess that's unavoidable, unless one tests the first matrix
for sparseness and uses the "other" algorithm (namely,
m1*m2 = (m2~*m1~)~).

2) For multiplication of 100x100 matrices, PARI is slightly better, even
without counting the "study" time; but GP is slightly better for
charpoly!!

3) For multiplication of 200x200 matrices, GP is better than PARI, not
counting the study time. For charpoly, GP is better, and for SPARSE1
matrices even better (3x).

4) For multiplication of 500x500 matrices, GP is slightly better
counting the study time (and 4x better not counting it)!! CP is 5x
better for SPARSE1 matrices.

5) For multiplication of 1000x1000 matrices, GP is 2x better than PARI
counting the study time, and 8x better without the study time.

Gonzalo

--
GM/CS/S d? a-- C++(+++) UL+++(++++) P++>+++ L+++>++++ E--- W-(+) N+(++)
w--- O M-(--) V-(--) PGP(--) b++(+++) G++ e>++++ ...
A mathematician is a machine for converting coffee into theorems.