
LU factorization

LU benchmarks (see the accompanying figure and table) show performance that climbs slowly to 1200 MFLOPS for the MPML and 1000 MFLOPS for the SymPLA. As with the other versions, the overhead is probably caused by the time needed to duplicate matrices, but in this case there is additional overhead in solving a triangular system. Since the MPML library does not have a separate triangular solver, the SymPLA has to duplicate the factorized matrix, remove the unused triangular component of the copy, and invert the diagonal elements. As a result, the components are duplicated twice for each solution, and the solver uses twice as many operations to solve the system as the MPML benchmark needs.
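The extra work described above can be illustrated with a small serial sketch. This is not the SymPLA or MPML code, which is data-parallel and not shown in this document; the function names `split_factors` and `solve_with_factors` are hypothetical, and the sketch assumes a packed LU matrix without pivoting, with a unit diagonal in the lower factor. It only shows where the two copies and the diagonal inversion come from.

```python
def split_factors(lu):
    """Build separate triangular factors from a packed LU matrix.

    Assumption: `lu` holds the multipliers of L strictly below the
    diagonal and U on and above it (no pivoting, unit diagonal in L).
    """
    n = len(lu)
    # First copy: keep the strictly lower part, drop the upper part.
    lower = [[lu[i][j] if j < i else 0.0 for j in range(n)] for i in range(n)]
    # Second copy: keep the strictly upper part, drop the lower part.
    upper = [[lu[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]
    # Invert the diagonal once so the solve multiplies instead of dividing.
    inv_diag = [1.0 / lu[i][i] for i in range(n)]
    return lower, upper, inv_diag


def solve_with_factors(lu, b):
    """Solve A x = b given the packed LU factorization of A."""
    lower, upper, inv_diag = split_factors(lu)
    n = len(b)
    # Forward substitution with the unit lower triangular factor.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    # Back substitution with the upper factor and the inverted diagonal.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) * inv_diag[i]
    return x
```

For example, the matrix with rows (4, 3) and (6, 3) has the packed factorization `[[4.0, 3.0], [1.5, -1.5]]`, so `solve_with_factors([[4.0, 3.0], [1.5, -1.5]], [10.0, 12.0])` returns a solution close to `[1.0, 2.0]`. A library with a dedicated triangular solver could read both factors directly from the packed matrix and skip the two copies.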

Since the SymPLA uses a number of temporary matrices in some operations, its benchmarks could not be run to matrix sizes as large as those of the MPML; there was not enough memory.