The LU benchmarks (see the corresponding figure and table)
show performance that climbs slowly to 1200 MFLOPS for the MPML and
1000 MFLOPS for the SymPLA. As with the other versions, the overhead
is probably caused by the time needed to duplicate matrices, but in
this case there is the added overhead of solving a triangular
system. Since the MPML library does not have a separate triangular
solver, the SymPLA has to duplicate the factorized matrix, remove the
unused triangular component of the matrix, and invert the diagonal
elements. This means that for each solution the components are
duplicated twice, and the solver uses twice as many operations to
solve the system as the MPML benchmark needs.
Because SymPLA uses a number of temporary matrices in some operations, its benchmarks could not be run as far as the MPML benchmarks; there was not enough memory.
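
The extra work described above (two copies of the factorized matrix, removal of the unused triangle, and inversion of the diagonal) can be illustrated with a minimal sketch. It is not the SymPLA or MPML code: the function names (`lu_factor`, `extract_lower`, `extract_upper`, `solve`) are hypothetical, matrices are plain Python lists, and a Doolittle factorization without pivoting is assumed for brevity.

```python
def lu_factor(a):
    """Doolittle LU without pivoting; L and U are returned packed in one matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def extract_lower(lu):
    """Copy the packed matrix, keeping the strict lower triangle and a unit diagonal."""
    n = len(lu)
    return [[lu[i][j] if j < i else (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def extract_upper(lu):
    """Copy the packed matrix, keeping the upper triangle and inverting the diagonal."""
    n = len(lu)
    return [[(1.0 / lu[i][j] if i == j else lu[i][j]) if j >= i else 0.0
             for j in range(n)] for i in range(n)]


def solve(lu, b):
    """Solve A x = b from a packed LU factorization using two full-size copies."""
    lower = extract_lower(lu)  # first duplicate of the factorized matrix
    upper = extract_upper(lu)  # second duplicate, diagonal stored as reciprocals
    n = len(b)
    y = b[:]
    for i in range(n):  # forward substitution with the unit lower triangle
        for j in range(i):
            y[i] -= lower[i][j] * y[j]
    x = y[:]
    for i in range(n - 1, -1, -1):  # back substitution with the upper triangle
        for j in range(i + 1, n):
            x[i] -= upper[i][j] * x[j]
        x[i] *= upper[i][i]  # multiply by the stored reciprocal
    return x


a = [[4.0, 3.0], [6.0, 3.0]]
b = [10.0, 12.0]
x = solve(lu_factor(a), b)
```

In this sketch each call to `solve` allocates two full n-by-n temporaries in addition to the packed factorization. That is the kind of duplication that adds time per solution and raises memory use as the matrices grow.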