The overheads of construction, copying and destruction show both
similarities and differences between the DEC Alpha and the Intel
Paragon (see the corresponding figure and tables for each machine).
The overheads of both construction and copying of matrices increase
linearly with the memory required, as expected. However, while copying
takes twice as long as construction on the DEC Alpha, the difference in
execution time between these operations is much smaller on the Intel
Paragon. This difference may again be caused by the Intel Paragon's
pipelining of memory reads, which the DEC Alpha cannot do.
Another difference is seen in the time needed to destroy a matrix. The Intel Paragon uses constant time for this, while the time on the DEC Alpha is directly proportional to the amount of memory. This may be because the DEC Alpha's allocation system is page based, so that each page must be released individually during deallocation.
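The kind of measurement discussed above can be illustrated with a minimal sketch. It is not the benchmark code used for the reported timings: it is written in Python, uses a plain byte buffer in place of a SymPLA matrix, and the function name `time_lifecycle` and its parameters are hypothetical. It only shows how construction, copy and destruction can be timed separately for an n-by-n matrix of 8-byte elements.

```python
import time

def time_lifecycle(n, elem_bytes=8):
    """Time construction, copy and destruction of an n-by-n buffer.

    A bytearray stands in for a matrix (an assumption made for this
    sketch); elem_bytes=8 corresponds to double precision elements.
    """
    nbytes = n * n * elem_bytes

    t0 = time.perf_counter()
    original = bytearray(nbytes)      # construction: allocate and zero-fill
    t1 = time.perf_counter()
    duplicate = bytearray(original)   # copy: allocate and read the whole source
    t2 = time.perf_counter()
    del original                      # destruction: CPython frees the buffer here
    t3 = time.perf_counter()

    del duplicate                     # released outside the timed regions
    return {
        "bytes": nbytes,
        "construct": t1 - t0,
        "copy": t2 - t1,
        "destroy": t3 - t2,
    }

def report(dimensions):
    """Print one line of timings per matrix dimension."""
    for n in dimensions:
        r = time_lifecycle(n)
        print(n, r["bytes"], r["construct"], r["copy"], r["destroy"])
```

Calling `report([256, 512, 1024])` prints one row per dimension, so that the growth of each overhead with the memory required can be compared. The ratios observed will depend on the machine and the allocator, as they do for the two platforms discussed here.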
Since SymPLA uses temporary matrices in a number of operations, it begins to lose performance earlier than LAPACK, because the necessary preparation and duplication of data take time. In addition, the extra memory use causes swapping to start at smaller matrix dimensions than for the LAPACK benchmarks, especially for matrix multiplication and LU factorization.
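The effect of temporaries on the onset of swapping can be made concrete with a small calculation. This is an illustrative sketch only: it assumes square double precision matrices, that a product C = A*B keeps three matrices resident, and that a single temporary adds a fourth. The memory size of 64 MiB and the function name `max_dimension` are hypothetical and are not taken from the benchmarks.

```python
import math

def max_dimension(mem_bytes, n_matrices, elem_bytes=8):
    """Largest n such that n_matrices square n-by-n matrices of
    elem_bytes-sized elements fit in mem_bytes of memory."""
    return math.isqrt(mem_bytes // (n_matrices * elem_bytes))

MEM = 64 * 1024 * 1024                # hypothetical 64 MiB of physical memory

without_temp = max_dimension(MEM, 3)  # A, B and C only            -> 1672
with_temp = max_dimension(MEM, 4)     # A, B, C and one temporary  -> 1448
```

Under these assumptions a single temporary lowers the largest dimension that fits in memory from 1672 to 1448, which is the direction of the effect described above. The actual number of temporaries depends on the operation.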
Timing the creation of submatrices turned out to be a problem and is not included in the reports: the timings indicated construction in a millisecond or less on the DEC Alpha, whose timer has millisecond resolution, and similarly on the Intel Paragon. The construction overhead for submatrices may therefore be considered negligible compared to the construction overheads of whole matrices, as may the destruction overhead.
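A common way around limited timer resolution, which was not used for the reported measurements, is to repeat the operation in batches until the total elapsed time spans many timer ticks and then divide by the number of repetitions. The sketch below is a minimal illustration of that idea; the function name `time_per_call`, its parameters and the threshold of 0.05 seconds are assumptions made for the example.

```python
import time

def time_per_call(op, min_elapsed=0.05, clock=time.perf_counter):
    """Estimate the time of one call to op with a coarse clock.

    The batch size is doubled until one batch takes at least
    min_elapsed seconds on the given clock, so that the quantisation
    error of the clock becomes small relative to the measured interval.
    Returns (seconds_per_call, repetitions_in_final_batch).
    """
    reps = 1
    while True:
        start = clock()
        for _ in range(reps):
            op()
        elapsed = clock() - start
        if elapsed >= min_elapsed:
            return elapsed / reps, reps
        reps *= 2
```

With a millisecond clock and a threshold of 50 ms, the quantisation error of the total is at most about one tick in fifty, which bounds the relative error of the per-call estimate at a few percent. For submatrix construction, `op` would create and discard one submatrix.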
Operations on submatrices that require duplication of data into
temporary memory have been observed to run at greatly reduced
performance: at less than about a tenth of the addition performance
shown here, and at somewhat reduced performance in other operations.
As this implementation supports using whole matrices and continuous
submatrices directly in the operations, operations on such matrices
should not suffer such reductions in performance. It is possible that
better ways to handle operations involving random submatrices can be
found, and such methods should involve as many preselected settings as
possible.
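The difference between the two access patterns can be sketched as follows. This is an illustration in Python, not SymPLA's implementation: it assumes row-major storage in a flat list with leading dimension `ld`, takes "continuous" to mean a rectangular block given by a start position and an extent, and takes "random" to mean arbitrary lists of row and column indices. All function names are hypothetical. Precomputing the element offsets once is one possible reading of "preselected settings".

```python
def add_block_direct(dst, src, ld, row0, col0, rows, cols):
    """dst[block] += src[block], addressing elements in place.

    The block is continuous: rows row0..row0+rows-1 and columns
    col0..col0+cols-1 of a row-major matrix with leading dimension ld.
    No temporary storage is needed.
    """
    for i in range(rows):
        base = (row0 + i) * ld + col0
        for k in range(base, base + cols):
            dst[k] += src[k]

def precompute_offsets(ld, row_idx, col_idx):
    """Flat offsets of the submatrix selected by arbitrary index lists.

    Computed once and reused for every operation on the same selection.
    """
    return [r * ld + c for r in row_idx for c in col_idx]

def add_indexed_via_temp(dst, src, offsets):
    """dst[sel] += src[sel] by gathering into temporaries and scattering.

    This mirrors the duplication of data into temporary memory: two
    gathers, one addition on packed data, and one scatter back.
    """
    tmp_dst = [dst[k] for k in offsets]   # gather destination elements
    tmp_src = [src[k] for k in offsets]   # gather source elements
    tmp_sum = [x + y for x, y in zip(tmp_dst, tmp_src)]
    for k, value in zip(offsets, tmp_sum):
        dst[k] = value                    # scatter the result back
```

Both routines give the same result on a continuous block, but the second touches every selected element several times and allocates three temporaries, which is consistent with the reduced performance observed for operations that duplicate data.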
Summary of SymPLA addition on the 233 MHz DEC Alpha
Summary of BLAS/ScaLAPACK addition on the 233 MHz DEC Alpha
Summary of SymPLA addition on the Intel Paragon
Summary of BLAS/ScaLAPACK addition on the Intel Paragon