Two general architectures are in common use in parallel computing today: Single Instruction, Multiple Data (SIMD) computers (e.g. the MasPar and the Connection Machine) and Multiple Instruction, Multiple Data (MIMD) computers (e.g. the Intel Paragon and workstation clusters).
SIMD computers execute the same operation on many data elements at once, and are most effective on large data sets whose elements are all processed in the same manner. Small data sets tend to reduce performance, because many of the processors are left idle. SIMD computers often use several thousand simple processors, each with a small amount of local memory and no swap space. The MasPar used in this project has 16384 processors, each with 64 KB of memory. It has timesharing facilities that allow the memory to be shared among several simultaneous processes, and a swap-in/swap-out facility that lets shorter jobs interrupt larger jobs while they are running.
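The effect of data set size on a SIMD machine can be illustrated with a simple cost model. The sketch below is not taken from the project; it assumes that each processor handles one data element per lockstep pass, and the function names `lockstep_passes` and `utilization` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of lockstep passes needed when every processor handles one data
// element per pass (a simplifying assumption, not a MasPar measurement).
std::size_t lockstep_passes(std::size_t items, std::size_t processors) {
    assert(processors > 0);
    return (items + processors - 1) / processors;  // ceiling division
}

// Fraction of processor slots that do useful work over all passes.
double utilization(std::size_t items, std::size_t processors) {
    const std::size_t passes = lockstep_passes(items, processors);
    if (passes == 0) return 0.0;  // no data, no useful work
    return static_cast<double>(items) /
           static_cast<double>(passes * processors);
}
```

Under this model, `utilization(1000, 16384)` is roughly 0.06, while `utilization(16384, 16384)` is exactly 1: a data set much smaller than the processor array leaves most of the machine idle.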
MIMD computers can follow different code paths ``simultaneously'', and can operate efficiently on both small and large data sets. They are very useful when different operations must be performed on different parts of the data, especially when the choice of operation depends heavily on the data itself, which SIMD machines do not handle efficiently. In group operations the individual processors may have to be synchronized, and this can degrade performance dramatically if the time spent waiting for data is large compared to the time spent computing. One solution to this problem is to perform some of the computation while the communication is handled in the background. MIMD computers often have fast, advanced ``off the shelf'' processors, each with a few megabytes of memory and some swap space.
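Why data-dependent choices are costly on a SIMD machine can be sketched with a small step-count model. It assumes one data element per processor and fixed costs for two branches, and it assumes the usual SIMD behaviour that processors not taking a branch are masked out while it executes. The names `BranchCosts`, `simd_steps` and `mimd_steps` are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cost model (an assumption for illustration): branch A takes cost_a steps,
// branch B takes cost_b steps, and each processor holds one data element.
struct BranchCosts {
    std::size_t cost_a;
    std::size_t cost_b;
};

// SIMD: all processors share one instruction stream, so every branch taken
// by at least one element is executed in full while the others are masked.
std::size_t simd_steps(const std::vector<bool>& takes_a, BranchCosts c) {
    bool any_a = false;
    bool any_b = false;
    for (bool a : takes_a) {
        if (a) any_a = true; else any_b = true;
    }
    return (any_a ? c.cost_a : 0) + (any_b ? c.cost_b : 0);
}

// MIMD: each processor follows its own branch, so the elapsed time is that
// of the slowest processor.
std::size_t mimd_steps(const std::vector<bool>& takes_a, BranchCosts c) {
    std::size_t slowest = 0;
    for (bool a : takes_a) {
        const std::size_t mine = a ? c.cost_a : c.cost_b;
        if (mine > slowest) slowest = mine;
    }
    return slowest;
}
```

With branch costs of 10 and 7 steps and a mix of elements taking each branch, the model gives 17 steps for SIMD and 10 for MIMD. When all elements take the same branch, the two agree.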
Both parallel versions presented here use a frontend/backend topology. The frontend runs all C++ application code, and with it the sequential ordering of operations and the sequential processing of data. The backend stores the matrix data in the manner best suited to the architecture and executes the operations requested by the frontend.
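The division of labour between frontend and backend might look roughly as follows. This is a minimal sketch under stated assumptions, not the interface used in the project: the class names `Backend`, `HostBackend` and `Matrix` and all of their members are hypothetical, and `HostBackend` simply keeps the data in host memory where a real SIMD or MIMD backend would distribute it across processors.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical backend interface: owns the matrix data, executes requests.
class Backend {
public:
    virtual ~Backend() = default;
    virtual int allocate(std::size_t rows, std::size_t cols, double fill) = 0;
    virtual void add(int dst, int lhs, int rhs) = 0;  // dst = lhs + rhs
    virtual double get(int id, std::size_t r, std::size_t c) const = 0;
};

// Stand-in backend that stores every matrix in host memory.
class HostBackend : public Backend {
    struct Stored { std::size_t rows, cols; std::vector<double> data; };
    std::map<int, Stored> store_;
    int next_id_ = 0;
public:
    int allocate(std::size_t rows, std::size_t cols, double fill) override {
        store_[next_id_] = Stored{rows, cols, std::vector<double>(rows * cols, fill)};
        return next_id_++;
    }
    void add(int dst, int lhs, int rhs) override {
        Stored& d = store_.at(dst);
        const Stored& a = store_.at(lhs);
        const Stored& b = store_.at(rhs);
        assert(a.data.size() == d.data.size() && b.data.size() == d.data.size());
        for (std::size_t i = 0; i < d.data.size(); ++i)
            d.data[i] = a.data[i] + b.data[i];
    }
    double get(int id, std::size_t r, std::size_t c) const override {
        const Stored& m = store_.at(id);
        return m.data.at(r * m.cols + c);  // row-major layout
    }
};

// Frontend handle: holds only an identifier; the data stays in the backend.
class Matrix {
    Backend& backend_;
    int id_;
    std::size_t rows_, cols_;
public:
    Matrix(Backend& b, std::size_t rows, std::size_t cols, double fill)
        : backend_(b), id_(b.allocate(rows, cols, fill)), rows_(rows), cols_(cols) {}
    // The frontend decides the order of operations; the backend does the work.
    Matrix operator+(const Matrix& other) const {
        Matrix result(backend_, rows_, cols_, 0.0);
        backend_.add(result.id_, id_, other.id_);
        return result;
    }
    double operator()(std::size_t r, std::size_t c) const {
        return backend_.get(id_, r, c);
    }
};
```

Application code would then construct a backend once and write ordinary expressions such as `Matrix c = a + b;`. Only small requests and identifiers cross the frontend/backend boundary, so the same application code could in principle run against either parallel backend.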