
MasPar Programming Language

The MasPar Programming Language (MPL) [, ] is an architecture-specific, nonportable language for a massively parallel SIMD computer. The machine has between 1024 and 16384 processors, each with at most 64 KB of local memory, arranged in a 2D torus mesh. At each step, every processor either executes the same instruction or is idle. MPL is a modified version of ANSI C, with extensions that provide a two-level memory and two different communication schemes between the processors.

Figure: Organization of variables on the ACU and the PEs

As shown in the figure above, MPL supports two levels of memory, each with its own operations: the global memory (CMEM), which stores values used by sequential code and shared by all processors, and the memory local to each processor element (PMEM), which is used by parallel code. Data stored in CMEM may be used by parallel code, but data local to a processor element cannot be used directly by purely sequential code.

Operations on PMEM data are carried out in parallel on all processor elements that are not idle. An operand of an expression may reside on another processor element, in which case the data is fetched by the specified communication method before the operation proceeds.
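The interplay between the two memory levels and the set of non-idle processors can be illustrated with a small simulation. This is a minimal sketch in Python, not MPL code: the names `NPROC`, `plural_add`, `scale`, `local` and `active` are hypothetical, and the tiny processor count is chosen only to keep the output readable.

```python
# Toy model of MPL's two memory levels (not MPL code).
NPROC = 8  # hypothetical, tiny PE count for illustration


def plural_add(pmem, cmem_value, active):
    """Add one CMEM value to the PMEM value of every active PE.

    Idle PEs (active flag False) skip the operation and keep their value.
    """
    return [x + cmem_value if on else x for x, on in zip(pmem, active)]


scale = 10                              # one copy, held in CMEM
local = list(range(NPROC))              # one value per PE, held in PMEM
active = [x % 2 == 0 for x in local]    # only even-numbered PEs are active

result = plural_add(local, scale, active)
print(result)  # [10, 1, 12, 3, 14, 5, 16, 7]
```

The CMEM value is visible to every processor element, while each PMEM value changes only on the processor elements that take part in the step.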

Figure: The X-net communication network

The two communication methods between processors are the X-net primitives and the router. With the X-net, shown in the figure above, every active processor may access the memory of another processor located a specified number of steps away in one of 8 directions in the grid. The distance and the direction are the same for all active processors in the operation.
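The X-net access pattern described above can be sketched as a shift on a wrap-around grid. The following Python simulation is an illustration only, not MPL syntax: the direction names, the `xnet_fetch` function and the optional `active` mask are assumptions made for this example.

```python
# Toy model of an X-net fetch on a 2D torus (not MPL code).
# Direction names and the function name are illustrative assumptions.
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def xnet_fetch(grid, direction, distance, active=None):
    """Each active PE reads the value held `distance` steps away in `direction`."""
    rows, cols = len(grid), len(grid[0])
    dr, dc = DIRECTIONS[direction]
    out = [row[:] for row in grid]  # inactive PEs keep their own value
    for r in range(rows):
        for c in range(cols):
            if active is None or active[r][c]:
                # Same offset for every PE; indices wrap around the torus.
                src_r = (r + dr * distance) % rows
                src_c = (c + dc * distance) % cols
                out[r][c] = grid[src_r][src_c]
    return out


grid = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]
print(xnet_fetch(grid, "E", 1))   # [[1, 2, 0], [4, 5, 3], [7, 8, 6]]
print(xnet_fetch(grid, "NW", 1))  # [[8, 6, 7], [2, 0, 1], [5, 3, 4]]
```

Because a single direction and distance apply to the whole operation, the pattern is fully described by one offset, which is what makes this kind of communication regular.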

The router, on the other hand, allows arbitrary destinations: each processor decides which processor element it wants to access. The communication is then handled by a 3-tier router in which 16 processors share a single line in and a single line out. The router hardware and software then order the requests so that every processor gets to send or receive its data.
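Router-style communication can be modelled as a gather in which every processor names its own source. The Python sketch below is not MPL code, and its cost model is an assumption: it supposes that requests leaving the same group of 16 processors are served one after another on the shared outgoing line, and it ignores contention on the incoming side. The names `router_fetch` and `outgoing_load` are hypothetical.

```python
# Toy model of router communication (not MPL code).
from collections import Counter

CLUSTER = 16  # from the text: 16 PEs share one line in and one line out


def router_fetch(values, source):
    """PE i reads values[source[i]]; every PE may name a different source."""
    return [values[s] for s in source]


def outgoing_load(source):
    """Largest number of requests leaving any single 16-PE cluster.

    ASSUMPTION: requests sharing a cluster's outgoing line are served one
    after another, so this number is a rough lower bound on router rounds.
    """
    per_cluster = Counter(i // CLUSTER for i, s in enumerate(source) if s != i)
    return max(per_cluster.values(), default=0)


n = 64
values = [i * i for i in range(n)]
source = [(7 * i + 3) % n for i in range(n)]  # arbitrary per-PE choice
fetched = router_fetch(values, source)
print(fetched[:4])            # [9, 100, 289, 576]
print(outgoing_load(source))  # 16
```

Under this simplified model, a pattern in which every processor communicates needs at least 16 rounds, which gives an intuition for why the shared lines make the router slower than a regular shift.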

Of these two communication methods, the X-net usually performs better. The router may be the better choice when the source or destination is arbitrary, or when the destination processor's coordinates are a function of the originating processor's coordinates in the mesh. For example, the transpose of a matrix, where the distance varies greatly from element to element, is fastest when the router is used for communication.
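To see why a transpose does not fit the uniform X-net pattern, one can count how many different offsets the processors need. The following Python sketch is illustrative only; the helper `offsets` is hypothetical, and it assumes one matrix element per processor on an n-by-n grid, ignoring wrap-around.

```python
# Counting the distinct offsets a communication pattern needs (toy model).
def offsets(n, source_of):
    """Set of (d_row, d_col) offsets from each PE (r, c) to source_of(r, c)."""
    result = set()
    for r in range(n):
        for c in range(n):
            src_r, src_c = source_of(r, c)
            result.add((src_r - r, src_c - c))
    return result


n = 4
shift_east = offsets(n, lambda r, c: (r, c + 1))  # same neighbour for every PE
transpose = offsets(n, lambda r, c: (c, r))       # PE (r, c) needs element (c, r)
print(len(shift_east))  # 1
print(len(transpose))   # 7
```

A uniform shift needs a single offset, whereas the transpose needs 2n - 1 different ones in this model, so it cannot be expressed as one X-net operation with a shared direction and distance.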

MPL is not a language in which the compiler handles the nitty-gritty details of parallel computation; the programmer has to give much thought to each detail of the program to get maximum performance. In particular, the programmer must avoid sequential computation wherever the code can be parallelized, because even a small fraction of non-parallel code can cause a dramatic decrease in efficiency.
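The cost of sequential code can be estimated with Amdahl's law, which the text does not name but which captures the same point. The Python sketch below assumes an idealized machine with no communication overhead; the function name `speedup` and the chosen serial fractions are illustrative.

```python
# Amdahl's law applied to the largest machine size mentioned in the text.
def speedup(serial_fraction, processors):
    """Speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


for s in (0.0, 0.001, 0.01, 0.1):
    print(f"serial fraction {s:>5}: speedup {speedup(s, 16384):8.1f}")
```

With 16384 processors, a serial fraction of only 1% already limits the speedup to just under 100, and a serial fraction of 10% limits it to just under 10.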

