The Basic Linear Algebra Communication Subprograms constitute a message-passing library designed for linear algebra.

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that may be implemented efficiently and uniformly across a large range of distributed memory platforms.

The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer of ScaLAPACK. Key ideas in the BLACS include:

Standard interface across platforms.

One of the main strengths of the BLACS is that code which uses the BLACS for its communication layer can run unchanged on any supported platform. There are various packages designed to provide a message passing interface that remains unchanged on several platforms, including PICL, and more recently, MPI. These packages are not available on all of the platforms that we wish to use. More importantly, they are attempts at general libraries, and are thus somewhat harder to use than a more restricted code.

The BLACS are written specifically for linear algebra programming. Since the audience of the BLACS is known, the interface and methods of using the routines can be simpler than for those of more general message passing layers.

The BLACS have been written on top of the following message passing layers:

CMMD: Allows the BLACS to run on Thinking Machine's CM-5.
MPI: Allows the BLACS to run across most parallel platforms.
MPL: Allows the BLACS to run on IBM's SP series (SP1 and SP2).
NX: Allows the BLACS to run on Intel's supercomputer series (iPSC2, iPSC/860, DELTA and PARAGON).
PVM: Allows the BLACS to run anywhere PVM is supported, which includes most UNIX systems.

The BLACS are freely available, and can be downloaded from here.

Process Grid and scoped operations.

The processes of a parallel machine with P processes are often presented to the user as a linear array of process IDs, labeled 0 through (P - 1). For reasons described below, it is often more convenient to map this 1-D array of P processes into a logical two dimensional process mesh, or grid. This grid will have R process rows and C process columns, where R * C = G <= P. A process can now be referenced by its coordinates within the grid (indicated by the notation {i, j}, where 0 <= i < R, and 0 <= j < C), rather than a single number. An example of such a mapping is shown in below.

An operation which involves more than just a sender and a receiver is called a scoped operation. All processes that participate in a scoped operation are said to be within the operation's scope.

On a system using a linear array of processes, the only natural scope is all processes. Using a 2-D grid, we have 3 natural scopes, as shown in the following table.

SCOPE                    MEANING

------   ----------------------------------------------

Row      All processes in a process row participate.

Column   All processes in a process column participate.

All      All processes in the process grid participate.

These groupings of processes are of particular interest to the linear algebra programmer, since distributed data decompositions of a 2D array (a linear algebra matrix) tend to follow this process mapping. For instance, all of a distributed matrix row can be found on a process row, etc.

Viewing the rows/columns of the process grid as essentially autonomous subsystems provides the programmer with additional levels of parallelism. Of course, how independent these rows and columns actually are will depend upon the underlying machine. For instance, if the grid's processors are connected via ethernet, we can see that the only gain will be in ease of programming. Speed is unlikely to increase, since if one processor is communicating, no others can. If this is the case, process rows or columns will not be able to perform different distributed tasks at the same time. Fortunately, most modern supercomputer interconnection networks are at least as rich as a 2D grid, so that these additional levels of parallelism can be exploited.

Context.

In the BLACS, each process grid is enclosed in a context. A context may be thought of as a message passing universe. This means that a grid can safely communicate even if other (possibly overlapping) grids are also communicating.

In most respects, we can use the terms "grid" and "context" interchangeably. I.e., we may say "perform operation in context X" or "in grid X". The slight difference here is that the user may define two exactly identical grids (say, two 1x3 process grids, both of which use processes 0, 1, and 2), but each will be wrapped in its own context, so that they are distinct in operation, even though they are indistinguishable from a process grid standpoint.

Contexts are used so that individual routines using the BLACS can, when required, safely operate without worrying if other distributed codes are being executed on the same machine.

Another example of the use of context might be to define a normal 2D process grid about which most computation takes place. However, at certain sections it may be more convenient to access the processes as a 1D grid, and at certain others we may wish, for instance, to share information among nearest neighbors. We will therefore want each process to have access to three contexts: the 2D grid, the 1D grid, and a small grid which contains the process and its nearest neighbors.

Therefore, we see that context allows us to:

    Create arbitrary groups of processes upon which to execute

    Create an indeterminate number of overlapping or disjoint grids

    Isolate each grid so that grids do not interfere with each other

In the BLACS, there are two grid creation routines (BLACS_GRIDINIT and BLACS_GRIDMAP ) which create a process grid and its enclosing context. These routines return context handles, which are simple integers. Subsequent BLACS routines will be passed these handles, which allow the BLACS to determine what context/grid a routine is being called from. The user should never actually manipulate these handles: they are opaque data objects which are only meaningful for the BLACS routines.

Contexts consume resources. It is therefore advisable to release contexts when they are no longer needed. This is done via the routine BLACS_GRIDEXIT. When the entire BLACS system is shut down (via a call to BLACS_EXIT), all outstanding contexts are automatically freed.

Array-based Communication.

Many communication packages can be classified as having operations based on one dimensional arrays, which are the machine representation for linear algebra's vector class. In programming linear algebra problems, however, it is more natural to express all operations in terms of matrices. Vectors and scalars are, of course, simply subclasses of matrices. On computers, a linear algebra matrix is represented by a two dimensional array (2D array), and therefore the BLACS operate on 2D arrays.

The BLACS recognize the two most common classes of matrices for dense linear algebra. The first of these classes consist of general rectangular matrices, which in machine storage are 2D arrays consisting of M rows and N columns, with a leading dimension, LDA, that determines the distance between successive columns in memory.

The general rectangular matrices therefore take the following parameters as input when determining what array to operate on:

M: (input) INTEGER

The number of matrix rows to be operated on.
N: (input) INTEGER

The number of matrix columns to be operated on.
A: (input/output) TYPE DEPENDS ON ROUTINE, array of dimension (LDA,N)

A pointer to the beginning of the (sub)array to be sent.
LDA: (input) INTEGER

The distance between two elements in matrix row.

The second class of matrices recognized by the BLACS are trapezoidal matrices (triangular matrices are a sub-class of trapezoidal) Trapezoidal arrays are defined by M, N, and LDA, as above, but they have two additional parameters as well. These parameters are:

UPLO: (input) CHARACTER*1

Indicates whether the matrix is upper or lower trapezoidal, as discussed below.
DIAG: (input) CHARACTER*1

Indicates whether the diagonal of the matrix is unit diagonal (will not be operated on) or otherwise (will be operated on).

The shape of the trapezoidal arrays is determined by these parameters as follows:

The packing of arrays (if required) so that they may be sent efficiently is hidden, allowing the user to concentrate on the logical matrix, rather than how the data is organized in the machine's memory.

If you liked this article, subscribe to the feed by clicking the image below to keep informed about new contents of the blog:

The Basic Linear Algebra Communication Subprograms constitute a message-passing library designed for linear algebra.

0 commenti:

Post a Comment

Random Posts

Recent Posts

Popular Posts

Labels

Archive

About Me

Find Us on Flipboard

Find Us On Facebook

My Favorites in Instagram

Find Us On Twitter