Nested parallel 2D Delaunay triangulation method
FIELD OF THE INVENTION
The invention relates to parallel processing on distributed memory and shared memory parallel computers having multiple processors.
BACKGROUND OF THE INVENTION
It is hard to program parallel computers. Dealing with many processors at the same time, either explicitly or implicitly, makes parallel programs harder to design, analyze, build, and evaluate than their serial counterparts. However, using a fast serial computer to avoid the problems of parallelism is often not enough. There are always problems that are too big, too complex, or whose results are needed too soon.
Ideally, a parallel programming model or language should provide the same advantages we seek in serial languages: portability, efficiency, and ease of expression. However, it is typically impractical to extract parallelism from sequential languages. In addition, previous parallel languages have generally ignored the issue of nested parallelism, where the programmer exposes multiple simultaneous sources of parallelism in an algorithm. Supporting nested parallelism is particularly important for irregular algorithms, which operate on non-uniform data structures (for example, sparse arrays, trees and graphs).
Parallel Languages and Computing Models
The wide range of parallel architectures make it difficult to create a parallel computing model that is portable and efficient across a variety of architectures. Despite shifts in market share and the demise of some manufacturers, users can still choose between tightly-coupled shared-memory multiprocessors such as the SGI Power Challenge, more loosely coupled distributed-memory multicomputers such as the IBM SP2, massively-parallel SIMD machines such as the MasPar
MP-2, vector supercomputers such as the Cray C90, and loosely coupled clusters of workstations such as the DEC SuperCluster. Network topologies are equally diverse, including 2D and 3D meshes on the Intel Paragon and ASCI Red machine, 3D tori on the Cray T3D and T3E, butterfly networks on the IBM SP2, fat trees on the Meiko CS-2, and hypercube networks on the SGI Origin2000. With extra design axes to specify, parallel computers show a much wider range of design choices than do serial machines, with each choosing a different set of tradeoffs in terms of cost, peak processor performance, memory bandwidth, interconnection technology and topology, and programming software.
This tremendous range of parallel architectures has spawned a similar variety of theoretical computational models. Most of these are variants of the original CRCW PRAM model (Concurrent-Read Concurrent-Write Parallel Random Access Machine), and are based on the observation that although the CRCW PRAM is probably the most popular theoretical model amongst parallel algorithm designers, it is also the least likely to ever be efficiently implemented on a real parallel machine. That is, it is easily and efficiently portable to no parallel machines, since it places more demands on the memory system in terms of access costs and capabilities than can be economically supplied by current hardware. The variants handicap the ideal PRAM to resemble a more realistic parallel machine, resulting in the locality-preserving H-PRAM, and various asynchronous, exclusive access, and queued PRAMs. However, none of these models have been widely accepted or implemented.
Parallel models which proceed from machine characteristics and then abstract away details--that is, "bottom-up" designs rather than "top-down"--have been considerably more successful, but tend to be specialized to a particular architectural style. For example, LogP is a low-level model for message-passing machines, while BSP defines a somewhat higher-level model in terms of alternating phases of asynchronous computation and synchronizing communication between processors. Both of these models try to accurately characterize the performance of any message-passing network using just a few parameters, in order to allow a programmer to reason about and predict the behavior of their programs.
However, the two most successful recent ways of expressing parallel programs have been those which are arguably not models at all, being defined purely in terms of a particular language or library, with no higher-level abstractions. Both High Performance Fortran and the Message Passing Interface have been created by committees and specified as standards with substantial input from industry, which has helped their widespread adoption. HPF is a full language that extends sequential Fortran with predefined parallel operations and parallel array layout directives. It is typically used for computationally intensive algorithms that can be expressed in terms of dense arrays. By contrast, MPI is defined only as a library to be used in conjunction with an existing sequential language. It provides a standard message-passing model, and is a superset of previous commercial products and research projects such as PVM and NX. Note that MPI is programmed in a control-parallel style, expressing parallelism through multiple paths of control, whereas HPF uses a data-parallel style, calling parallel operations from a single thread of control.
Nested and Irregular Parallelism