Schmidt, Brian K.; Sunderam, Vaidy S.
doi: 10.1002/cpe.4330060102
In concurrent computing environments based on heterogeneous processing elements interconnected by general‐purpose networks, several classes of overheads contribute to lowered performance. The most obvious limitations are network throughput and latency, but certain other factors also play a significant role. In an attempt to gain some insight into the nature of these overheads, and to propose strategies to alleviate them, empirical measurements of native communication performance as well as application execution performance were conducted, using the PVM network computing system. These experiments and our analyses have identified load imbalance, the parallelism model adopted, communication delay and throughput, and within‐host overheads as the primary factors affecting performance in cluster environments. Interestingly, we find that agenda parallelism and load‐balancing strategies contribute significantly more to better performance than improved communications or system tuning. Drawing general conclusions on how these inefficiencies may be overcome is inadvisable because of the tremendous variability of many parameters in general‐purpose network environments; we therefore propose several potential approaches, including model selection criteria, partitioning strategies, and software system heuristics, to reduce overheads and enhance performance in network‐based environments.
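As a rough illustration of the agenda parallelism the abstract refers to, the sketch below has workers repeatedly pull tasks from one shared agenda, so that faster or less loaded workers naturally take on more of the work. It does not use PVM: Python threads stand in for hosts, and the names `run_agenda`, `task_sizes`, and `worker_count` are hypothetical, chosen only for this example.

```python
import queue
import threading


def run_agenda(task_sizes, worker_count=4):
    """Process unevenly sized tasks from one shared agenda (work queue)."""
    agenda = queue.Queue()
    for task_id, size in enumerate(task_sizes):
        agenda.put((task_id, size))

    results = {}      # task_id -> computed value
    handled_by = {}   # task_id -> worker that processed it
    lock = threading.Lock()

    def worker(worker_id):
        # Each worker keeps taking the next available task until none remain,
        # so the work distribution adapts to how long each task takes.
        while True:
            try:
                task_id, size = agenda.get_nowait()
            except queue.Empty:
                return
            value = sum(i * i for i in range(size))  # stand-in computation
            with lock:
                results[task_id] = value
                handled_by[task_id] = worker_id

    threads = [
        threading.Thread(target=worker, args=(w,)) for w in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled_by
```

In contrast to a static partition that assigns a fixed share of tasks to each worker up front, no worker here sits idle while tasks remain in the agenda.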
Chuang, Ling‐Yu; Rego, Vernon; Mathur, Aditya
doi: 10.1002/cpe.4330060103
Program unification is a technique for source‐to‐source transformation of code for enhanced execution performance on vector and SIMD architectures. This work focuses on simple examples of program unification to explain the methodology and demonstrate its promise as a practical technique for improved performance. After using these examples to show how unification is done, we outline two experiments in the simulation domain that benefit from it: Monte Carlo and discrete‐event simulation. Empirical tests of unified code on a Cray Y‐MP multiprocessor show that unification improves execution performance by a factor of roughly 8 for a given application. The technique is general in that it can be applied to computation‐intensive programs in various data‐parallel application domains.
doi: 10.1002/cpe.4330060104
An algorithm is presented that solves a linear advection‐diffusion problem using a least‐squares formulation and a conjugate gradient method to solve the corresponding minimization problem. An implementation in CM‐Fortran on a Thinking Machines CM‐2 is compared with a serial implementation on an IBM RS6000. The maximum speed‐up obtained is a factor of 70. For fine grids, the CPU time scales almost ideally when the number of processors is increased from 4096 to 8192.
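The combination of a least‐squares formulation with a conjugate gradient solver can be sketched generically as follows. This is not the authors' advection‐diffusion discretization or their CM‐Fortran code: it is a minimal dense‐matrix sketch, assuming the problem has already been reduced to minimizing ||Ax − b||², and it applies conjugate gradients to the normal equations without ever forming AᵀA (the variant often called CGLS). The function names and the tolerance are illustrative assumptions.

```python
def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y without building the transpose explicitly."""
    cols = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(cols)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgls(A, b, tol=1e-12, max_iter=100):
    """Minimize ||A x - b||^2 by conjugate gradients on the normal equations."""
    x = [0.0] * len(A[0])
    r = list(b)            # residual b - A x for the initial guess x = 0
    s = rmatvec(A, r)      # A^T r, proportional to the negative gradient
    p = list(s)            # first search direction
    gamma = dot(s, s)
    for _ in range(max_iter):
        if gamma <= tol:   # squared gradient norm small enough: stop
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        beta = gamma_new / gamma           # makes the new direction conjugate
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Each iteration consists of matrix-vector products, inner products, and vector updates, which are the kinds of operations that map naturally onto a data‐parallel machine.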
D'apuzzo, Marco; De Rosa, Maria Assunta
doi: 10.1002/cpe.4330060105
Recently developed block‐iterative versions of some row‐action algorithms for solving general systems of sparse linear equations allow parallelism in the computations when the underlying problem is appropriately decomposed. However, problems associated with the parallel implementation of these algorithms have to be addressed.
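To make the idea of a block‐iterative row‐action method concrete, here is a minimal sketch under stated assumptions. The abstract does not say which row‐action algorithm is used, so the sketch assumes Kaczmarz‐style projections onto the hyperplane of each equation: within a block, every row's correction is computed from the same iterate (these are independent and could run in parallel) and the corrections are averaged, while the blocks themselves are processed one after another. The name `block_row_action`, the equal weighting, and the dense storage are assumptions made for illustration only.

```python
def block_row_action(A, b, blocks, sweeps=200, relax=1.0):
    """Block-iterative projection sketch for a consistent system A x = b.

    blocks is a list of lists of row indices; relax is a relaxation
    parameter, assumed to lie strictly between 0 and 2.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(sweeps):
        for block in blocks:
            update = [0.0] * n
            # Every row in the block uses the same x, so these corrections
            # are independent of one another and could be computed concurrently.
            for i in block:
                row = A[i]
                norm2 = sum(a * a for a in row)
                if norm2 == 0.0:
                    continue  # skip empty rows
                resid = b[i] - sum(a * xj for a, xj in zip(row, x))
                scale = resid / norm2
                for j, a in enumerate(row):
                    update[j] += scale * a
            # Average the row projections of the block, then move to the next block.
            weight = relax / len(block)
            x = [xj + weight * uj for xj, uj in zip(x, update)]
    return x
```

A block containing a single row reduces this to the classical sequential row‐action sweep, while a single block holding all rows gives a fully simultaneous scheme; the decomposition into blocks is what exposes the parallelism.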