Maximizing beowulf performance
The beowulf (and other related compute cluster) architectures have come of age in Linux. Few indeed are those in any realm of technical computing who are unaware that one can assemble a collection of commodity off-the-shelf (COTS) computers and networking hardware into a high-performance supercomputing environment. However, detailed knowledge of, or appreciation for, the bottlenecks and special problems associated with beowulf design is not so common. A review of the important bottlenecks and design features of a beowulf is given, along with associated benchmarking and measurement tools, to illustrate how to bridge the gap between the simple "recipe" of a beowulf as a pile of compute nodes interconnected with a fast network and running Linux, and the realities of engineering a parallel code and beowulf-style cluster to achieve satisfactory performance.