Welcome to Nanyan Jiang's HomePage

Nanyan's Homepage@winlab

 

Spring 2005 -- Course 572: Advanced Parallel and Distributed Computing  
N. Adiga et al., An Overview of the BlueGene/L Supercomputer, in the Proceedings of Supercomputing (SC2002), Technical Papers, 2002
Summary:
This paper gives an overview of the approach BlueGene/L uses to achieve teraFLOPS-scale computing. It differs from the traditional approach of clustering a large number of nodes in two ways:
  • First, the BlueGene/L system is built out of a very large number of nodes, each with a relatively modest clock rate, rather than by clustering large, very fast SMPs, an approach that is limited by power consumption and footprint constraints;
  • Second, the BG/L design point uses system-on-a-chip techniques to integrate all system functions, including the compute processor, communications processor, three cache levels, and multiple high-speed interconnection networks with sophisticated routing, onto a single ASIC. This yields latencies and bandwidths that are significantly better than those of the nodes typically used in ASCI-scale supercomputers. As a result, memory is close to the processor, and power consumption is reduced (modest-clock-rate processor).
In addition, the integration of the inter-node communications network functions onto the same ASIC as the processors reduces cost, since the need for a separate, high-speed switch is eliminated.
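
The power argument behind the first point can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper. It assumes the common first-order model in which dynamic CMOS power scales as C * V^2 * f and the supply voltage is scaled in proportion to the clock, so per-node power grows roughly with the cube of the clock rate. The baseline of 10 W at 1 GHz, the 4 floating-point operations per cycle, and the 8000 W budget are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
# Illustrative sketch only: compares peak throughput at a fixed power budget.
# Assumption (not from the paper): per-node power grows as the cube of the
# clock rate (P ~ C * V^2 * f with V scaled in proportion to f).
# All constants below are hypothetical.

def node_power(clock_ghz, base_clock_ghz=1.0, base_power_w=10.0):
    """Power drawn by one node under the cubic scaling assumption."""
    return base_power_w * (clock_ghz / base_clock_ghz) ** 3


def nodes_within_budget(clock_ghz, budget_w):
    """Number of whole nodes that fit inside the power budget."""
    return int(budget_w // node_power(clock_ghz))


def aggregate_gflops(clock_ghz, budget_w, flops_per_cycle=4):
    """Peak GFLOPS summed over every node that fits in the budget."""
    return nodes_within_budget(clock_ghz, budget_w) * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    budget_w = 8000.0  # hypothetical power budget
    for clock_ghz in (1.0, 2.0):
        print(clock_ghz,
              nodes_within_budget(clock_ghz, budget_w),
              aggregate_gflops(clock_ghz, budget_w))
```

Under these assumptions, doubling the clock makes each node eight times as power-hungry, so the same budget holds one eighth as many nodes and delivers one quarter of the peak throughput. The model ignores leakage power, memory, and the interconnect, and it assumes the application scales across the larger node count, so it shows only the direction of the trade-off.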

Discussion notes:

What is the downside of placing memory and the processor so close together?
What is the advantage of choosing a modest-clock-rate CPU for a supercomputer design?
What is the main reason for dividing the cluster nodes into compute nodes and I/O nodes as a design choice?

Predrag Tosic, A perspective on the future of massively parallel computing: fine-grain vs. coarse-grain parallel models comparison & contrast, in the Proceedings of the First Conference on Computing Frontiers, pages 488-502, April 2004.

Last updated on 01/25/2005