Benchmarking Methods

We don't just provide the benchmark numbers; we also provide the benchmark methods and techniques. How is that for service? Now you can run your own benchmarks.

Counting Your Effective HPC Cores

Can more cores per node address HPC needs or are thin nodes on the horizon?

As multi-core continues to dominate the x86 marketplace, a nagging question remains for some HPC users: which is better, many single-socket multi-core nodes or fewer fat nodes with as many sockets as possible? Of course, the amortization of hardware costs (power supplies, cases, hard disks, etc.) is a win for the fat node approach, but what about performance? As nodes offer more and more cores, does HPC performance follow?

Read more: Counting Your Effective HPC Cores

Benchmarking A Multi-Core Processor For HPC

It is 10pm, do you know where your cores are?

Ever since multi-core processors hit the market, I have been interested in testing how well they actually work. Of course, there are application benchmarks, which are always a good measure, but I like to look at some basic properties of a new processor (I look at networking and storage in other tests). The compute testing I do is rather straightforward, yet I have seen very few of these types of results. In particular, most results are for Windows or involve "web benchmarks." Since I work in the Linux High Performance Computing (Linux clusters) sector, I thought some examples of how I evaluate multi-core processors may be of interest to others.

Read more: Benchmarking A Multi-Core Processor For HPC

Micro-Benchmarks vs Macro-Benchmarks

Assume Nothing, Test Everything

In a previous article, we learned how to test an interface with Netpipe. In terms of clusters, Netpipe can be considered a micro-benchmark because it tests only a single (but important) component of the cluster. Can we conclude that good Netpipe performance means we have a good cluster? It depends: sometimes we can, and sometimes more testing is needed. Consider that Netpipe performance tells us about the maximum limits of TCP/IP performance between two nodes. When we run parallel applications, there is usually more involved than just raw TCP/IP performance. There is usually an MPI layer between your application and the TCP/IP layer. In addition, there are effects due to compilers and node hardware (e.g., dual vs. single), and even the application itself may stress the interconnect in ways not measured by Netpipe. Tests that run over multiple components are usually referred to as macro-benchmarks because they involve a "whole system test" rather than a single component test. Both are valuable, but neither may tell the whole story.
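To make the idea of a micro-benchmark concrete, here is a minimal sketch of the kind of ping-pong test that tools such as Netpipe perform. It is not Netpipe and does not use its code. The function names (`recv_exact`, `echo_server`, `ping_pong`) are hypothetical, and the example runs over the loopback address, so it exercises only the local TCP/IP stack rather than a real interconnect between two nodes.

```python
import socket
import threading
import time


def recv_exact(conn, nbytes):
    """Read exactly nbytes from a socket (TCP may deliver partial chunks)."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = conn.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, nbytes, repeats):
    """Accept one connection and echo each fixed-size message back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(repeats):
            conn.sendall(recv_exact(conn, nbytes))


def ping_pong(nbytes, repeats=100, host="127.0.0.1"):
    """Return (average one-way time in seconds, throughput in Mbit/s)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=echo_server, args=(listener, nbytes, repeats)
    )
    server.start()

    payload = b"x" * nbytes
    with socket.create_connection((host, port)) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(repeats):
            client.sendall(payload)      # ping
            recv_exact(client, nbytes)   # pong
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    one_way = elapsed / (2 * repeats)    # half of the average round trip
    mbits = (nbytes * 8) / one_way / 1e6
    return one_way, mbits


if __name__ == "__main__":
    for size in (1, 1024, 65536):
        latency, mbits = ping_pong(size)
        print(f"{size:>6} bytes  {latency * 1e6:8.1f} us  {mbits:10.1f} Mbit/s")
```

A test like this isolates one component. It says nothing about the MPI layer, the compiler, or how an application actually stresses the interconnect, which is why a macro-benchmark is still needed.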

Read more: Micro-Benchmarks vs Macro-Benchmarks

A Tool for Cluster Performance Tuning and Optimization

Do you know about the Beowulf Performance Suite?

The Beowulf Performance Suite (BPS) was designed to provide a comprehensive and comparative way of measuring cluster performance. Although BPS contains many benchmarking programs, it is not designed to benchmark clusters directly. Instead, it is an analysis tool for measuring differences due to hardware or software changes on the same cluster. In addition, successfully running all the tests provides some assurance that the cluster is configured properly.

The suite can run any or all programs and produce HTML output files. The use of HTML makes it trivial to share your results with others on the web. The following tests are available:

Read more: A Tool for Cluster Performance Tuning and Optimization

Creative Commons License
©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.