Much has changed in the supercomputing arena. Even you can get in the game!
Recently, Sebastian Anthony wrote an article for ExtremeTech entitled "What Can You Do With A Supercomputer?" His conclusion was "not much," and for many people he is largely correct. However, a deeper understanding of the field may change the answer to "plenty."
He was mostly right when talking about the world's largest supercomputers. Indeed, one very workable past definition of a supercomputer was "any computer with at least a six-digit price tag." For years that was largely true, and it created a rather daunting barrier to entry for those who needed to crunch numbers. The cost was due to an architectural wall between supercomputers and the rest of computing. These systems were designed to perform math very quickly using vector processors. It all worked rather well until the cost of fabrication made creating your own vector CPU prohibitively expensive.
The Commodity Juggernaut
The only way to justify today's high CPU fabrication costs is to sell a boatload of processors. The traditional supercomputer market, measured in hundreds of new systems per year, could not afford to keep spinning custom CPUs for such a small audience. At the same time, commodity x86 processors were getting faster due to competitive forces, which in turn created the higher volumes that justified the high fabrication costs.
The bottom line: commodity x86 processors got fast and cheap. Niche vector processors had trouble competing in this space, and virtually all the traditional supercomputer companies still in existence began selling parallel designs based on clusters of commodity CPUs. The basic components were a compute node (consisting of a commodity CPU, memory, and possibly a hard disk), an interconnect, and some type of global storage. The interconnects varied from the cheapest (and slowest), which in the beginning was Fast Ethernet, to the more expensive (and fastest), like early Myrinet or QSnet. Parallel programming was done using the Message Passing Interface (MPI) library for Fortran, C, and C++. This scalable approach opened up a whole new era of commodity supercomputing. The trend is clearly evident in the figure below, which shows the progression of processor families for the fastest 500 computers over the last 18 years (as ranked by the HPL benchmark).
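To give a flavor of what MPI programming looks like, here is a minimal sketch in C, assuming an MPI implementation such as Open MPI or MPICH is installed (the file and program names are illustrative, not from any particular package). One process sends a single integer to another over the cluster interconnect:

/* hello_mpi.c -- a minimal MPI example: rank 0 sends an integer to rank 1.
 * Compile: mpicc hello_mpi.c -o hello_mpi
 * Run:     mpirun -np 2 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID (rank) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0 && size > 1) {
        int payload = 42;
        /* rank 0 sends one integer to rank 1 */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Rank 0 of %d sent %d to rank 1\n", size, payload);
    } else if (rank == 1) {
        int payload;
        /* rank 1 receives the integer from rank 0 */
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();                         /* shut down the MPI runtime */
    return 0;
}

The same source code runs unchanged whether the two processes live on one workstation or on separate compute nodes connected by Ethernet or a faster interconnect, which is precisely what made the commodity cluster approach scale.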
The ability to use commodity off-the-shelf parts lowered the barrier to entry significantly and allowed scientists and engineers to buy high-performance computing power to fit their budget. Clusters of all sizes started showing up everywhere. The use of standard Linux distributions also made it possible to "roll your own" cluster with little more than some old x86 boxes. In effect, some level of "supercomputing" was available to the masses, as the distinction between high-end and low-end was essentially how many processors you could buy.
Somewhere in the mix, the term "supercomputer" began to fade and the term High Performance Computing, or HPC, came into favor. (HPTC, or High Performance Technical Computing, is sometimes used.) This transition was largely due to the dissolving barrier between the traditional supercomputer and the cluster of commodity hardware. Typically, the cluster lowered the price-to-performance ratio by a factor of ten and reduced the cost of entry by at least ten times.