[Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud

Robert G. Brown rgb at phy.duke.edu
Tue Oct 4 14:39:20 EDT 2011

On Tue, 4 Oct 2011, Lux, Jim (337C) wrote:

> Notwithstanding that there ARE places that do cycle harvesting from
> desktop machines, but the management and sysadmin hassles are so extreme
> (I've written software to DO such harvesting, in pre-Beowulf days).. Those
> kinds of places go to thin clients and hosted VM instances eventually, I
> think.

Condor (much improved from the old days, I think) actually makes this
fairly easy nowadays.  The physics department runs Condor across lots of
the low-rent desktop systems, creating a readily available compute farm
for EP (embarrassingly parallel) jobs.

I don't do much of that sort of thing any more, alas.  Mostly teaching,
working on dieharder when I can, and writing textbooks at a furious
pace.  I will have a complete first year physics textbook -- the world's
best, naturally;-) -- finished by the end of this semester (I'm within
about four and a half chapters of finished already, and writing at least
a chapter a week at this point).

After that is done, and two other books that are partly finished (three
if I get really inspired and try to finish the beowulf book) THEN I may
have time to do more actual computing.

> Where an Amazon could do themselves a favor (maybe they do this already)
> is to provide a free downloadable version of their environment for your
> own computer, or some "low priority cycles" for free, to get people
> hooked.  Sort of like IBM providing computers for cheap to universities in
> the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized
> cellphones, 10 cent text messages. Give us your child 'til 7, and he's
> ours for life.

As I said, ultimately Amazon makes a profit.  That is, they provide the
cluster and some reasonable subset of cluster management and
infrastructure provisioning, for which they have to a) recoup the cost
of the hardware, the infrastructure, and the management; and b) make at
LEAST 5-10% on the costs of all of this as profit, if not more like
40-50% or even a 100% markup.  Usually retail is 100% markup, but Amazon
has scale efficiencies such that they can get by with less, whether or
not they "like" to.

So it ultimately comes down to whether or not you can provide similar
efficiencies in your own local environment.  Suppose it is a University.
You have $100,000 for a compute resource that you expect to use over
three years.

There is typically no indirect cost charged to capital equipment.
Often, but not always, housing, cooling, powering, and even managing the
hardware is "free" to the researcher, absorbed into the ongoing costs of
the server room and management staff already needed to run the
department LAN and servers.  Thus for your $100,000 you can buy (say)
100 dedicated-function systems at $1000 each, and everything else is
paid for by opportunity-cost labor or University provisioning that
costs your grant nothing out of that $100,000 (although of
course your indirect costs elsewhere partly subsidize it).  Even network
ports may be free, or may not be if you need a higher end "cluster"

If you rent from ANYBODY, you pay:

   * Slightly over 1/3 of the $100,000 up front for indirect costs.
Duke, for example, would be perfectly happy to charge your grant $1 for
every $2 that it pays out to a third party for cloud computing rental.
For that fee they do all of the bookkeeping, basically -- most is pure
profit, but prenegotiated with all of the granting agencies and that's
just the way it is.

   * Your remaining (say) $63,000 has to pay for (a fraction of) the
power, the housing, the cooling, the network.  Unless Amazon subsidizes
the cluster with different money altogether (e.g. using money from book
sales to provide all of this at a loss) it will almost certainly not be
as cheap as a University center for modest size clusters.  When clusters
grow to where people have to build new data centers just to house them,
of course, this may not be true (but Amazon still doesn't gain much of a
relative advantage even in this extreme case, not in the long run).
Infrastructure costs are likely ballpark 10% of the cost of the hardware
you are running on.

   * It has to pay for Amazon's sysadmins and management and security.
These are humans that your money DIRECTLY supports, not humans that are
directly supported to do something else and do admin for you on an
opportunity cost basis "for free".  Real salaries, (fractionally) paid
from this income stream only.  Even amortized in the most favorable way
possible, admin costs are probably at least 10% of the hardware costs.

   * Profit.  At least (say) $6300 is profit.  Nobody makes a similar
profit in the case of the DIY cluster.

   * The amortized cost of the hardware.

The way I see it, you end up with roughly 50% of every dollar lost >>off
the top<< of your $100,000.  You ultimately buy (an amortized fraction
of) the hardware that the $100,000 would have bought outright as
up-front capital equipment, and instead of being able to leverage
pre-existing University infrastructure and avoid indirect costs, all on
a non-profit basis, you have to pay for infrastructure, indirect costs
on the grant, management, AND A PROFIT on top of the hardware.
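
For anyone who wants to check the arithmetic, here is a rough Python
sketch of the rental breakdown above.  The $63,000 left after indirect
costs and the 10% rates for infrastructure and admin are the ballpark
figures from the list; treating the $6300 profit as 10% of what you pay
the vendor is my assumption, and the function and parameter names are
made up for illustration.

```python
# Back-of-the-envelope split of a grant spent on rented cluster time.
# All rates are ballpark guesses from the discussion, not real pricing.

def rental_breakdown(grant=100_000.0, paid_to_vendor=63_000.0,
                     infra_rate=0.10, admin_rate=0.10, profit_rate=0.10):
    """Return a dict of where each dollar of the grant ends up."""
    # Indirect costs come off the top before the vendor sees anything.
    indirect = grant - paid_to_vendor
    # Assumption: vendor profit is a fraction of what the vendor is paid.
    profit = profit_rate * paid_to_vendor
    # What is left covers hardware plus infrastructure and admin, each
    # modeled as a fraction of the hardware cost.
    hardware = (paid_to_vendor - profit) / (1.0 + infra_rate + admin_rate)
    return {
        "indirect": indirect,
        "profit": profit,
        "infrastructure": infra_rate * hardware,
        "admin": admin_rate * hardware,
        "hardware": hardware,
    }


if __name__ == "__main__":
    costs = rental_breakdown()
    for name, dollars in costs.items():
        print(f"{name:>15}: ${dollars:>10,.0f}")
    share = costs["hardware"] / 100_000.0
    print(f"fraction reaching hardware: {share:.1%}")
```

With those inputs about $47,250 of the $100,000 ends up as (amortized)
hardware, which is where the "roughly 50% lost off the top" comes from.
Change the rates to match your own institution and vendor.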

The only real advantage is that -- maybe -- Amazon has market leverage
and economy of scale on the hardware.  But 50%?  That's hard to make


