Network RAM revisited

Wed May 28 12:13:39 EDT 2003

> I am a student about to start work on using Network RAMs as swap space in a 
> cluster environment, as a part of semester project. I need to convince 

sigh.

> plausible (gigE becoming ubiquitous, network latency reducing, seek time in 
> disks not getting better etc). I would like to hear the opinion of the 

seek time improves slowly, it's true, but it does improve.
perhaps more importantly, striping makes it a non-issue, or at least
a solvable one.

network latency hasn't improved much, in spite of gbe: I see around
80 us /bin/ping latency for 100bT, and about half that for gbe.

in both cases, bandwidth has improved dramatically, though.  networks are 
still pretty pathetic compared to dram interfaces: in other words,
dram hasn't stood still, either.
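
to put rough numbers on that, here is a back-of-envelope sketch (not a benchmark) of what it costs to pull one 4 KB page across the wire. it assumes nominal link rates of 100 and 1000 Mb/s, uses the ping figures above as the fixed per-request cost, and ignores protocol overhead; `page_fetch_us` is just a name made up for the illustration.

```python
# rough model: time to fetch a page = fixed latency + size / bandwidth
PAGE_BYTES = 4096  # page size from the question

def page_fetch_us(latency_us, mbit_per_s, size_bytes=PAGE_BYTES):
    """Estimated microseconds to move size_bytes over the link."""
    wire_us = size_bytes * 8 / mbit_per_s  # bits / (Mbit/s) gives microseconds
    return latency_us + wire_us

fast_ether = page_fetch_us(80, 100)   # ~80 us ping, nominal 100 Mb/s (assumed)
gig_ether = page_fetch_us(40, 1000)   # about half the latency, nominal 1000 Mb/s (assumed)

print(f"100bT: {fast_ether:.0f} us, gbe: {gig_ether:.0f} us")  # about 408 us and 73 us
```

under these assumptions the fixed latency is already more than half of the gbe figure, so extra bandwidth alone buys less and less.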

> I guess the main argument against it is why not simply put in more memory 
> sticks and avoid swap altogether. I was told there are applications out 

why do you think swap is an issue?

> there that would still always need swap.

I don't believe that's true.  there are workloads which always result
in some dirty data which is mostly idle; those pages would be perfect
for swapping out, since it would free up the physical page for hotter 
use.  it would seem very strange to me if an app created a *lot* of 
these pages, since that would more-or-less imply an app design flaw.

> Another question that bothers me is network latency deteriorates severely 
> after packet size goes beyond 1-1.5 KB. 

I don't see that, unless by "severe" you mean latency=size/bandwidth ;)
fragmenting a packet should definitely not cause a big decrease in 
throughput.  also, support for jumbo MTUs is not that uncommon.
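
as a small illustration of how little fragmentation is involved, this sketch counts the IPv4 fragments needed for a 4 KB payload. it assumes a 20-byte IPv4 header with no options, ignores the transport header, and takes 9000 as the jumbo MTU (a common setting, but an assumption here); `fragments` is a hypothetical helper.

```python
import math

IP_HEADER = 20  # IPv4 header without options (assumed)

def fragments(payload_bytes, mtu):
    """Number of IPv4 fragments needed to carry payload_bytes at a given MTU."""
    # every fragment except the last must carry a multiple of 8 bytes
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(payload_bytes / per_fragment)

standard = fragments(4096, 1500)  # 3 fragments of up to 1480 bytes
jumbo = fragments(4096, 9000)     # 1 fragment; 9000 is an assumed jumbo MTU
```

so a page is three ordinary frames or a single jumbo frame, which is not much to reassemble.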

> Assuming page size is 4KB, wouldn't 
> this affect the network RAM performance in a big way? Any way around this 
> problem?

no.  MMU hardware is inimical to this sort of too-clever-ness.
not only are pages large, but manipulating them (TLBs actually) is
expensive, especially in a multiprocessor environment.

my take on swap is this:
	- a little bit of swapout is a very good thing, since it means
	that idle pages are being put to better use.

	- more than a trivial amount of swap *in* is very bad, since 
	it means someone is waiting on those pages.  worse is when 
	a page appears to be swapped out and back in quickly.  that's 
	really just a kernel bug.

	- swap-outs are also nice in that they are often async, so no
	one is waiting for them to complete.  they can also be lazy-written
	and even optimistically pre-written.

	- swap-ins are painful, but at least you can scale bandwidth
	and latency by adding spindles.

	- the ultimate solution is, of course, to add more ram!  for ia32,
	this is pretty iffy above 2-6 GB, partly because it's 32b hardware,
	and partly because the hardware has dampened demand for big memory.
	but ram is at historically low prices:
		http://sharcnet.mcmaster.ca/~hahn/ram.png
	(OK, there was a brief period when crappy old PC133 SDR was cheaper
	than PC2100 is today, but...)
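
if you want to see which side of that line a machine is on, one way is to watch the swap-in and swap-out rates separately. a minimal sketch, assuming a Linux kernel that provides /proc/vmstat with cumulative `pswpin` and `pswpout` page counters; the helper names and the sample numbers are made up for illustration.

```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counts = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counts[parts[0]] = int(parts[1])
    return counts

def swap_rates(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return {k: (after[k] - before[k]) / seconds for k in ("pswpin", "pswpout")}

# made-up samples, taken 10 seconds apart
t0 = swap_counters("pgpgin 1000\npswpin 120\npswpout 4000\n")
t1 = swap_counters("pgpgin 1500\npswpin 125\npswpout 4600\n")
rates = swap_rates(t0, t1, 10.0)
print(rates)  # {'pswpin': 0.5, 'pswpout': 60.0}
```

on a live system you would feed it `open("/proc/vmstat").read()` twice. a steady pswpout with pswpin near zero is the benign case described above; sustained pswpin is the one to worry about.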

in general, if you're concerned about ram, I'd look seriously at Opteron
machines, since there simply is no other platform that's quite as clean:
64b goodness, scales ram capacity with cpus, not crippled by a shared FSB.

it's true that you can put together some pretty big-ram ia64 machines,
but they tend to wind up being rather expensive ;)

in summary: I believe network shared memory is simply not a great computing
model.  if I were supervising a thesis project, I'd probably try to steer 
the student towards something vaguely like Linda...

regards, mark hahn.
