From deadline at eadline.org Mon Oct 3 08:25:06 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 3 Oct 2011 08:25:06 -0400 (EDT)
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To: <20110921110239.GR25711@leitl.org>
References: <20110921110239.GR25711@leitl.org>
Message-ID: <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org>

Interesting and pragmatic HPC cloud presentation, worth watching (25 minutes):

http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/

--
Doug

> http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
>
> $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
>
> By Jon Brodkin | Published September 20, 2011 10:49 AM
>
> Amazon EC2 and other cloud services are expanding the market for high-performance computing. Without access to a national lab or a supercomputer in your own data center, cloud computing lets businesses spin up temporary clusters at will and stop paying for them as soon as the computing needs are met.
>
> A vendor called Cycle Computing is on a mission to demonstrate the potential of Amazon's cloud by building increasingly large clusters on the Elastic Compute Cloud. Even with Amazon, building a cluster takes some work, but Cycle combines several technologies to ease the process and recently used them to create a 30,000-core cluster running CentOS Linux.
>
> The cluster, announced publicly this week, was created for an unnamed "Top 5 Pharma" customer, and ran for about seven hours at the end of July at a peak cost of $1,279 per hour, including the fees to Amazon and Cycle Computing. The details are impressive: 3,809 compute instances, each with eight cores and 7GB of RAM, for a total of 30,472 cores, 26.7TB of RAM and 2PB (petabytes) of disk space. Security was ensured with HTTPS, SSH and 256-bit AES encryption, and the cluster ran across data centers in three Amazon regions in the United States and Europe. The cluster was dubbed "Nekomata."
>
> Spreading the cluster across multiple continents was done partly for disaster recovery purposes, and also to guarantee that 30,000 cores could be provisioned. "We thought it would improve our probability of success if we spread it out," Cycle Computing's Dave Powers, manager of product engineering, told Ars. "Nobody really knows how many instances you can get at any one time from any one [Amazon] region."
>
> Amazon offers its own special cluster compute instances, at a higher cost than regular-sized virtual machines. These cluster instances provide 10 Gigabit Ethernet networking along with greater CPU and memory, but they weren't necessary to build the Cycle Computing cluster.
>
> The pharmaceutical company's job, related to molecular modeling, was "embarrassingly parallel" so a fast interconnect wasn't crucial. To further reduce costs, Cycle took advantage of Amazon's low-price "spot instances." To manage the cluster, Cycle Computing used its own management software as well as the Condor High-Throughput Computing software and Chef, an open source systems integration framework.
>
> Cycle demonstrated the power of the Amazon cloud earlier this year with a 10,000-core cluster built for a smaller pharma firm called Genentech. Now, 10,000 cores is a relatively easy task, says Powers. "We think we've mastered the small-scale environments," he said.
> 30,000 cores isn't the end game, either. Going forward, Cycle plans bigger, more complicated clusters, perhaps ones that will require Amazon's special cluster compute instances.
>
> The 30,000-core cluster may or may not be the biggest one run on EC2. Amazon isn't saying.
>
> "I can't share specific customer details, but can tell you that we do have businesses of all sizes running large-scale, high-performance computing workloads on AWS [Amazon Web Services], including distributed clusters like the Cycle Computing 30,000 core cluster to tightly-coupled clusters often used for science and engineering applications such as computational fluid dynamics and molecular dynamics simulation," an Amazon spokesperson told Ars.
>
> Amazon itself actually built a supercomputer on its own cloud that made it onto the list of the world's Top 500 supercomputers. With 7,000 cores, the Amazon cluster ranked number 232 in the world last November with speeds of 41.82 teraflops, falling to number 451 in June of this year. So far, Cycle Computing hasn't run the Linpack benchmark to determine the speed of its clusters relative to Top 500 sites.
>
> But Cycle's work is impressive no matter how you measure it. The job performed for the unnamed pharma company "would take well over a week for them to run internally," Powers says. In the end, the cluster performed the equivalent of 10.9 "compute years of work."
>
> The task of managing such large cloud-based clusters forced Cycle to step up its own game, with a new plug-in for Chef the company calls Grill.
>
> "There is no way that any mere human could keep track of all of the moving parts on a cluster of this scale," Cycle wrote in a blog post. "At Cycle, we've always been fans of extreme IT automation, but we needed to take this to the next level in order to monitor and manage every instance, volume, daemon, job, and so on in order for Nekomata to be an efficient 30,000 core tool instead of a big shiny on-demand paperweight."
>
> But problems did arise during the 30,000-core run.
>
> "You can be sure that when you run at massive scale, you are bound to run into some unexpected gotchas," Cycle notes. "In our case, one of the gotchas included such things as running out of file descriptors on the license server. In hindsight, we should have anticipated this would be an issue, but we didn't find that in our prelaunch testing, because we didn't test at full scale. We were able to quickly recover from this bump and keep moving along with the workload with minimal impact. The license server was able to keep up very nicely with this workload once we increased the number of file descriptors."
>
> Cycle also hit a speed bump related to volume and byte limits on Amazon's Elastic Block Store volumes. But the company is already planning bigger and better things.
>
> "We already have our next use-case identified and will be turning up the scale a bit more with the next run," the company says. But ultimately, "it's not about core counts or terabytes of RAM or petabytes of data. Rather, it's about how we are helping to transform how science is done."
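The file-descriptor "gotcha" described above is ordinary Linux resource-limit tuning rather than anything cloud-specific. A minimal sketch of how a server process can check and raise its own limit on open files, using Python's standard resource module (the 65536 target is an illustrative number, not Cycle's actual setting):

    import resource

    # Query the current soft/hard limits on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open-file limit: soft=%d hard=%d" % (soft, hard))

    # Raise the soft limit toward the hard limit; only root can raise the
    # hard limit itself, e.g. via /etc/security/limits.conf.
    target = 65536 if hard == resource.RLIM_INFINITY else min(65536, hard)
    if soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        print("raised soft limit to %d" % target)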
--
Doug

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Mon Oct 3 13:51:06 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 03 Oct 2011 13:51:06 -0400
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To: <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org>
References: <20110921110239.GR25711@leitl.org> <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org>
Message-ID: <4E89F60A.4070801@ias.edu>

Doug,

Thanks for posting that video. It confirmed what I always suspected about clouds for HPC.

Prentice

On 10/03/2011 08:25 AM, Douglas Eadline wrote:
> Interesting and pragmatic HPC cloud presentation, worth watching (25 minutes)
>
> http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/
>
> --
> Doug
>
--snip--
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From deadline at eadline.org Mon Oct 3 14:17:33 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 3 Oct 2011 14:17:33 -0400 (EDT)
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To: <4E89F60A.4070801@ias.edu>
References: <20110921110239.GR25711@leitl.org> <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org> <4E89F60A.4070801@ias.edu>
Message-ID: <58756.192.168.93.213.1317665853.squirrel@mail.eadline.org>

I think everyone has similar thoughts, but the presentation provides some real data and experiences.

BTW, for those interested, I have a new poll on ClusterMonkey asking about clouds and HPC (http://www.clustermonkey.net/). The last poll was on GP-GPU use.

--
Doug

> Doug,
>
> Thanks for posting that video. It confirmed what I always suspected about clouds for HPC.
>
> Prentice
>
> On 10/03/2011 08:25 AM, Douglas Eadline wrote:
--snip--
--
Doug
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From raysonlogin at gmail.com Mon Oct 3 14:50:22 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 3 Oct 2011 14:50:22 -0400
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To: <20110921110239.GR25711@leitl.org>
References: <20110921110239.GR25711@leitl.org>
Message-ID:

There's a free & open source application called StarCluster that can do most (if not all?) of the EC2 provisioning & cluster setup for a High Throughput Computing cluster:

http://web.mit.edu/stardev/cluster/

StarCluster sets up NFS, SGE, the BLAS library, Open MPI, etc. automatically for the user in around 10-15 minutes. StarCluster is licensed under the LGPL, written in Python+Boto, and supports a lot of the new EC2 features (Cluster Compute Instances, Spot Instances, Cluster GPU Instances, etc). Support for launching higher node-count (100+ instance) clusters is even better with the new scalability enhancements in the latest version (0.92).

And there are some tutorials on YouTube:

- "StarCluster 0.91 Demo":
  http://www.youtube.com/watch?v=vC3lJcPq1FY

- "Launching a Cluster on Amazon Ec2 Spot Instances Using StarCluster":
  http://www.youtube.com/watch?v=2Ym7epCYnSk

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

On Wed, Sep 21, 2011 at 7:02 AM, Eugen Leitl wrote:
--snip--
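For anyone who has not scripted EC2 directly, the kind of provisioning StarCluster automates looks roughly like the boto sketch below. This is only an illustration of the general API, not StarCluster's own code; the region, bid price, AMI ID, key pair, and security group are placeholders.

    import boto.ec2

    # Credentials are read from the environment or ~/.boto.
    conn = boto.ec2.connect_to_region("us-east-1")

    # Ask for 8 spot instances at a maximum bid of $0.10/hour each.
    requests = conn.request_spot_instances(
        price="0.10",                    # assumed/illustrative bid
        image_id="ami-12345678",         # placeholder AMI
        count=8,
        key_name="mykey",                # placeholder key pair
        security_groups=["cluster-sg"],  # placeholder security group
        instance_type="m1.large",
    )

    # Poll the request state until the instances are fulfilled.
    for req in conn.get_all_spot_instance_requests([r.id for r in requests]):
        print(req.id, req.state)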
--
Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Raysonho

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rgb at phy.duke.edu Mon Oct 3 15:21:44 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 3 Oct 2011 15:21:44 -0400 (EDT)
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To:
References: <20110921110239.GR25711@leitl.org>
Message-ID:

On Mon, 3 Oct 2011, Rayson Ho wrote:

> There's a free & open source application called StarCluster that can do most (if not all?) of the EC2 provisioning & cluster setup for a High Throughput Computing cluster:

I will say that if anyone is going to make this work, it is going to be Amazon and/or Google -- they have the very, very big pile of computers needed to make it work. I would be very interested in seeing the detailed scaling of "fine grained parallel" applications on cloud resources -- one point that the talk made that I agree with is that embarrassingly parallel applications that require minimal I/O or IPCs will do well in a cloud where all that matters is how many instances you can run of jobs that don't talk to each other or need much access to data. But what of jobs that require synchronous high speed communications? What of jobs that require access to huge datasets?

Ultimately the problem comes down to this. Your choice is to rent time on somebody else's hardware or buy your own hardware. For many people, one can scale to infinity and beyond, so using "all" of the time/resource you have available either way is a given. In which case, no matter how you slice it, Amazon or Google have to make a profit above and beyond the cost of delivering the service. You don't (or rather, your "profit" is just the ability to run your jobs and get paid as usual to do your research either way). This means that it will always be cheaper to directly provision a lot of computing rather than run it in the cloud, or for that matter at an HPC center. Not all -- lots of nonlinearities and thresholds associated with infrastructure and admin and so on -- but a lot.
Enough that I don't see Amazon's Pinky OR the Brain ever taking over the (HPC) world...

   rgb

--snip--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From raysonlogin at gmail.com Tue Oct 4 10:55:39 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Tue, 4 Oct 2011 10:55:39 -0400
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To:
References: <20110921110239.GR25711@leitl.org>
Message-ID:

On Mon, Oct 3, 2011 at 3:21 PM, Robert G. Brown wrote:
> I would be very interested in seeing the detailed scaling of "fine grained parallel" applications on cloud resources -- one point that the talk made that I agree with is that embarrassingly parallel applications that require minimal I/O or IPCs will do well in a cloud where all that matters is how many instances you can run of jobs that don't talk to each other or need much access to data. But what of jobs that require synchronous high speed communications?

Amazon (and I believe other cloud providers have something similar?) introduced Cluster Compute Instances with 10 Gb Ethernet. For traditional MPI workloads, the real advantage is actually from HVM (hardware virtualization), as it cuts the communication latency by quite a lot.

> What of jobs that require access to huge datasets?

Getting data in & out of the cloud is still a big problem, and the highest-bandwidth way of sending data to AWS is by FedEx. In fact, shipping drives is quite often the fastest way to send data from one data center to another when the data size is big.

And processing data on the cloud is easier (in terms of setup) with Amazon Elastic MapReduce (which recently added support for spot instances):

http://aws.amazon.com/elasticmapreduce/

> Ultimately the problem comes down to this. Your choice is to rent time on somebody else's hardware or buy your own hardware. For many people, one can scale to infinity and beyond, so using "all" of the time/resource you have available either way is a given. In which case no matter how you slice it, Amazon or Google have to make a profit above and beyond the cost of delivering the service.
> You don't (or rather, your "profit" is just the ability to run your jobs and get paid as usual to do your research either way). This means that it will always be cheaper to directly provision a lot of computing rather than run it in the cloud, or for that matter at an HPC center.

Provided that the machines are used 24x7. A lot of enterprise users do not have enough work to load up the machines. E.g., I worked with a client that has lots of data & numbers to crunch at night, while during the day most of the machines are idle. For traditional HPC centers the batch queue length is almost never 0, so there, agreed, the cloud wouldn't help and might even make the problem worse.

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

--snip--
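The utilization point above is easy to put numbers on. A back-of-the-envelope sketch, with deliberately made-up prices (an owned node amortized over three years versus an assumed on-demand hourly rate), shows how quickly ownership wins once the machines stay busy:

    # Illustrative prices only -- not actual 2011 AWS or hardware quotes.
    owned_node_cost = 4000.0              # purchase + power + admin over its life, USD
    node_lifetime_hours = 3 * 365 * 24    # three-year service life
    cloud_rate = 0.68                     # assumed USD per instance-hour

    # Effective cost per *useful* hour of an owned node at a given utilization.
    for utilization in (0.05, 0.25, 0.50, 0.90):
        busy_hours = node_lifetime_hours * utilization
        owned_per_hour = owned_node_cost / busy_hours
        print("utilization %3d%%: owned $%.2f/hr vs cloud $%.2f/hr"
              % (utilization * 100, owned_per_hour, cloud_rate))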
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Tue Oct 4 11:26:55 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 4 Oct 2011 08:26:55 -0700
Subject: [Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
In-Reply-To:
Message-ID:

On 10/4/11 7:55 AM, "Rayson Ho" wrote:

> On Mon, Oct 3, 2011 at 3:21 PM, Robert G. Brown wrote:
>> I would be very interested in seeing the detailed scaling of "fine grained parallel" applications on cloud resources -- one point that the talk made that I agree with is that embarrassingly parallel applications that require minimal I/O or IPCs will do well in a cloud where all that matters is how many instances you can run of jobs that don't talk to each other or need much access to data. But what of jobs that require synchronous high speed communications?
>
> Amazon (and I believe other cloud providers have something similar?) introduced Cluster Compute Instances with 10 Gb Ethernet. For traditional MPI workloads, the real advantage is actually from HVM (hardware virtualization), as it cuts the communication latency by quite a lot.
>
>> What of jobs that require access to huge datasets?
>
> Getting data in & out of the cloud is still a big problem, and the highest-bandwidth way of sending data to AWS is by FedEx. In fact, shipping drives is quite often the fastest way to send data from one data center to another when the data size is big.

The classic: nothing beats a station wagon full of tapes for bandwidth. (Today it's a minivan with terabyte hard drives, but that's the idea.)

>> Ultimately the problem comes down to this. Your choice is to rent time on somebody else's hardware or buy your own hardware. For many people, one can scale to infinity and beyond, so using "all" of the time/resource you have available either way is a given. In which case no matter how you slice it, Amazon or Google have to make a profit above and beyond the cost of delivering the service. You don't (or rather, your "profit" is just the ability to run your jobs and get paid as usual to do your research either way). This means that it will always be cheaper to directly provision a lot of computing rather than run it in the cloud, or for that matter at an HPC center.
>
> Provided that the machines are used 24x7. A lot of enterprise users do not have enough work to load up the machines. E.g., I worked with a client that has lots of data & numbers to crunch at night, while during the day most of the machines are idle.

In a situation where you've got an existing application and data, and you just want to crunch numbers, and you pay either cloud or in-house, then you make the choice based on the incremental cost. However, even at the smallest increment on a cloud/hosted scheme, you have to pay from CPU second #1 (plus the fixed overhead of getting the job ready to go). If you have a cluster in house, there is likely a way to get a test job run essentially for free (perhaps on an older non-production cluster). That test job provides the performance data and preliminary results that you use in preparing the proposal to get real money to pay for real computation.

This has been my argument for personal clusters... There's no accounting staff or administrative person watching over you to make sure you are effectively using the capital investment, in the same sense that most places don't care how much idle time there is on your desktop PC. If you've got an idea, and you're willing to put your own time (free?) into it, using the box that happens to be in your office or lab, nobody cares one way or another, as long as your primary job gets done. Notwithstanding that there ARE places that do cycle harvesting from desktop machines, the management and sysadmin hassles are so extreme (I've written software to DO such harvesting, in pre-Beowulf days) that those kinds of places go to thin clients and hosted VM instances eventually, I think.

Where an Amazon could do themselves a favor (maybe they do this already) is to provide a free downloadable version of their environment for your own computer, or some "low priority cycles" for free, to get people hooked. Sort of like IBM providing computers for cheap to universities in the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized cellphones, 10 cent text messages. Give us your child 'til 7, and he's ours for life.
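The station-wagon rule of thumb above is easy to sanity-check. A quick back-of-the-envelope comparison, with made-up but plausible numbers (a box of 1 TB drives shipped overnight versus a saturated 1 Gb/s link):

    # Illustrative numbers only: 50 x 1 TB drives, 24-hour overnight shipment.
    drives = 50
    shipped_bytes = drives * 1.0e12
    transit_seconds = 24 * 3600.0

    sneakernet_gbps = shipped_bytes * 8 / transit_seconds / 1e9
    print("shipped drives: %.1f Gb/s effective" % sneakernet_gbps)     # ~4.6 Gb/s

    # The same data over a fully saturated 1 Gb/s link:
    link_days = shipped_bytes * 8 / 1e9 / 86400.0
    print("1 Gb/s link: %.1f days for the same transfer" % link_days)  # ~4.6 days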
Those kinds of places go to thin clients and hosted VM instances eventually, I think. Where an Amazon could do themselves a favor (maybe they do this already) is to provide a free downloadable version of their environment for your own computer, or some "low priority cycles" for free, to get people hooked. Sort of like IBM providing computers for cheap to universities in the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized cellphones, 10 cent text messages. Give us your child 'til 7, and he's ours for life. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Oct 4 11:58:12 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 4 Oct 2011 11:58:12 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 11:26 AM, Lux, Jim (337C) wrote: > The classic: nothing beats a station wagon full of tapes for bandwidth. > (today, it's minivan with terabyte hard drives, but that's the idea) BTW, I've heard horror stories related to routing errors with this method - truck drivers delivering wrong tapes or losing tapes (hopefully the data is properly encrypted). > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days). The technology part of cycle harvesting is solvable, the accounting part is (IMO) much harder. A few years ago I talked to a University HPC lab about deploying cycle harvesting in the libraries (it's a big University, so we are talking about 1000+ library desktops). The technology was there (BOINC client), but getting the software installed & maintained means extra work, which means an extra IT guy... and means no one wants to pay for this. I wonder how many University labs or Biotech companies are doing organization wide cycle harvesting these days, for example, with technologies like BOINC: http://boinc.berkeley.edu/ > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, AMI is not private (in the end, it is IaaS, so the VM images are open). In fact, StarCluster has AMIs for download & install (mainly for developers who want to code for StarCluster locally): http://web.mit.edu/stardev/cluster/download_amis.html And one can roll a custom StarCluster AMI and upload it to AWS, such that the image settings are optimized to the needs: http://web.mit.edu/stardev/cluster/docs/0.91/create_new_ami.html > or some "low priority cycles" for free, to get people hooked. AWS Free Usage Tier -- (most people just use the free tier as free hosting): http://aws.amazon.com/free/ Rayson ================================= Grid Engine / Open Grid Scheduler http://gridscheduler.sourceforge.net > ?Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. 
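The station-wagon/minivan point a few messages up is also easy to quantify. A rough Python sketch, with the drive count, capacity and shipping time all assumed round numbers:

    # Effective bandwidth of shipping drives vs. pushing bits over a wire.
    # Drive count, capacity and transit time are assumed round numbers.
    drives        = 20          # 2 TB drives in the box
    drive_tb      = 2.0
    transit_hours = 24.0        # overnight courier
    payload_bits  = drives * drive_tb * 1e12 * 8

    sneakernet_bps = payload_bits / (transit_hours * 3600)
    gige_bps       = 1e9        # a dedicated, fully utilized 1 Gb/s link

    print("sneakernet effective bandwidth: %.1f Gb/s" % (sneakernet_bps / 1e9))
    print("days to move the same data over GigE: %.1f" % (payload_bits / gige_bps / 86400.0))

The latency is a day, of course, and the payload spends that day in someone else's truck, which is why the tape-delivery horror stories above end with "hopefully the data is properly encrypted."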
> > >> > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Tue Oct 4 13:08:11 2011 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 4 Oct 2011 13:08:11 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <53556.192.168.93.213.1317748091.squirrel@mail.eadline.org> --snip-- > > This has been my argument for personal clusters... There's no accounting > staff or administrative person watching over you to make sure you are > effectively using the capital investment, in the same sense that most > places don't care how much idle time there is on your desktop PC. If > you've got an idea, and you're willing to put your own time (free?) into > it, using the box that happens to be in your office or lab, nobody cares > one way or another, as long as your primary job gets done. > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days).. Those > kinds of places go to thin clients and hosted VM instances eventually, I > think. BTW, very soon prebuilt Limulus systems will be available (http://limulus.basement-supercomputing.com) with 16 cores (four i5-2500S processors), one power plug, cool, quiet, with cool blue lights to impress your co-workers. -- Doug > > > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, or some "low priority cycles" for free, to get people > hooked. Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. > > >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 4 14:39:20 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 14:39:20 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days).. Those > kinds of places go to thin clients and hosted VM instances eventually, I > think. 
Condor (much improved from the old days, I think) actually makes this fairly easy nowadays. The physics department runs condor across lots of the low-rent desktop systems, creating a readily available compute farm for EP jobs. I don't do much of that sort of thing any more, alas. Mostly teaching, working on dieharder when I can, and writing textbooks at a furious pace. I will have a complete first year physics textbook -- the world's best, naturally;-) -- finished by the end of this semester (I'm within about four and a half chapters of finished already, and writing at least a chapter a week at this point). After that is done, and two other books that are partly finished (three if I get really inspired and try to finish the beowulf book) THEN I may have time to do more actual computing. > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, or some "low priority cycles" for free, to get people > hooked. Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. As I said, ultimately Amazon makes a profit. That is, they provide the cluster and some reasonable subset of cluster management in infrastructure provisioning, where they have to a) recoup the cost of the hardware, the infrastructure, and the management; b) make at LEAST 5-10% or better on the costs of all of this as profit, if not more like 40-50% or even 100% markup. Usually retail is 100% markup, but Amazon has scale efficiencies such that they can get by with less, whether or not they "like" to. So it ultimately comes down to whether or not you can provide similar efficiencies in your own local environment. Suppose it is a University. You have $100,000 for a compute resource that you expect to use over three years. There is typically no indirect cost charged to capital equipment. Often, but not always, housing, cooling, powering, and even managing the hardware is "free" to the researcher, absorbed into the ongoing costs of the server room and management staff already needed to run the department LAN and servers. Thus for your $100,000 you can buy (say) 100 dedicated function systems for $1000 each and everything else is paid out of opportunity cost labor or University provisioning that doesn't cost your grant anything -- out of that $100,000 (although of course your indirect costs elsewhere partly subsidize it). Even network ports may be free, or may not be if you need a higher end "cluster" network. If you rent from ANYBODY, you pay: * Slightly over 1/3 of the $100,000 up front for indirect costs. Duke, for example, would be perfectly happy to charge your grant $1 for every $2 that it pays out to a third party for cloud computing rental. For that fee they do all of the bookkeeping, basically -- most is pure profit, but prenegotiated with all of the granting agencies and that's just the way it is. * Your remaining (say) $63,000 has to pay for (a fraction of) the power, the housing, the cooling, the network. Unless Amazon subsidizes the cluster with different money altogether (e.g. using money from book sales to provide all of this at a loss) it will almost certainly not be as cheap as a University center for modest size clusters. 
When clusters grow to where people have to build new data centers just to house them, of course, this may not be true (but Amazon still doesn't gain much of a relative advantage even in this extreme case, not in the long run). Infrastructure costs are likely ballpark 10% of the cost of the hardware you are running on. * It has to pay for Amazon's sysadmins and management and security. These are humans that your money DIRECTLY supports, not humans that are directly supported to do something else and do admin for you on an opportunity cost basis "for free". Real salaries, (fractionally) paid from this income stream only. Even amortized in the friendliest most favorable way possible, admin cost are probably at least 10% of the hardware costs. * Profit. At least (say) $6300 is profit. Nobody makes a similar profit in the case of the DIY cluster. * The amortized cost of the hardware. The way I see it, you end up with roughly 50% of every dollar lost >>off the top<< of your $100,000. You ultimately buy (an amortized fraction of) the hardware the $100,000 as up-front capital equipment would cost you, and instead of being able to leverage pre-existing University infrastructure, avoid indirect costs, all as on a non-profit basis, you have to pay for infrastructure, indirect costs on the grant, management, AND A PROFIT on top of the hardware. The only real advantage is that -- maybe -- Amazon has market leverage and economy of scale on the hardware. But 50%? That's hard to make back. rgb > > >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dag at sonsorol.org Tue Oct 4 15:29:28 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Tue, 04 Oct 2011 15:29:28 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <4E8B5E98.3090002@sonsorol.org> I'm largely with RGB on this one with the minor caveat that I think he might be undervaluing the insane economies of scale that IaaS providers like Amazon & Google can provide. At the scale that Amazon operates at, they can obtain and run infrastructure far, far more efficiently than most (if not all) of us can ourselves. These folks have exabytes of spinning disk, redundant data-centers (with insane PUE values) all over the world and they know how to manage hundreds of thousands of servers with high efficiency in a very hostile networking environment. Not only can they run bigger and more efficient than we can, they can charge a price that makes them a profit while still being (in many cases) far cheaper than my own costs should I be truly honest about the fully-loaded costs of maintaining HPC or IT services. AWS has a history of lowering prices as their own costs go down. 
You can see this via the EC2 pricing history as well as the now-down-to-zero cost of inbound data transit. AWS Spot market makes this even more interesting. I can currently run an m1.4xlarge 64bit server instance with 15GB RAM for about $.24 per hour - close to 50% cheaper than the published hourly price and that spot price can hold steady for weeks at a time in many cases. The biggest hangup is the economics. Even harder in an academic environment where researchers are used to seeing their funds vanish to "overhead" on their grant or they just assume that datacenters, bandwidth, power and hosting are all "free" to use. It's hard to do true cost comparisons but time and time again I've seen IaaS come out ahead when the fully-loaded costs are actually put down on paper. Here is a cliche example: Amazon S3 Before the S3 object storage service will even *acknowledge* a successful PUT request, your file is already at rest in at least three amazon facilities. So to "really" compare S3 against what you can do locally you at least have to factor in the cost of your organization being able to provide 3x multi-facility replication for whatever object store you choose to deploy... I don't want to be seen as a shill so I'll stop with that example. The results really are surprising once you start down the "true cost of IT services..." road. As for industry trends with HPC and IaaS ... I can assure you that in the super practical & cynical world of biotech and pharma there is already an HPC migration to IaaS platforms that is years old already. It's a lot easier to see where and how your money is being spent inside a biotech startup or pharma and that is (and has) shunted a decent amount of spending towards cloud platforms. The easy stuff is moving to IaaS platforms. The hard stuff, the custom stuff, the tightly bound stuff and the data/IO-bound stuff is staying local of course - but that still means lots of stuff is moving externally. The article that prompted this thread is a great example of this. The client company had a boatload of one-off molecular dynamics simulations to run. So much, in fact, that the problem was computationally infeasable to even consider doing inhouse. So they did it on AWS. 30,000 CPU cores. For ~$9,000 dollars. Amazing. It's a fun time to be in HPC actually. And getting my head around "IaaS" platforms turned me onto things (like opscode chef) that we are now bringing inhouse and integrating into our legacy clusters and grids. Sorry for rambling but I think there are 2 main drivers behind what I see moving HPC users and applications into IaaS cloud platforms ... (1) The economies of scale are real. IaaS providers can run better, bigger and cheaper than we can and they can still make a profit. This is real, not hype or sales BS. (as long as you are honest about your actual costs...) (2) The benefits of "scriptable everything" or "everything has an API". I'm so freaking sick of companies installing VMWare and excreting a press release calling themselves a "cloud provider". Virtual servers and virtual block storage on demand are boring, basic and pedestrian. That was clever in 2004. I need far more "glue" to build useful stuff in a virtual world and IaaS platforms deliver more products/services and "glue" options than anyone else out there. The "scriptable everything" nature of IaaS is enabling a lot of cool system and workflow building, much of which would be hard or almost impossible to do in-house with local resources. 
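Putting rough numbers on the spot-market and 30,000-core examples above: the sketch below infers the on-demand rate from the "close to 50% cheaper" remark, guesses a 4-core instance (the 64-bit/15GB description is close to what Amazon listed as m1.xlarge), and assumes a run length of a few hours for the pharma job, so treat every figure as an assumption rather than a published price:

    # Spot vs. on-demand, plus the 30,000-core run quoted above.
    # The on-demand rate, core count and run length are assumptions.
    spot_rate      = 0.24       # $/hr, from the message above
    ondemand_rate  = 0.48       # $/hr, assumed (~2x the spot price)
    cores_per_inst = 4          # assumed for an m1.xlarge-class instance

    month_hours = 30 * 24
    print("one instance for a month, spot     : $%.2f" % (spot_rate * month_hours))
    print("one instance for a month, on-demand: $%.2f" % (ondemand_rate * month_hours))
    print("spot cost per core-hour            : $%.3f" % (spot_rate / cores_per_inst))

    run_cost, run_cores, run_hours = 9000.0, 30000, 7.0   # run length assumed
    print("pharma run, cost per core-hour     : $%.3f" % (run_cost / (run_cores * run_hours)))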
My $.02 -Chris (corporate hat: chris at bioteam.net) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Tue Oct 4 16:07:21 2011 From: mathog at caltech.edu (mathog) Date: Tue, 04 Oct 2011 13:07:21 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > "Robert G. Brown" wrote: > Often, but not always, housing, cooling, powering, and even managing > the > hardware is "free" to the researcher, absorbed into the ongoing costs > of > the server room and management staff already needed to run the > department LAN and servers. Not always indeed. My little machine room houses a half dozen machines from other biology division people, and they are not charged to keep them there. However, putting a computer in the central campus machine rooms is not free. And new computer rooms, at least those of any size, do not get free power. After geology put in this monster: http://www.gps.caltech.edu/uploads/Image/Facilities/Beowulf.jpg the administration decided that when a computer room pretty much needs its own substation, it is well beyond the incidental overhead costs they are willing to pick up for average research labs. Along similar lines, I would guess that SLAC has to pay for its own power, rather than Stanford covering it out of overhead. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Tue Oct 4 16:39:16 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 16:39:16 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Chi Chan wrote: > On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: >> BTW, I've heard horror stories related to routing errors with this >> method - truck drivers delivering wrong tapes or losing tapes >> (hopefully the data is properly encrypted). > > I just read this on Slashdot today, it is "very hard to encrypt a > backup tape" (really?): > > http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients Not if it is encrypted with a stream cipher -- a stream cipher basically xors the data with a bitstream generated from a suitable key in a cryptographic-strength pseudorandom number generator (although there are variations on this theme). As a result, it can be quite fast -- as fast as generating pseudorandom numbers from the generator -- and it produces a file that is exactly the size of the original message in length. There are encryption schemes that expend extraordinary amounts of computational energy in generating the stream, and there are also block ciphers (which are indeed hard to implement for a streaming tape full of data, as they usually don't work so well for long messages). 
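To make the XOR-with-a-keystream idea concrete, here is a deliberately tiny Python 3 toy that stretches a key into a keystream by hashing key-plus-counter and XORs it against the data. It illustrates the mechanism only; a real tape backup would use an established stream cipher (or AES in a streaming mode) and never reuse a key/nonce pair:

    # Python 3 toy: XOR the data with a keystream derived from a key.
    # Illustration of the mechanism only -- not a vetted cipher.
    import hashlib, os

    def keystream(key, nbytes):
        """Hash key||counter in counter mode to stretch the key into a bitstream."""
        out, counter = bytearray(), 0
        while len(out) < nbytes:
            out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
            counter += 1
        return bytes(out[:nbytes])

    def xor_stream(key, data):
        """Encrypt or decrypt: XOR is its own inverse, output length == input length."""
        return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

    key        = os.urandom(32)                 # the "suitable key"
    block      = b"backup tape block 000000001"
    ciphertext = xor_stream(key, block)
    assert xor_stream(key, ciphertext) == block
    print(len(block), len(ciphertext))          # identical lengths

Note the property described above: the ciphertext is exactly the length of the input, the same function decrypts, and throughput is limited only by how fast the keystream can be generated.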
But in the end no, it isn't that hard to encrypt a backup tape, provided that you are willing to accept the limitation that the speed of encrypting/decrypting the stream being written to the tape is basically limited by the speed of your RNG (which may well be slower than the speed of most fast networks). rgb > > --Chi > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Oct 4 16:43:15 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 4 Oct 2011 13:43:15 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of mathog > Sent: Tuesday, October 04, 2011 1:07 PM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud > > > "Robert G. Brown" wrote: > > > Often, but not always, housing, cooling, powering, and even managing > > the > > hardware is "free" to the researcher, absorbed into the ongoing costs > > of > > the server room and management staff already needed to run the > > department LAN and servers. > > Not always indeed. My little machine room houses a half dozen machines > from other biology division > people, and they are not charged to keep them there. However, putting > a computer in the central > campus machine rooms is not free. And new computer rooms, at least > those of any size, do not > get free power. After geology put in this monster: > > http://www.gps.caltech.edu/uploads/Image/Facilities/Beowulf.jpg > http://citerra.gps.caltech.edu/wiki/Public/Technology A mere 512 nodes, each with 8 cores. 670W power supply is standard, so let's say about 500 nodes at 700 watts each or 350kW... HVAC will add on top of that, but I doubt they're loaded to the max. Call it 400kW.. That's big, but not enormous. (e.g you can rent a trailer mounted generator for that kind of power for about $1000/day.. the bigger generators one sees on a movie set might be 200-300kW)) CalTrans will only pay $123/hr for a 500kW generator (and fuel cost comes out of that) But, if you were paying SoCalEdison for the juice..You'd be on (minimum) the TOU-GS-3 tariff.. On peak you'd be paying 0.02/kWh for delivery and 0.104/kWh for the power. (off peak would be 0.045/kWh) So call it 12c/kWh on peak. At 400kW, that's $48/hr, which isn't bad, operating expenses wise. Let's compare to the EC2.. $1300/hr for 30k cores. 23 core hours/$ The CITerra is $50/hr for 4000 cores. 80 core hours/$ Yes, one had to go out and BUY all those cores for CITerra. $5000/node, all in, including cabling racks, etc.? What's that, about $1.25M. Spread that out over 3 years at 2000 hrs/year (we only consider working in the daytime, etc. and you get about $210/hr for the capital cost (for all 500+ nodes..) So, the EC2 seems like a good solution when you need rapid scalability to huge sizes and you have a big expense budget and a small capital budget. 
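For anyone who wants to fiddle with the assumptions, the back-of-envelope above drops straight into a scratch script. The inputs are the figures from this message (power draw, tariff, node count, the $1,300/hr EC2 rate) plus the stated $1.25M all-in capital number; note that 512 nodes at $5,000 each would be nearer $2.5M, so the script follows the $1.25M total rather than the per-node figure:

    # The comparison above as a scratch script.  Inputs are the figures above;
    # capital follows the stated $1.25M total (512 x $5,000 would be ~$2.5M).
    nodes, cores_per_node = 512, 8
    power_kw   = 400.0            # estimated draw including HVAC
    tariff     = 0.12             # $/kWh on peak
    capital    = 1.25e6           # stated all-in capital cost
    work_hours = 3 * 2000         # three years of "working hours"
    all_hours  = 3 * 8760         # three years of 24x7 operation

    cores        = nodes * cores_per_node
    power_per_hr = power_kw * tariff   # ~$48/hr; rounding to 4000 cores and $50/hr gives ~80 below
    print("core-hours per $, power only       : %.0f" % (cores / power_per_hr))
    print("core-hours per $, power + capital  : %.0f"
          % (cores / (power_per_hr + capital / work_hours)))
    print("core-hours per $, 24x7 amortization: %.0f"
          % (cores / (power_per_hr + capital / all_hours)))

    ec2_rate, ec2_cores = 1300.0, 30000
    print("EC2 core-hours per $               : %.0f" % (ec2_cores / ec2_rate))

Amortized over working hours only, capital dominates and EC2 actually comes out ahead on core-hours per dollar; amortized over 24x7 operation the owned cluster pulls well in front, which is the utilization argument again.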
You could call up Amazon this afternoon and run that 30,000 core job tonight. And you'd pay substantially for that flexibility (which is how Amazon makes money, eh?) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlb17 at duke.edu Tue Oct 4 16:47:30 2011 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 4 Oct 2011 16:47:30 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011 at 4:39pm, Robert G. Brown wrote > On Tue, 4 Oct 2011, Chi Chan wrote: > >> On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: >>> BTW, I've heard horror stories related to routing errors with this >>> method - truck drivers delivering wrong tapes or losing tapes >>> (hopefully the data is properly encrypted). >> >> I just read this on Slashdot today, it is "very hard to encrypt a >> backup tape" (really?): >> >> http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients > > Not if it is encrypted with a stream cipher -- a stream cipher basically > xors the data with a bitstream generated from a suitable key in a > cryptographic-strength pseudorandom number generator (although there are > variations on this theme). As a result, it can be quite fast -- as fast > as generating pseudorandom numbers from the generator -- and it produces > a file that is exactly the size of the original message in length. For added "no, it's not hard, they're apparently just not very bright" value, LTO4+ includes hardware AES encryption. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Oct 4 16:48:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 4 Oct 2011 13:48:00 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > Sent: Tuesday, October 04, 2011 1:39 PM > To: Chi Chan > Cc: Rayson Ho; Lux, Jim (337C); tt at postbiota.org; jtriley at mit.edu; Beowulf List > Subject: Re: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud > > On Tue, 4 Oct 2011, Chi Chan wrote: > > > On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: > >> BTW, I've heard horror stories related to routing errors with this > >> method - truck drivers delivering wrong tapes or losing tapes > >> (hopefully the data is properly encrypted). 
> > > > I just read this on Slashdot today, it is "very hard to encrypt a > > backup tape" (really?): > > > > http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients > > Not if it is encrypted with a stream cipher -- a stream cipher basically > xors the data with a bitstream generated from a suitable key in a > cryptographic-strength pseudorandom number generator (although there are > variations on this theme). As a result, it can be quite fast -- as fast > as generating pseudorandom numbers from the generator -- and it produces > a file that is exactly the size of the original message in length. > > There are encryption schemes that expend extraordinary amounts of > computational energy in generating the stream, and there are also block > ciphers (which are indeed hard to implement for a streaming tape full of > data, as they usually don't work so well for long messages). But in the > end no, it isn't that hard to encrypt a backup tape, provided that you > are willing to accept the limitation that the speed of > encrypting/decrypting the stream being written to the tape is basically > limited by the speed of your RNG (which may well be slower than the > speed of most fast networks). > The reason it wasn't encrypted is almost certainly not because it was difficult to do so for technology reasons. When you see a story about "data being lost or stolen from a car" it's because it was an ad hoc situation. Someone got a copy of the data to do some sort of analysis or to take it somewhere on a onetime basis, and "things went wrong". Any sort of regular process would normally deal with encryption or security as a matter of course: it's too easy to do it right. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Tue Oct 4 16:52:13 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 4 Oct 2011 13:52:13 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <4E8B5E98.3090002@sonsorol.org> References: <4E8B5E98.3090002@sonsorol.org> Message-ID: <20111004205213.GD14057@bx9.net> On Tue, Oct 04, 2011 at 03:29:28PM -0400, Chris Dagdigian wrote: > I'm largely with RGB on this one with the minor caveat that I think he > might be undervaluing the insane economies of scale that IaaS providers > like Amazon & Google can provide. You can rent that economy of scale if you're in the right part of the country. We weren't surprised to recently learn that our Silicon Valley datacenter rent is much lower than Moscow, but I was surprised to learn that we pay 1/3 less here than in Vegas, which allegedly has cheap land and power hence cheap datacenter rents. And with only 750 servers, we are already big enough to reap enough outright economy of scale to make leasing our own servers in a rented datacenter cheaper than renting everything from Amazon. The unique thing Amazon is providing is the ability to grow and shrink your cluster. Your example of a company which wanted to run a bunch of molecular dynamics computations in a short period of time is an illustration of that. BTW, Amazon has lowered prices since AWS was released, but not by as much as their costs have fallen. 
That's no surprise, given their dominant role in that market. -- greg (corporate hat: infrastructure at a search engine) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Tue Oct 4 17:03:46 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 17:03:46 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > > The reason it wasn't encrypted is almost certainly not because it > was difficult to do so for technology reasons. When you see a story > about "data being lost or stolen from a car" it's because it was an ad > hoc situation. Someone got a copy of the data to do some sort of > analysis or to take it somewhere on a onetime basis, and "things went > wrong". > > Any sort of regular process would normally deal with encryption or > security as a matter of course: it's too easy to do it right. The problem being that HIPAA is not amused by incompetence. The standard is pretty much show due diligence or be prepared to pay massive bucks out in lawsuits should the data you protect be compromised. It is really a most annoying standard -- I mean it is good that it is so flexible and makes the responsibility clear, but for most of HIPAA's existence it has provided no F***ing guidelines on how to make protected data secure. Consequently (and I say this as a modest consultant-level expert) your data and mine in the Electronic Medical Record of your choice is typically: a) Stored in flat, unencrypted plaintext or binary image in the base DB. b) Transmitted in flat, unencrypted plaintext between the server and any LAN-connected clients. In other words, it assumes that your local LAN is secure. c) Relies on third party e.g. VPN solutions to provide encryption for use across a WAN. Needless to say, the passwords and authentication schemes used in EMRs are typically a joke -- after all, the users are borderline incompetent users and cannot be expected to remember or quickly type in a user id or password much more complicated than their own initials. Many sites have one completely trivial password in use by all the physicians and nurses who use the system -- just enough to MAYBE keep patients out of the system while waiting in an examining room. I have had to convince the staff of at least one major EMR company that I will refrain from naming that no, I wasn't going to ship them a copy of an entire dataset exported from an old practice management system -- think of it as the names, addresses, SSNs and a few dozen other "protected" pieces of personal information -- to them as an unencrypted zip file over the internet, and had to finally grit my teeth and accept the use of zip's (not terribly good) built in encryption and cross my fingers and pray. Do not underestimate the sheer power of incompetence, in other words, especially incompetence in an environment almost completely lacking meaningful IT-level standards or oversight. It's really shameful, actually -- it would be so very easy to build in nearly bulletproof security schema that would make the need for third party VPNs passe. 
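On the unencrypted-zip anecdote above: wrapping an export in strong encryption before it leaves the building costs essentially nothing with stock tools. A minimal sketch, shelling out to GnuPG from Python; the file names are placeholders, gpg is assumed to be installed, and it will prompt for a passphrase on the terminal (a real exchange would prefer public-key encryption to the recipient over a shared passphrase):

    # Encrypt a data export with GnuPG before it goes anywhere.
    # "export.csv" is a placeholder; assumes gpg is installed and a terminal
    # is available for the passphrase prompt.
    import subprocess

    subprocess.check_call([
        "gpg", "--symmetric", "--cipher-algo", "AES256",
        "--output", "export.csv.gpg", "export.csv",
    ])
    # Receiving end:  gpg --output export.csv --decrypt export.csv.gpg

The passphrase still has to reach the other end out of band, but at least the payload is no longer protected by zip's legacy scheme, or by nothing at all.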
I don't know that ALL of the EMRs out there are STILL this bad, but I'd bet that 90% of them are. They certainly were 3-4 years ago, last time I looked in detail. So this is just par for the course. Doctors don't understand IT security. EMR creators should, but security is "expensive" and they don't bother because it isn't mandated. The end result is that everything from the DB to the physician's working screen is so horribly insecure that if any greed-driven cracker out there ever decided to exclusively target the weaknesses, they could compromise HIPAA and SSNs by the millions. Sigh. rgb > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Tue Oct 4 17:21:31 2011 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 4 Oct 2011 17:21:31 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> Several years ago I flippantly proposed what seems to be a simple way to ensure important consumer private data (medical, finance, etc.) was safe. Pass a law that says organization who collects or holds personal data must include the same data for organization's Board of Directors and officers (CEO, COO etc) in the database. At least the CEO might start taking security serious when someone in Bulgaria is buying jet skies with his AMX card. -- Doug > On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > >> >> The reason it wasn't encrypted is almost certainly not because it >> was difficult to do so for technology reasons. When you see a story >> about "data being lost or stolen from a car" it's because it was an ad >> hoc situation. Someone got a copy of the data to do some sort of >> analysis or to take it somewhere on a onetime basis, and "things went >> wrong". >> >> Any sort of regular process would normally deal with encryption or >> security as a matter of course: it's too easy to do it right. > > The problem being that HIPAA is not amused by incompetence. The > standard is pretty much show due diligence or be prepared to pay massive > bucks out in lawsuits should the data you protect be compromised. It is > really a most annoying standard -- I mean it is good that it is so > flexible and makes the responsibility clear, but for most of HIPAA's > existence it has provided no F***ing guidelines on how to make protected > data secure. > > Consequently (and I say this as a modest consultant-level expert) your > data and mine in the Electronic Medical Record of your choice is > typically: > > a) Stored in flat, unencrypted plaintext or binary image in the base > DB. > > b) Transmitted in flat, unencrypted plaintext between the server and > any LAN-connected clients. In other words, it assumes that your local > LAN is secure. > > c) Relies on third party e.g. 
VPN solutions to provide encryption for > use across a WAN. > > Needless to say, the passwords and authentication schemes used in EMRs > are typically a joke -- after all, the users are borderline incompetent > users and cannot be expected to remember or quickly type in a user id or > password much more complicated than their own initials. Many sites have > one completely trivial password in use by all the physicians and nurses > who use the system -- just enough to MAYBE keep patients out of the > system while waiting in an examining room. > > I have had to convince the staff of at least one major EMR company that > I will refrain from naming that no, I wasn't going to ship them a copy > of an entire dataset exported from an old practice management system -- > think of it as the names, addresses, SSNs and a few dozen other > "protected" pieces of personal information -- to them as an unencrypted > zip file over the internet, and had to finally grit my teeth and accept > the use of zip's (not terribly good) built in encryption and cross my > fingers and pray. > > Do not underestimate the sheer power of incompetence, in other words, > especially incompetence in an environment almost completely lacking > meaningful IT-level standards or oversight. It's really shameful, > actually -- it would be so very easy to build in nearly bulletproof > security schema that would make the need for third party VPNs passe. > > I don't know that ALL of the EMRs out there are STILL this bad, but I'd > bet that 90% of them are. They certainly were 3-4 years ago, last time > I looked in detail. > > So this is just par for the course. Doctors don't understand IT > security. EMR creators should, but security is "expensive" and they > don't bother because it isn't mandated. The end result is that > everything from the DB to the physician's working screen is so horribly > insecure that if any greed-driven cracker out there ever decided to > exclusively target the weaknesses, they could compromise HIPAA and SSNs > by the millions. > > Sigh. > > rgb > >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at caltech.edu Tue Oct 4 17:39:40 2011 From: mathog at caltech.edu (mathog) Date: Tue, 04 Oct 2011 14:39:40 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <1a4e05cecd44d8777737e6994d09b289@saf.bio.caltech.edu> On Tue, 4 Oct 2011 13:43:15 -0700, Lux, Jim (337C) wrote: > So call it 12c/kWh on peak. At 400kW, that's $48/hr, which isn't > bad, operating expenses wise. Well, yes and no. If they only turned it on once in a while it wouldn't be too bad, but I'm pretty sure it runs 100% of the time. At least I have never walked by when the racks were not lit up, so... $48 * 24 * 365 = $420480/year Versus the average lab at (waves hands) $150 in electricity a month = $1800/year? It will of course depend on what kind of work the lab does. The difference is two orders of magnitude. Anyway, last I looked we had around 300 professors, so that one facility used up, order of magnitude, as much juice as all the "normal" labs combined. (Certainly there are some other labs around which also use a lot of electricity.) Cooling water usage was probably also a sore point from the administration's perspective. Pretty much everything here runs AC off chilled water coming from a central plant. Either that cluster used up a whole lot of chilled water capacity at the central plant or they built a separate chiller somewhere. Dave Kewley who sometimes posts here used to run that system, so he would know. Regards David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlb17 at duke.edu Tue Oct 4 17:41:02 2011 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 4 Oct 2011 17:41:02 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011 at 5:03pm, Robert G. Brown wrote > Needless to say, the passwords and authentication schemes used in EMRs > are typically a joke -- after all, the users are borderline incompetent > users and cannot be expected to remember or quickly type in a user id or > password much more complicated than their own initials. Many sites have > one completely trivial password in use by all the physicians and nurses > who use the system -- just enough to MAYBE keep patients out of the > system while waiting in an examining room. My wife's experience here was somewhat the opposite of that. Within 2 days of starting her fellowship at UCSF she had acquired over 10 usernames and passwords (and one RSA hardware token) for all the various systems she needed to interact with. Each system, of course, had its own password aging and renewal rules. Determining how physicians manage their passwords in such an environment is left as an exercise for the reader...
-- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 5 08:40:53 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Oct 2011 08:40:53 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> References: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> Message-ID: On Tue, 4 Oct 2011, Douglas Eadline wrote: > > Several years ago I flippantly proposed what seems to be > a simple way to ensure important consumer private data > (medical, finance, etc.) was safe. Pass a law that says > organization who collects or holds personal data must > include the same data for organization's Board of Directors and > officers (CEO, COO etc) in the database. At least > the CEO might start taking security serious when > someone in Bulgaria is buying jet skies with his AMX card. It wouldn't help. Physicians are too clueless to understand or care (mostly, not universally) and besides, what can they do? They don't write software. The companies that provide the software won't have their board's information in the DB under any circumstances, and they are the problem. Or rather, the unregulated nature of the business is the problem. The government is spending all sorts of energy specifying the detailed structure of the DB and ICD codes for every possible illness at a staggering degree of granularity so that they can eventually micro-specify compensation rates for fingering your left gonad during an exam but are leaving HIPAA -- a disaster from day one in so very many ways -- in place as the sole guardian of our medical privacy. HIPAA fails to specify IT security, and obscures precisely who will be held financially responsible for failures of security or what other sanctions might be applied. HIPAA has had the easily predictable side effect of placing enormous physical and financial obstacles in the path of medical research, to the point where I think it is safe to say that HIPAA alone has de fact killed thousands to tens of thousands of people simply by delaying discovery for years to decades (while costing us a modest fortune to perform such research as is now performed, with whole departments in any research setting devoted to managing the permissioning of the data). Finally, HIPAA's fundamental original purpose was to keep e.g. health insurance companies or employers from getting your health care records and using them to deny coverage or employment, and it didn't really succeed even in that because of the appalling state of deregulation in the insurance industry itself. It's really pretty amazing. It's hard to imagine how anyone could have come up with a piece of governance so diabolically well designed to be enormously expensive in money and lives while failing even to accomplish its own primary goals or the related goals that it SHOULD have tried to accomplish (such as mandating a certain -- high -- level of security and complete open-standard interoperability and data portability in emergent EMR/PM systems, at least at the DB level), even if they tried. 
However, we should never be hasty to ascribe to human evil that which can adequately be explained by mere incompetence and stupidity. But this is OT, and I'll return to my muttons now. Soap box out. rgb > > -- > Doug > > > > > >> On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: >> >>> >>> The reason it wasn't encrypted is almost certainly not because it >>> was difficult to do so for technology reasons. When you see a story >>> about "data being lost or stolen from a car" it's because it was an ad >>> hoc situation. Someone got a copy of the data to do some sort of >>> analysis or to take it somewhere on a onetime basis, and "things went >>> wrong". >>> >>> Any sort of regular process would normally deal with encryption or >>> security as a matter of course: it's too easy to do it right. >> >> The problem being that HIPAA is not amused by incompetence. The >> standard is pretty much show due diligence or be prepared to pay massive >> bucks out in lawsuits should the data you protect be compromised. It is >> really a most annoying standard -- I mean it is good that it is so >> flexible and makes the responsibility clear, but for most of HIPAA's >> existence it has provided no F***ing guidelines on how to make protected >> data secure. >> >> Consequently (and I say this as a modest consultant-level expert) your >> data and mine in the Electronic Medical Record of your choice is >> typically: >> >> a) Stored in flat, unencrypted plaintext or binary image in the base >> DB. >> >> b) Transmitted in flat, unencrypted plaintext between the server and >> any LAN-connected clients. In other words, it assumes that your local >> LAN is secure. >> >> c) Relies on third party e.g. VPN solutions to provide encryption for >> use across a WAN. >> >> Needless to say, the passwords and authentication schemes used in EMRs >> are typically a joke -- after all, the users are borderline incompetent >> users and cannot be expected to remember or quickly type in a user id or >> password much more complicated than their own initials. Many sites have >> one completely trivial password in use by all the physicians and nurses >> who use the system -- just enough to MAYBE keep patients out of the >> system while waiting in an examining room. >> >> I have had to convince the staff of at least one major EMR company that >> I will refrain from naming that no, I wasn't going to ship them a copy >> of an entire dataset exported from an old practice management system -- >> think of it as the names, addresses, SSNs and a few dozen other >> "protected" pieces of personal information -- to them as an unencrypted >> zip file over the internet, and had to finally grit my teeth and accept >> the use of zip's (not terribly good) built in encryption and cross my >> fingers and pray. >> >> Do not underestimate the sheer power of incompetence, in other words, >> especially incompetence in an environment almost completely lacking >> meaningful IT-level standards or oversight. It's really shameful, >> actually -- it would be so very easy to build in nearly bulletproof >> security schema that would make the need for third party VPNs passe. >> >> I don't know that ALL of the EMRs out there are STILL this bad, but I'd >> bet that 90% of them are. They certainly were 3-4 years ago, last time >> I looked in detail. >> >> So this is just par for the course. Doctors don't understand IT >> security. EMR creators should, but security is "expensive" and they >> don't bother because it isn't mandated. 
The end result is that >> everything from the DB to the physician's working screen is so horribly >> insecure that if any greed-driven cracker out there ever decided to >> exclusively target the weaknesses, they could compromise HIPAA and SSNs >> by the millions. >> >> Sigh. >> >> rgb >> >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> >> Robert G. Brown http://www.phy.duke.edu/~rgb/ >> Duke University Dept. of Physics, Box 90305 >> Durham, N.C. 27708-0305 >> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. >> > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 5 08:45:02 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Oct 2011 08:45:02 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Joshua Baker-LePain wrote: > On Tue, 4 Oct 2011 at 5:03pm, Robert G. Brown wrote > >> Needless to say, the passwords and authentication schemes used in EMRs >> are typically a joke -- after all, the users are borderline incompetent >> users and cannot be expected to remember or quickly type in a user id or >> password much more complicated than their own initials. Many sites have >> one completely trivial password in use by all the physicians and nurses >> who use the system -- just enough to MAYBE keep patients out of the >> system while waiting in an examining room. > > My wife's experience here was somewhat the opposite of that. Within 2 > days of starting her fellowship at UCSF she had acquired over 10 usernames > and passwords (and one RSA hardware token) for all the various systems she > needed to interact with. Each system, of course, had its own password > aging and renewal rules. Determining how physicians manage their > passwords in such an environment is left as an exercise for the reader... Ah, yes, excellent. Ten of them AND an RSA e.g. SecureID -- wow, that takes some real brilliance. I know how MY physician wife would manage it... 
rgb > > -- > Joshua Baker-LePain > QB3 Shared Cluster Sysadmin > UCSF > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Wed Oct 5 09:42:28 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Wed, 05 Oct 2011 09:42:28 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <20111004205213.GD14057@bx9.net> References: <4E8B5E98.3090002@sonsorol.org> <20111004205213.GD14057@bx9.net> Message-ID: <4E8C5EC4.9020101@runnersroll.com> On 10/04/11 16:52, Greg Lindahl wrote: > On Tue, Oct 04, 2011 at 03:29:28PM -0400, Chris Dagdigian wrote: >> I'm largely with RGB on this one with the minor caveat that I think he >> might be undervaluing the insane economies of scale that IaaS providers >> like Amazon & Google can provide. > > cheap land and power hence cheap datacenter rents. And with only 750 > servers, we are already big enough to reap enough outright economy of > scale to make leasing our own servers in a rented datacenter cheaper > than renting everything from Amazon. > > The unique thing Amazon is providing is the ability to grow and shrink > your cluster. Your example of a company which wanted to run a bunch of > molecular dynamics computations in a short period of time is an > illustration of that. On this note, does anyone know if there are prior works (either academic or publicly disclosed documentations of a company pursuing such a route) of people splitting their workload up into the "static" and "dynamic" portions and running them respectively on in-house and rented hardware? While I see this discussion time and time again go either one way or the other (google or amazon, if you will), I suspect for many companies if it were possible to "invisibly" extend their infrastructure into the cloud on an as-needed basis, it might be a pretty attractive solution. Put another way, there doesn't seem to be much sense in buying a couple more racks for just a short-term project that will result in those racks going silent afterwards. On the flipside, you probably have some fraction of the compute and data resources you need as it is, you just want it to run a little faster or need a little more scratch space/bandwidth. So renting an entire set of resources wouldn't be optimal either, since that will result in underutilization of the infrastructure at home. So just buy whatever fraction your missing from Amazon from a month and use some hacks to make it look like that hardware is right there next to your other stuff. Obviously this requires an embarrassingly parallel workload due to the locality dichotomy (or completely disjoint workloads). Another idea I had was just like solar energy, what if there was a way for you to build up credits for Amazon in the "day" and use them at "night"? I.E. 
put some Amazon software on your infrastructure that allows you them to use your servers as part of their "cloud" when you're not using your equipment at max, and when you do go peak it will automatically provision more and more Amazon leased resources on an as-needed basis and burn up those earned credits instead of "real money." Just some ideas I figured I'd put through the beo-blender to see if they hold any weight before actually pursuing them as research objectives. ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jcownie at cantab.net Thu Oct 6 13:33:51 2011 From: jcownie at cantab.net (James Cownie) Date: Thu, 6 Oct 2011 18:33:51 +0100 Subject: [Beowulf] Beowulf Bash at SC11? Message-ID: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> SC approaches fast, but I've seen no mention of a Beowulf Bash. Has it died? Did I just miss an announcement? -- -- Jim -- James Cownie -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From prentice at ias.edu Fri Oct 7 09:45:29 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 07 Oct 2011 09:45:29 -0400 Subject: [Beowulf] Beowulf Bash at SC11? In-Reply-To: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> Message-ID: <4E8F0279.5070809@ias.edu> There's an announcement on beowulf.org for a Beowulf Bash... from 2009! Beowulf Bash: The 11th Annual Beowulf.org Meeting November 16, 2009 Portland OR Location: The Game, One Center Court, The Rose Quarter Sponsors: AMD Cluster Monkey InsideHPC Penguin Computing SiCorp TeraScala XAND Marketing On 10/06/2011 01:33 PM, James Cownie wrote: > SC approaches fast, but I've seen no mention of a Beowulf Bash. > > Has it died? > > Did I just miss an announcement? > > -- > > -- Jim > > -- > > James Cownie > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Glen.Beane at jax.org Fri Oct 7 10:21:41 2011 From: Glen.Beane at jax.org (Glen Beane) Date: Fri, 7 Oct 2011 14:21:41 +0000 Subject: [Beowulf] Beowulf Bash at SC11? 
In-Reply-To: <4E8F0279.5070809@ias.edu> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> <4E8F0279.5070809@ias.edu> Message-ID: <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> I remember not hearing much about it last year in New Orleans until someone I knew from Penguin handed me a card Monday night at the opening gala On Oct 7, 2011, at 9:45 AM, Prentice Bisbal wrote: > There's an announcement on beowulf.org for a Beowulf Bash... from 2009! > > Beowulf Bash: The 11th Annual Beowulf.org Meeting > November 16, 2009 > Portland OR > Location: The Game, One Center Court, The Rose Quarter Sponsors: > AMD Cluster Monkey > InsideHPC > Penguin Computing > SiCorp TeraScala > XAND Marketing > > > On 10/06/2011 01:33 PM, James Cownie wrote: >> SC approaches fast, but I've seen no mention of a Beowulf Bash. >> >> Has it died? >> >> Did I just miss an announcement? >> >> -- >> >> -- Jim >> >> -- >> >> James Cownie > >> >> >> >> >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen L. Beane Senior Software Engineer The Jackson Laboratory (207) 288-6153 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Fri Oct 7 17:19:52 2011 From: deadline at eadline.org (Douglas Eadline) Date: Fri, 7 Oct 2011 17:19:52 -0400 (EDT) Subject: [Beowulf] Beowulf Bash at SC11? In-Reply-To: <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> <4E8F0279.5070809@ias.edu> <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> Message-ID: <47582.192.168.93.213.1318022392.squirrel@mail.eadline.org> I always announce it on this list and on ClusterMonkey, it also will be announced on InsideHPC and some of the sponsor sites. -- Doug > I remember not hearing much about it last year in New Orleans until > someone I knew from Penguin handed me a card Monday night at the opening > gala > > > On Oct 7, 2011, at 9:45 AM, Prentice Bisbal wrote: > >> There's an announcement on beowulf.org for a Beowulf Bash... from 2009! >> >> Beowulf Bash: The 11th Annual Beowulf.org Meeting >> November 16, 2009 >> Portland OR >> Location: The Game, One Center Court, The Rose Quarter Sponsors: >> AMD Cluster Monkey >> InsideHPC >> Penguin Computing >> SiCorp TeraScala >> XAND Marketing >> >> >> On 10/06/2011 01:33 PM, James Cownie wrote: >>> SC approaches fast, but I've seen no mention of a Beowulf Bash. >>> >>> Has it died? >>> >>> Did I just miss an announcement? 
>>> >>> -- >>> >>> -- Jim >>> >>> -- >>> >>> James Cownie > >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Glen L. Beane > Senior Software Engineer > The Jackson Laboratory > (207) 288-6153 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kilian.cavalotti.work at gmail.com Tue Oct 11 11:21:32 2011 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Tue, 11 Oct 2011 17:21:32 +0200 Subject: [Beowulf] IBM to acquire Platform Computing Message-ID: http://www.platform.com/press-releases/2011/IBMtoAcquireSystemSoftwareCompanyPlatformComputingtoExtendReachofTechnicalComputing and http://www-03.ibm.com/systems/deepcomputing/platform.html Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dag at sonsorol.org Wed Oct 12 10:52:13 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 12 Oct 2011 10:52:13 -0400 Subject: [Beowulf] 10GbE topologies for small-ish clusters? Message-ID: <4E95A99D.9040703@sonsorol.org> First time I'm seriously pondering bringing 10GbE straight to compute nodes ... For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? Regards, Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Wed Oct 12 10:58:58 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 12 Oct 2011 10:58:58 -0400 Subject: [Beowulf] 10GbE topologies for small-ish clusters? 
In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> Message-ID: <4E95AB32.3030804@scalableinformatics.com> On 10/12/2011 10:52 AM, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? What's the use case? Low latency, or simplified high bandwidth connection? 10GbE with 40GbE uplinks won't be cheap. But it would be doable. Gnodal, Mellanox, and others would be able to do this. > > Regards, > Chris > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From i.n.kozin at googlemail.com Wed Oct 12 11:22:52 2011 From: i.n.kozin at googlemail.com (Igor Kozin) Date: Wed, 12 Oct 2011 16:22:52 +0100 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> Message-ID: Gnodal was probably the first to announce a 1U 72 port switch http://www.gnodal.com/docs/Gnodal%20GS7200%20datasheet.pdf Other vendors either have announced or will be probably announcing dense packaging too. On 12 October 2011 15:52, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? > > Regards, > Chris > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Oct 12 11:28:28 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 12 Oct 2011 16:28:28 +0100 Subject: [Beowulf] 10GbE topologies for small-ish clusters? References: <4E95A99D.9040703@sonsorol.org> Message-ID: <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> First time I'm seriously pondering bringing 10GbE straight to compute nodes ... 
For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? I was going to suggest two Gnodal rack top switches, linked by a 40Gbps link http://www.gnodal.com/ I see though that their GS7200 switch has 72 x 10Gbps ports - should do you just fine! The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From akshar.bhosale at gmail.com Wed Oct 12 12:28:57 2011 From: akshar.bhosale at gmail.com (akshar bhosale) Date: Wed, 12 Oct 2011 21:58:57 +0530 Subject: [Beowulf] refunding reserved amount in gold Message-ID: Hi, We are using PBS (torque 2.4.8) and gold version 2.1.7.1. One of the jobs went for execution and reserved the equivalent amount. The same job came out of execution and went in queue from execution. This happened 30 times for the same job. Every time job has reserved amount. Now finally there is very huge amount(30*charges for that single job) which is shown in reserved state.Job now does not exist. User can not submit the new job now because of neglegible amount balance in his account. We want to clear reserved amount. How to do that? -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Shainer at Mellanox.com Wed Oct 12 12:30:02 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Wed, 12 Oct 2011 16:30:02 +0000 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <4E95A99D.9040703@sonsorol.org> <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: You can also check the Mellanox products - both for 40GigE and 10GigE switch fabric. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Hearns, John Sent: Wednesday, October 12, 2011 8:31 AM To: dag at sonsorol.org; beowulf at beowulf.org Subject: Re: [Beowulf] 10GbE topologies for small-ish clusters? First time I'm seriously pondering bringing 10GbE straight to compute nodes ... For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? I was going to suggest two Gnodal rack top switches, linked by a 40Gbps link http://www.gnodal.com/ I see though that their GS7200 switch has 72 x 10Gbps ports - should do you just fine! 
The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at ur.rochester.edu Wed Oct 12 12:33:39 2011 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Wed, 12 Oct 2011 12:33:39 -0400 Subject: [Beowulf] refunding reserved amount in gold In-Reply-To: References: Message-ID: <85631CC6-BFE0-44A2-B69E-42BB660AC632@ur.rochester.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I would suggest you post this to the Gold mailing list with a few more pieces of information: http://www.supercluster.org/mailman/listinfo/gold-users Regardless, you could probably use the grefund command... On Oct 12, 2011, at 12:28 PM, akshar bhosale wrote: > Hi, > > We are using PBS (torque 2.4.8) and gold version 2.1.7.1. One of the > jobs went for execution and reserved the equivalent amount. The same job > came out of execution and went in queue from execution. This happened 30 > times for the same job. Every time job has reserved amount. Now finally > there is very huge amount(30*charges for that single job) which is shown in > reserved state.Job now does not exist. User can not submit the new job now > because of neglegible amount balance in his account. We want to clear > reserved amount. How to do that? > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOlcFoAAoJENS19LGOpgqK1UIIAIFZj6fIZebQt9xQwmVBVxB9 MPwJMlw4C0F8bR/crGBWx7NUHElep1frROYohD15jN/8bFA2/bJ3xFdiH1bMNqHu MdB4EmRbs4nuNeN/ZayV4JXBVD3oPuwESYA65jVj0MfbVbzeRod6ZnNvpZOb/Juc 7dHCNPa2coLGLakGEQperOvOOCqsTbxSUdagXulW/1xH3iG+8UPNPJe7ATvO0tE3 FYOot3a3WgN8dsWUnsOKBnA17FA2zN0ac/QdEd2COSbpOjbpQp7BIlg0f0QIIkU6 pVq1C706jn5Cl4gKXsfC277Rrx3eLl3YPVA6XaL95PSXBH51L7Y3ViqMmVe9Coo= =cSUy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Wed Oct 12 14:04:27 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 12 Oct 2011 11:04:27 -0700 Subject: [Beowulf] 10GbE topologies for small-ish clusters? 
In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> <20111012180002.GC5039@bx9.net> Message-ID: <20111012180427.GD5039@bx9.net> We just bought a couple of 64-port 10g switches from Blade, for the middle of our networking infrastructure. They were the winner over all the others, lowest price and appropriate features. We also bought Blade top-of-rack switches. Now that they've been bought up by IBM you have to negotiate harder to get that low price, but you can still get it by threatening them with competing quotes. Gnodal looks very interesting for larger, multi-switch clusters, they were just a bit late to market for us. Arista really believes that their high prices are justified; we didn't. And if anyone would like to buy some used Mellanox 48-port 10ge switches, we have 2 extras we'd like to sell. -- greg On Wed, Oct 12, 2011 at 10:52:13AM -0400, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? > > Regards, > Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Wed Oct 12 14:11:04 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Wed, 12 Oct 2011 18:11:04 +0000 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <20111012180427.GD5039@bx9.net> References: <4E95A99D.9040703@sonsorol.org> <20111012180002.GC5039@bx9.net> <20111012180427.GD5039@bx9.net> Message-ID: The 48-ports are not Mellanox but previous company that Mellanox acquired, as the Mellanox ones are 36 x 40G or 64 x 10G in 1U (or bigger). But please don't let these small details hold you from re-living your history. Good luck selling. -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Greg Lindahl Sent: Wednesday, October 12, 2011 11:05 AM To: Chris Dagdigian Cc: Beowulf Mailing List Subject: Re: [Beowulf] 10GbE topologies for small-ish clusters? We just bought a couple of 64-port 10g switches from Blade, for the middle of our networking infrastructure. They were the winner over all the others, lowest price and appropriate features. We also bought Blade top-of-rack switches. Now that they've been bought up by IBM you have to negotiate harder to get that low price, but you can still get it by threatening them with competing quotes. Gnodal looks very interesting for larger, multi-switch clusters, they were just a bit late to market for us. Arista really believes that their high prices are justified; we didn't. And if anyone would like to buy some used Mellanox 48-port 10ge switches, we have 2 extras we'd like to sell. -- greg On Wed, Oct 12, 2011 at 10:52:13AM -0400, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two > racks what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? 
> - Something else?
>
> Regards,
> Chris

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From cap at nsc.liu.se Thu Oct 13 07:51:56 2011
From: cap at nsc.liu.se (Peter Kjellström)
Date: Thu, 13 Oct 2011 13:51:56 +0200
Subject: Re: [Beowulf] 10GbE topologies for small-ish clusters?
In-Reply-To: <4E95A99D.9040703@sonsorol.org>
References: <4E95A99D.9040703@sonsorol.org>
Message-ID: <201110131351.59977.cap@nsc.liu.se>

On Wednesday, October 12, 2011 04:52:13 PM Chris Dagdigian wrote:
> First time I'm seriously pondering bringing 10GbE straight to compute
> nodes ...
>
> For 64 servers (32 to a cabinet) and an HPC system that spans two racks
> what would be the common 10 Gig networking topology be today?

Both Arista and Blade (now IBM) have 64-port 1U single-ASIC switches (a few
ports will require QSFP-to-SFP+ breakout cables, afaict).

/Peter
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Fri Oct 21 09:10:18 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 21 Oct 2011 09:10:18 -0400
Subject: [Beowulf] Users abusing screen
Message-ID: <4EA16F3A.8080209@ias.edu>

Beowulfers,

I have a question that isn't directly related to clusters, but I suspect
it's an issue many of you are dealing with or have dealt with: users using
the screen command to stay logged in on systems and running long jobs
that they forget about. Have any of you experienced this, and how did
you deal with it?

Here's my scenario:

In addition to my cluster, we have a bunch of "computer servers" where
users can run their programs. These are "large" boxes with more cores
(24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a
desktop.

Periodically, when I have to shut down or reboot a system for maintenance,
I find a LOT of shells being run through the screen command by users
who aren't logged in. The majority are idle shells, but many are running
jobs that seem to be forgotten about. For example, I recently found
some jobs running since July or August under the account of someone who
hasn't even been here for months!

My opinion is that these are shared resources, and if you aren't
interactively using them, you should log out to free up resources for
others. If you have a job that can be run non-interactively, you should
submit it to the cluster.

Has anyone else here dealt with this problem?

I would like to remove screen from my environment entirely to prevent
this. My fellow sysadmins here agree. I'm expecting massive backlash
from the users.
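As a rough illustration (not part of the original post), here is a minimal sketch of how the situation can be audited before anything is removed: list detached screen sessions, and flag long-running processes whose owners have no active login. The script name, the 7-day threshold, and the report-only behaviour are assumptions for illustration, not policy.

    #!/bin/bash
    # report-detached.sh -- illustrative only: report screen master processes
    # and week-old processes owned by users with no active login. Nothing is killed.

    echo "== screen master processes (one per session) =="
    ps -eo user,pid,etime,args | grep '[S]CREEN'

    echo
    echo "== processes older than 7 days whose owner is not logged in =="
    logged_in=$(who | awk '{print $1}' | sort -u)
    ps -eo user,pid,etime,comm --no-headers | while read user pid etime comm; do
        # etime is [[dd-]hh:]mm:ss; a "-" means the process is at least a day old
        case "$etime" in
            *-*) days=${etime%%-*}
                 if [ "$days" -ge 7 ] && ! echo "$logged_in" | grep -qx "$user"; then
                     echo "$user $pid ${days}d $comm"
                 fi ;;
        esac
    done

Whether the report then triggers a nag mail, a kill, or nothing at all is a separate policy decision.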
-- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:07:27 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:07:27 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <4EA198BF.3030002@ias.edu> On 10/21/2011 11:06 AM, Kilian Cavalotti wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) Actually, I can't for reasons I can't get into here. But something like that was part of my original "master plan". -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:10:36 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:10:36 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> References: <4EA16F3A.8080209@ias.edu> <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> Message-ID: <4EA1997C.70103@ias.edu> On 10/21/2011 11:24 AM, Reuti wrote: > Hi, > > Am 21.10.2011 um 15:10 schrieb Prentice Bisbal: > >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? >> >> Here's my scenario: >> >> In addition to my cluster, we have a bunch of "computer servers" where >> users can run the programs. These are "large" boxes with more cores >> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a >> desktop top. >> >> Periodically, when I have to shutdown/reboot a system for maintenance, >> I find a LOT of shells being run through the screen command for users >> who aren't logged in. The majority are idle shells, but many are running >> jobs, that seem to be forgotten about. For example, I recently found >> some jobs running since July or August that were running under the >> account of someone who hasn't even been here for months! >> >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. If you have a job that can be run non-interactively, you should >> submit it to the cluster. >> >> Has anyone else here dealt with the problem? >> >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. 
I'm expecting massive backlash >> from the users. > > I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system to get access to a node granted by GridEngine (for the startup method you can use either the -builtin- or [in case you need X11 forwarding] by a different sshd_config and ssh [GridEngine will start one daemon per task], one additional step is necessary for a tight integration of ssh). > > For users just checking their jobs on a node I have a dedicated queue (where they can login always, but h_cpu limited to 60 seconds, i.e. they can't abuse it). > > -- Reuti > Reuti, That was EXACTLY my original plan, but for reasons I don't want to get into, I can't implement that. In fact, just yesterday I ripped out all the SGE queues I had configured to that. Why? because I was tired of seeing them and being reminded of what a good idea it was. :( -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:12:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:12:53 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19365.4030109@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> Message-ID: <4EA19A05.4000400@ias.edu> On 10/21/2011 11:44 AM, Ellis H. Wilson III wrote: > On 10/21/11 09:10, Prentice Bisbal wrote: >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? > > I think this is strongly tied to what kind of work the users are doing > (i.e. how interactive it is, how long jobs take, how likely failure is > to occur that they must react to). In my personal experience the jobs I > spawn aren't interactive, tend to take a long time, and because of point > 2 require me to react pretty quickly to their failure or I lose out on > valuable compute-time. However, they are cumbersome to execute via a > queuing manager (my work is in systems, so perhaps that area is an > exception). Therefore what I always do is just nohup myself a job, and > tail -f it if I need to watch it. I've adapted my ssh config such that > I don't get booted off after 5 or 10 minutes without any input from me > (I think the limit I set is like 2hours or something), so I can watch > output fly by to my hearts content. > > If I were you, I think the best way to avoid a user-uprising, but to > achieve your goal is to give instructions on how a user can nohup (yes, > just assume they don't know how) and how to configure ssh to not die > after a short time. This way they don't have to worry about getting > disconnected if they aren't constantly interacting (so they can watch > output), but they also aren't staying logged on indefinitely (since > presumably their laptops/desktops aren't on indefinitely). 
> > If you give them an alternative that is well defined with an example > (not just, "Oh you can use such-and-such instead.") I can hardly believe > they'll be all that upset. > Ellis, Using nohup was exactly the advice I gave to one of my users yesterday. Not sure if he'll use it. 'man' is a very difficult program to learn, from what I understand. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Fri Oct 21 11:24:32 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 21 Oct 2011 17:24:32 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> Hi, Am 21.10.2011 um 15:10 schrieb Prentice Bisbal: > Beowulfers, > > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? > > Here's my scenario: > > In addition to my cluster, we have a bunch of "computer servers" where > users can run the programs. These are "large" boxes with more cores > (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a > desktop top. > > Periodically, when I have to shutdown/reboot a system for maintenance, > I find a LOT of shells being run through the screen command for users > who aren't logged in. The majority are idle shells, but many are running > jobs, that seem to be forgotten about. For example, I recently found > some jobs running since July or August that were running under the > account of someone who hasn't even been here for months! > > My opinion is these these are shared resources, and if you aren't > interactively using them, you should log out to free up resources for > others. If you have a job that can be run non-interactively, you should > submit it to the cluster. > > Has anyone else here dealt with the problem? > > I would like to remove screen from my environment entirely to prevent > this. My fellow sysadmins here agree. I'm expecting massive backlash > from the users. I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system to get access to a node granted by GridEngine (for the startup method you can use either the -builtin- or [in case you need X11 forwarding] by a different sshd_config and ssh [GridEngine will start one daemon per task], one additional step is necessary for a tight integration of ssh). For users just checking their jobs on a node I have a dedicated queue (where they can login always, but h_cpu limited to 60 seconds, i.e. they can't abuse it). -- Reuti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From bug at sas.upenn.edu Fri Oct 21 11:17:55 2011 From: bug at sas.upenn.edu (Gavin W. Burris) Date: Fri, 21 Oct 2011 11:17:55 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <4EA18D23.4050501@sas.upenn.edu> On 10/21/2011 11:06 AM, Kilian Cavalotti wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) > > Cheers, I think we have a winner. :) -- Gavin W. Burris Senior Systems Programmer Information Security and Unix Systems School of Arts and Sciences University of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Fri Oct 21 11:44:37 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Fri, 21 Oct 2011 11:44:37 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA19365.4030109@runnersroll.com> On 10/21/11 09:10, Prentice Bisbal wrote: > Beowulfers, > > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? I think this is strongly tied to what kind of work the users are doing (i.e. how interactive it is, how long jobs take, how likely failure is to occur that they must react to). In my personal experience the jobs I spawn aren't interactive, tend to take a long time, and because of point 2 require me to react pretty quickly to their failure or I lose out on valuable compute-time. However, they are cumbersome to execute via a queuing manager (my work is in systems, so perhaps that area is an exception). Therefore what I always do is just nohup myself a job, and tail -f it if I need to watch it. I've adapted my ssh config such that I don't get booted off after 5 or 10 minutes without any input from me (I think the limit I set is like 2hours or something), so I can watch output fly by to my hearts content. If I were you, I think the best way to avoid a user-uprising, but to achieve your goal is to give instructions on how a user can nohup (yes, just assume they don't know how) and how to configure ssh to not die after a short time. This way they don't have to worry about getting disconnected if they aren't constantly interacting (so they can watch output), but they also aren't staying logged on indefinitely (since presumably their laptops/desktops aren't on indefinitely). If you give them an alternative that is well defined with an example (not just, "Oh you can use such-and-such instead.") I can hardly believe they'll be all that upset. 
Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Fri Oct 21 12:26:09 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Fri, 21 Oct 2011 12:26:09 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> Message-ID: <4EA19D21.3090902@runnersroll.com> On 10/21/11 12:12, Prentice Bisbal wrote: >> If you give them an alternative that is well defined with an example >> (not just, "Oh you can use such-and-such instead.") I can hardly believe >> they'll be all that upset. >> > > Ellis, > > Using nohup was exactly the advice I gave to one of my users yesterday. > Not sure if he'll use it. 'man' is a very difficult program to learn, > from what I understand. Hahaha, I love your cynicism. Right up my alley, however, I think in all seriousness 'man' does fall short for many applications in terms of examples (there are exceptions to this, but most man docs don't have examples from my experience). Many users just want examples of it's use, and can derive their case faster from such than custom-creation of a set of parameters from man. So just take a few moments, cook up an example of 'nohup ./someapp &> out.txt &' usage and associated ways to kill and watch it's output and put it all into an email. Save that email away, and when you're ready just shoot it out to everyone. Or if you have an internal wiki setup, that's much, much better. Just forward a link to some new page on it. If you make even a half-assed effort to show you are providing a viable alternative and a low bar to entry, you'll cut the number of people complaining at least in half. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Fri Oct 21 11:26:57 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 21 Oct 2011 17:26:57 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <46778F4F-95ED-4FC7-B936-F8221A759916@staff.uni-marburg.de> Am 21.10.2011 um 17:06 schrieb Kilian Cavalotti: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) Isn't it to late at that point if I get it right? They login by ssh to an exechost and issue thereon screen to reconnect later. But they should already use qlogin to go to the exechost. 
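For readers who haven't set this up, a sketch of what that looks like in practice with GridEngine; the queue names and limits below are illustrative, not a site recipe:

    # user side: an interactive shell on a compute node, granted by the scheduler
    qlogin -q interactive.q -l h_rt=8:00:00

    # admin side: the relevant lines of a "check on my job" queue in the spirit
    # of the one described above, hard-limited to one minute of CPU per login
    # (edit with: qconf -mq check.q)
    qname      check.q
    hostlist   @allhosts
    h_cpu      0:1:0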
-- Reuti > Cheers, > -- > Kilian > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Fri Oct 21 12:45:38 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 21 Oct 2011 09:45:38 -0700 Subject: [Beowulf] about 'man' Re: Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> Message-ID: On 10/21/11 9:12 AM, "Prentice Bisbal" wrote: > >Ellis, > >Using nohup was exactly the advice I gave to one of my users yesterday. >Not sure if he'll use it. 'man' is a very difficult program to learn, >from what I understand. Well... 'man' is easy, but sometimes, you need decent examples and tutorials. Just knowing what all the switches are and the format is like giving someone a dictionary and saying: now write me a sonnet. This is especially so for the "swiss army knife" type utilities (grep, I'm looking at you!) > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 10:44:27 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 10:44:27 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111021134457.GA22748@grml> References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> Message-ID: <4EA1854B.5090506@ias.edu> On 10/21/2011 09:44 AM, Henning Fehrmann wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? >> >> Here's my scenario: >> >> In addition to my cluster, we have a bunch of "computer servers" where >> users can run the programs. These are "large" boxes with more cores >> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a >> desktop top. >> >> Periodically, when I have to shutdown/reboot a system for maintenance, >> I find a LOT of shells being run through the screen command for users >> who aren't logged in. The majority are idle shells, but many are running >> jobs, that seem to be forgotten about. For example, I recently found >> some jobs running since July or August that were running under the >> account of someone who hasn't even been here for months! >> >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. If you have a job that can be run non-interactively, you should >> submit it to the cluster. >> >> Has anyone else here dealt with the problem? 
>> >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. I'm expecting massive backlash >> from the users. > > I wouldn't deinstall screen. It is a useful tool for many things and > there are alternatives doing the same. Instead one could enforce a > maximum CPU time a job can take by setting ulimits. > > Have you thought about queueing systems like condor or SGE? Yes, I have cluster that uses SGE, and we allow users to run serial jobs (non-MPI, etc.) there, so there is no need for them to use screen to execute long-running jobs. Hence my frustration. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From kilian.cavalotti.work at gmail.com Fri Oct 21 11:06:11 2011 From: kilian.cavalotti.work at gmail.com (Kilian Cavalotti) Date: Fri, 21 Oct 2011 17:06:11 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA1854B.5090506@ias.edu> References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: Hi Prentice, On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >> Have you thought about queueing systems like condor or SGE? > > Yes, I have cluster that uses SGE, and we allow users to run serial jobs > (non-MPI, etc.) there, so there is no need for them to use screen to > execute long-running jobs. Hence my frustration. You could alias "screen" to "qlogin". :) Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From atp at piskorski.com Fri Oct 21 15:14:01 2011 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 21 Oct 2011 15:14:01 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <20111021191401.GA87390@piskorski.com> On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: > My opinion is these these are shared resources, and if you aren't > interactively using them, you should log out to free up resources for > others. "running under screen" != "non-interactive". > I would like to remove screen from my environment entirely to prevent > this. My fellow sysadmins here agree. I'm expecting massive backlash > from the users. No shit. If you allow users to login at all, then (IMNSHO) removing screen is insane. That's not a solution to your problem, that's creating a totally new problem and pretending it's a solution. I essentially always use screen whenever I ssh to any Linux box for any reason. If my sysadmin arbitrarily disabled screen because some other user was doing something dumb, I'd be pretty upset too. (Annoyed enough to maybe just build screen myself on that box.) 
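For anyone who hasn't adopted that habit, a minimal sketch of the screen workflow being defended here; the session name is arbitrary:

    screen -S analysis            # start a named session on the server
    ./long_job > job.log 2>&1     # run work inside it as usual
    # detach with Ctrl-a d, or simply lose the connection; the session survives

    screen -ls                    # from a later ssh login: list sessions
    screen -r analysis            # reattach exactly where you left off
    exit                          # close the shell when done so the session ends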
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From peter.st.john at gmail.com Fri Oct 21 22:18:19 2011 From: peter.st.john at gmail.com (Peter St. John) Date: Fri, 21 Oct 2011 22:18:19 -0400 Subject: [Beowulf] about 'man' Re: Users abusing screen In-Reply-To: References: <4EA19A05.4000400@ias.edu> Message-ID: I'm not a sysadmin, but I thought these days we were supposed to point [end]users at "help" or "doc" instead of man? Man is like sdb, it's great but not for everyone, you need context to appreciate it. I think in System V type derivatives it's usually "help"? peter On Fri, Oct 21, 2011 at 12:45 PM, Lux, Jim (337C) wrote: > > > On 10/21/11 9:12 AM, "Prentice Bisbal" wrote: > > > >Ellis, > > > >Using nohup was exactly the advice I gave to one of my users yesterday. > >Not sure if he'll use it. 'man' is a very difficult program to learn, > >from what I understand. > > Well... 'man' is easy, but sometimes, you need decent examples and > tutorials. Just knowing what all the switches are and the format is like > giving someone a dictionary and saying: now write me a sonnet. This is > especially so for the "swiss army knife" type utilities (grep, I'm looking > at you!) > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ellis at runnersroll.com Sat Oct 22 08:02:35 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Sat, 22 Oct 2011 08:02:35 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111021191401.GA87390@piskorski.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> Message-ID: <4EA2B0DB.3040702@runnersroll.com> On 10/21/11 15:14, Andrew Piskorski wrote: > On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: > >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. > > "running under screen" != "non-interactive". What I think Prentice was pointing out here was more along the lines of: "non-interactive" >= "running under screen" <= interactive Where interactivity is more of a spectrum than a != or =. More pointedly, he stated his users are acting in a non-interactive manner, in some cases even after they leave, which is irresponsible at all levels. Obviously he has to balance a rule-set between the good users and the bad users, such that abuse isn't quite as easy. >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. I'm expecting massive backlash >> from the users. 
> > No shit. If you allow users to login at all, then (IMNSHO) removing > screen is insane. That's not a solution to your problem, that's > creating a totally new problem and pretending it's a solution. Insane? I mean, I do a lot of work on a bunch of different distros and hardware types, and have found little use for screen /unless/ I was on a really, really poor internet connection that cut out on the minutes level. Can you give some examples regarding something you can do with screen you cannot do with nohup and tail? > I essentially always use screen whenever I ssh to any Linux box for > any reason. But why? Just leave a terminal open if you want interactivity, otherwise nohup something. Perhaps I've understated screen's usefulness, but I'm glad to be corrected/educated on it's efficacy in this area. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From skylar at cs.earlham.edu Sat Oct 22 13:24:02 2011 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Sat, 22 Oct 2011 10:24:02 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA2B0DB.3040702@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> <4EA2B0DB.3040702@runnersroll.com> Message-ID: <4EA2FC32.9000605@cs.earlham.edu> On 10/22/11 05:02, Ellis H. Wilson III wrote: > > Insane? I mean, I do a lot of work on a bunch of different distros and > hardware types, and have found little use for screen /unless/ I was on a > really, really poor internet connection that cut out on the minutes > level. Can you give some examples regarding something you can do with > screen you cannot do with nohup and tail? > > Here's a few I can think of: * Multiple shells off one login * Scroll buffer * Copy&paste w/o needing a mouse * Start session logging at any time, w/o needing to remember to use script or nohup I guess I'm with Andrew, where the first thing I do upon logging in is either connecting to an existing screen session or starting a fresh one. -- -- Skylar Thompson (skylar at cs.earlham.edu) -- http://www.cs.earlham.edu/~skylar/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.wender at science-computing.de Mon Oct 24 02:30:12 2011 From: j.wender at science-computing.de (Jan Wender) Date: Mon, 24 Oct 2011 08:30:12 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA505F4.7080007@science-computing.de> On 10/21/2011 03:10 PM, Prentice Bisbal wrote: > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? 
How about killing long-running (either elapsed or used time) processes not started through the batch system? You should be able to identify them by looking at the process tree. At least one cluster I know kills all user processes which have not been started from the queueing system. Cheerio, Jan -- ---- Company Information ---- Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- A non-text attachment was scrubbed... Name: j_wender.vcf Type: text/x-vcard Size: 338 bytes Desc: not available URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From greg.matthews at diamond.ac.uk Mon Oct 24 07:00:19 2011 From: greg.matthews at diamond.ac.uk (Gregory Matthews) Date: Mon, 24 Oct 2011 12:00:19 +0100 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> Message-ID: <4EA54543.5090908@diamond.ac.uk> Prentice Bisbal wrote: > Using nohup was exactly the advice I gave to one of my users yesterday. > Not sure if he'll use it. 'man' is a very difficult program to learn, > from what I understand. our experience of ppl using nohup without really thinking it through is eventually filling the partition with an enormous nohup.out file. GREG > > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Oct 24 07:20:02 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 24 Oct 2011 13:20:02 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA54543.5090908@diamond.ac.uk> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> <4EA54543.5090908@diamond.ac.uk> Message-ID: <9DA6F2A5-6736-457F-AE89-C5EC56735C09@staff.uni-marburg.de> Am 24.10.2011 um 13:00 schrieb Gregory Matthews: > Prentice Bisbal wrote: >> Using nohup was exactly the advice I gave to one of my users yesterday. >> Not sure if he'll use it. 'man' is a very difficult program to learn, >> from what I understand. > > our experience of ppl using nohup without really thinking it through is > eventually filling the partition with an enormous nohup.out file. It's possible to make an alias, so that "nohup" reads "nohup > /dev/null" The redirection doesn't need to be at the end of the command. Depends whether they need the output, and/or any output file is created by the application on its own anyway. -- Reuti > GREG > >> >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > > -- > Greg Matthews 01235 778658 > Senior Computer Systems Administrator > Diamond Light Source, Oxfordshire, UK > > -- > This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
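A minimal sketch of the alias Reuti describes, assuming bash and users who genuinely do not need the output (the job script and log file names below are placeholders, not anything from this thread):

    # ~/.bashrc -- discard stdout by default; the shell accepts a redirection
    # before the command, so the alias still composes with any arguments
    alias nohup='nohup > /dev/null'

    # the "start it, log out, check later" pattern discussed above, for users
    # who do want the output; an explicit redirection overrides the alias
    nohup ./long_job.sh > long_job.log 2>&1 &
    tail -f long_job.log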
From prentice at ias.edu Mon Oct 24 09:42:23 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 09:42:23 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA2B0DB.3040702@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> <4EA2B0DB.3040702@runnersroll.com> Message-ID: <4EA56B3F.3060404@ias.edu> On 10/22/2011 08:02 AM, Ellis H. Wilson III wrote: > On 10/21/11 15:14, Andrew Piskorski wrote: >> On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: >> >>> My opinion is these these are shared resources, and if you aren't >>> interactively using them, you should log out to free up resources for >>> others. >> "running under screen" != "non-interactive". > What I think Prentice was pointing out here was more along the lines of: > "non-interactive" >= "running under screen" <= interactive > Where interactivity is more of a spectrum than a != or =. More > pointedly, he stated his users are acting in a non-interactive manner, > in some cases even after they leave, which is irresponsible at all > levels. Obviously he has to balance a rule-set between the good users > and the bad users, such that abuse isn't quite as easy. Thanks for coming to my defense, Ellis. I don't think I could have explained it better myself. >>> I would like to remove screen from my environment entirely to prevent >>> this. My fellow sysadmins here agree. I'm expecting massive backlash >>> from the users. >> No shit. If you allow users to login at all, then (IMNSHO) removing >> screen is insane. That's not a solution to your problem, that's >> creating a totally new problem and pretending it's a solution. > Insane? I mean, I do a lot of work on a bunch of different distros and > hardware types, and have found little use for screen /unless/ I was on a > really, really poor internet connection that cut out on the minutes > level. Can you give some examples regarding something you can do with > screen you cannot do with nohup and tail? I agree. I've been a professional sys admin using Unix/Linux day in and day out for well over 10 years, and not one days has gone by where I saw a need for screen. >> I essentially always use screen whenever I ssh to any Linux box for >> any reason. > But why? Just leave a terminal open if you want interactivity, > otherwise nohup something. Perhaps I've understated screen's > usefulness, but I'm glad to be corrected/educated on it's efficacy in > this area. > > Best, > > ellis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From prentice at ias.edu Mon Oct 24 09:46:49 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 09:46:49 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA505F4.7080007@science-computing.de> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> Message-ID: <4EA56C49.9060204@ias.edu> On 10/24/2011 02:30 AM, Jan Wender wrote: > On 10/21/2011 03:10 PM, Prentice Bisbal wrote: >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? > How about killing long-running (either elapsed or used time) processes not > started through the batch system? You should be able to identify them by looking > at the process tree. > At least one cluster I know kills all user processes which have not been started > from the queueing system. The systems where screen is being abused are not part of the batch system, and they will not /can not be for reasons I don't want to get into here. The problem with killing long-running programs is that there are often long running programs that are legitimate in my evironment. I can quickly scan 'ps' output and determine which is which, but I doubt that kind of intelligence could ever be built into a shell script. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Oct 24 10:22:50 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 10:22:50 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA574BA.2050304@ias.edu> Anything is possible if you're a good enough programmer. Like I said earlier, there are some users legitimately running long jobs on the systems in question. Instead of developing a clever program to automatically kill long running screen jobs, I think it would be better to be up front with my users and remove screen, rather than let them use it, only to surprise them later by killing their jobs. On 10/24/2011 09:55 AM, geert geurts wrote: > > Hello Prentice, > > Screen is a essential app, for sure. > But as an answer to the initial question... > I'm not much of a programmer, but can't you replace the binary with a > custom compiled version which runs two threads? One with the initial > program, and one which sleeps for the maximum amount of time you're > willing to allow screen sessions to last, and kills the session when > the time runs out... > > Or maybe build some script around the actual binary to do the same.. > > > Regards, > Geert > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
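One way to sketch geert's "script around the actual binary" idea without recompiling anything is a wrapper that sets an inherited resource limit before exec'ing the real screen. The path and the 12-hour figure are assumptions for illustration only, and the cap is on CPU time per process, not on how long an idle session may sit around:

    #!/bin/bash
    # hypothetical /usr/bin/screen wrapper; assumes the admin has moved the
    # real binary aside to /usr/bin/screen.real (an invented path).
    # rlimits are inherited across fork/exec, so the detached SCREEN server
    # and every shell or job started inside it get the same per-process cap.
    ulimit -t $((12 * 3600))        # 12 CPU-hours of CPU time per process
    exec /usr/bin/screen.real "$@"

A forgotten idle session costs almost no CPU, so a wrapper like this only catches runaway compute jobs; the abandoned-login problem still needs something like the TMOUT or watcher approaches mentioned later in the thread.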
From samuel at unimelb.edu.au Mon Oct 24 18:48:44 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 25 Oct 2011 09:48:44 +1100 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA5EB4C.3000809@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/10/11 00:10, Prentice Bisbal wrote: > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? Hmm, any way of making a local version of screen which puts all the processes into a cpuset or control group so you can easily distinguish between ones in screen and outside of it ? Perhaps even doing it with a wrapper if you didn't want to build a modified version ? That way you get to restrict the number of cores they can monopolise.. Of course a user could get around it by building their own copy, but at least then you'd be able to see that.. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6l60wACgkQO2KABBYQAh/YtwCfegBzvEpH/s4PtHnFlEwSqQLK UO8An3DK20lEVrT9WM8qln0wM7alKoU6 =oInQ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Tue Oct 25 19:13:05 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 25 Oct 2011 16:13:05 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA56C49.9060204@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> Message-ID: <20111025231305.GC9493@bx9.net> On Mon, Oct 24, 2011 at 09:46:49AM -0400, Prentice Bisbal wrote: > The systems where screen is being abused are not part of the batch > system, and they will not /can not be for reasons I don't want to get > into here. The problem with killing long-running programs is that there > are often long running programs that are legitimate in my evironment. I > can quickly scan 'ps' output and determine which is which, but I doubt > that kind of intelligence could ever be built into a shell script. I see that you didn't bother to check out the software proposed soon after you asked your question. If you don't check out potential answers because you doubt they will work, why should anyone bother to reply to you? The problem you have is a common issue in university environments, and the common solution is a script that accurately figures out long-running cpu-intensive programs and nices/kills them. I first ran into such a thing in, oh, 1992? It's not rocket science. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Wed Oct 26 10:31:56 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 26 Oct 2011 10:31:56 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111025231305.GC9493@bx9.net> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> Message-ID: <4EA819DC.9090106@ias.edu> On 10/25/2011 07:13 PM, Greg Lindahl wrote: > On Mon, Oct 24, 2011 at 09:46:49AM -0400, Prentice Bisbal wrote: > >> The systems where screen is being abused are not part of the batch >> system, and they will not /can not be for reasons I don't want to get >> into here. The problem with killing long-running programs is that there >> are often long running programs that are legitimate in my evironment. I >> can quickly scan 'ps' output and determine which is which, but I doubt >> that kind of intelligence could ever be built into a shell script. > I see that you didn't bother to check out the software proposed soon > after you asked your question. If you don't check out potential > answers because you doubt they will work, why should anyone bother to > reply to you? Greg, I didn't realize I needed to log a detailed response to every suggestion made to me on this list. I've been a member of this list for quite sometime, and I've never seen a comment like yours before. You're out of line. People should bother to reply to me because I've been a participating member of this list for 4 years now, and often assist others when I can. I don't expect a response to every suggestion I provide to others. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From bcostescu at gmail.com Wed Oct 26 11:41:50 2011 From: bcostescu at gmail.com (Bogdan Costescu) Date: Wed, 26 Oct 2011 17:41:50 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: On Fri, Oct 21, 2011 at 15:10, Prentice Bisbal wrote: > Periodically, when I have to shutdown/reboot a system for maintenance, > I find a LOT of shells being run through the screen command for users > who aren't logged in. The majority are idle shells, but many are running > jobs, that seem to be forgotten about. > ... > I would like to remove screen from my environment entirely to prevent > this. >From what I understand from your message, it's not screen per-se which upsets you, it's the way it is (ab)used by some users to start long running memory hogging jobs; you seem to be OK with idle shells found at maintenance time which are still started through screen. So why the backlash against screen ? Starting jobs in the background can be done directly through the shell, with no screen; if the job can be split in smaller pieces time-wise, they can be started by at/cron; screen can be installed by a user, possible under a different name... 
so many and surely other possibilities to still upset you even if you uninstall screen, because you focus on the wrong subject. To deal with forgotten long running jobs, you have various administrative (f.e. bill users/groups, even if in some kind of symbolic way) or technical (f.e. only allow 24h CPU time through system-wide limits or install a daemon which watches and warns and/or takes measures) means - some of these have been discussed on this very list in the past or have been mentioned earlier in this thread. Each situation is different (f.e. some legitimate jobs could run for more than 24h), so you should check all suggestions and apply the one(s) which fit(s) best. I know from my own experience that it's not easy to be on this side of the fence :-) Good luck! Bogdan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 26 12:22:31 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 26 Oct 2011 12:22:31 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: OK, OK, I haven't participated in this discussion so far -- way too busy. But since it keeps on going, and going, and going, and since nobody has mentioned the obvious and permanent solution, I'm going to have to bring it up: >From "man 8 syslogd", which alas seems to no longer exist save in our hearts and memories, when confronted with any sort of persistent system abuse: 5. Use step 4 and if the problem persists and is not secondary to a rogue program/daemon get a 3.5 ft (approx. 1 meter) length of sucker rod* and have a chat with the user in question. * Sucker rod def. ? 3/4, 7/8 or 1in. hardened steel rod, male threaded on each end. Primary use in the oil industry in West- ern North Dakota and other locations to pump 'suck' oil from oil wells. Secondary uses are for the construction of cattle feed lots and for dealing with the occasional recalcitrant or bel- ligerent individual. I've found that the "sucker rod solution" is really the only one that ultimately works. Even if it is merely present when discussing the problem with the worst offenders, it marvelously focusses the mind on the severity of the issue. Otherwise (as has been pointed out repeatedly) it is rather trivial to write an e.g. cron script that reaps/kills ANYTHING undesireable on a public server. Invariably they will sooner or later kill something that shouldn't be killed in the sense that it is doing some sort of useful work, but screen isn't likely to be something in that category. Myself, I like the sucker rod approach. BANG down on the desk with it and say something ominous like "So, you've been cluttering up my server with unattended and abandoned sessions. Would you be so kind as to CEASE (bam) and DESIST (bam) from this antisocial activity?" Then mutter something about too much Jolt Cola and back away slowly. Don't worry too much about the divots you leave in the desk or the coffee mug that somehow got shattered. They'll be useful reminders the next time he or she considers walking way from a multiplexed screen session. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Wed Oct 26 12:42:50 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 26 Oct 2011 12:42:50 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA8388A.6060704@scalableinformatics.com> On 10/26/2011 12:22 PM, Robert G. Brown wrote: > Myself, I like the sucker rod approach. BANG down on the desk with it > and say something ominous like "So, you've been cluttering up my server > with unattended and abandoned sessions. Would you be so kind as to > CEASE (bam) and DESIST (bam) from this antisocial activity?" Then > mutter something about too much Jolt Cola and back away slowly. [donning his old New Yawk accent ... "Hey, we don't gots no accent ... you'se got an accent..."] "Thats a nice computer model you have there perfesser ... be a shame to have to run it over ... TCP over SLIP (serial line IP) ..." "So you like that 64 bit math, eh? Lets see how well you compute with a few less bits ..." [back to your regularly scheduled supercomputer cluster] -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Wed Oct 26 16:55:13 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 26 Oct 2011 16:55:13 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA819DC.9090106@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> Message-ID: > sometime, and I've never seen a comment like yours before. You're out of > line. hah. Greg doesn't post all that much, but he's no stranger to the flame ;) seriously, your question seemed to be about a general problem, but your motive, ulterior or not, seemed to be to get rid of screen. IMO, getting rid of screen is BOFHishness of the first order. it's a tool that has valuable uses. it's not the cause of your problem. on our login nodes, we have some basic limits (/etc/security/limit.conf) that prevent large or long processes or numerous processes. * hard as 3000000 * hard cpu 60 * hard nproc 100 * hard maxlogins 20 these are very arguable, and actually pretty loose. our login nodes are intended for editing/compiling/submitting, maybe the occasional gnuplot/etc. there doesn't seem to be much resistance to the 3G as (vsz) limit, and it does definitely cut down on OOM problems. 60 cpu-minutes covers any possible compile/etc (though it has caused problems with people trying to do very large scp operations.) 
nproc could probably be much lower (20?) and maxlogins ought to be more like 5. we don't currently have an idle-process killer, though have thought of it. we only recently put a default TMOUT in place to cause a bit of gc on forgotten login sessions. we do have screen installed (I never use it myself.) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at ur.rochester.edu Wed Oct 26 17:14:13 2011 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Wed, 26 Oct 2011 17:14:13 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 26, 2011, at 4:55 PM, Mark Hahn wrote: >> sometime, and I've never seen a comment like yours before. You're out of >> line. > > hah. Greg doesn't post all that much, but he's no stranger to the flame ;) > > seriously, your question seemed to be about a general problem, > but your motive, ulterior or not, seemed to be to get rid of screen. > > IMO, getting rid of screen is BOFHishness of the first order. > it's a tool that has valuable uses. it's not the cause of your problem. I agree. - From reading this thread, the original machine(s) in question seem to be some sort of interactive or login node(s). If these nodes were large memory or SMP machines, we'd have our resource manager take care of long running processes or other abuses. > > on our login nodes, we have some basic limits (/etc/security/limit.conf) > that prevent large or long processes or numerous processes. > > * hard as 3000000 > * hard cpu 60 > * hard nproc 100 > * hard maxlogins 20 > > these are very arguable, and actually pretty loose. our login nodes are > intended for editing/compiling/submitting, maybe the occasional gnuplot/etc. > there doesn't seem to be much resistance to the 3G as (vsz) limit, and > it does definitely cut down on OOM problems. 60 cpu-minutes covers any > possible compile/etc (though it has caused problems with people trying to > do very large scp operations.) nproc could probably be much lower (20?) > and maxlogins ought to be more like 5. We actually just spinned up a graphical login node for our less saavy users whom are more apt to run matlab, comsol, gnuplot, and other 'EZ button' graphically based scientific software. This graphical login software (http://code.google.com/p/neatx/) has helped us a lot with novice users. It has session resumption, client software for any platforms, it's faster than xforwarding, and it's wrapped around SSH. The node itself is 'fairly' heavy (8 procs, 72GB of RAM), but we've implemented cgroups to stop abuses. Upon login (through SSH or NX) each user is added to his own control group, which has processor and memory limits. 
Since the user's processes are kept inside of control group process spaces, it's easy to work directly with their processes/process trees, whether it be dynamic throttling, or just killing processes. On our login nodes that don't use control groups, we just kill any heavy computational processes after a certain period of time, depending on whether or not it's a compilation step, gzip, etc. We state this in our documentation, and usually give the user a warning+grace period. We don't see this type of abuse anymore because the few users whom have done this quickly learned (and apologized, imagine that!), or they were using our cgroup setup login node, so their abuse didn't affect the system enough. If the issue is processes that run for far too long, and are abusing the system, cgroups or 'pushing' the users to use a batch system seems to work better than writing scripts to make decisions on killing processes. Most ISVs have methods to run computation in batch mode, so it's not necessary for matlab type users to have their applications running for 3 weeks in a screen session when they could be using the cluster. Either that, or using some sort of cpu/memory limits that were listed above, or cgroups. So a process can run forever, but it won't have enough CPU/memory shares to make a difference. Just my .02 > > we don't currently have an idle-process killer, though have thought of it. > we only recently put a default TMOUT in place to cause a bit of gc on > forgotten login sessions. > > we do have screen installed (I never use it myself.) > > regards, mark hahn. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOqHgzAAoJENS19LGOpgqKDHQH/AqfAefrt3nusElS/OBnxgBK Pf8tFuyjoJvLgt+3KX19ZL18r1b/BhdW3/1GZgSVVjQZcYkV6dtUq6VI545jqDag lRY9kvyIhudKfVhFwGa87DbXSzYv5oDImf3UejsIiJvo20Bzxf7mdpToT+AGJ4gA J2HzrZwjdZk/DYEJ7CpG9lfthDDq5mrTQTbzVCnFHvEiWpeoBvfd3gJOP94age0F 0ZQGLCgheRSJXLsOlq0y0vqr+7nzupSrLUk5A1YcUysSpk4Dc4mvUVJFE+QbStN6 dSiYHhKMxF5qJTXYOSAF4QDmIObyzlbFFmHCeTTWrCG7KeWtOZU4zUfN7TL3sO4= =M5Pw -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
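For anyone curious what the per-user control group Steve describes looks like in practice, here is a rough sketch using the libcgroup command-line tools; the group path, CPU share and memory figure are invented examples rather than Steve's actual settings, and a production setup would more likely hang off cgconfig/cgred or a PAM session hook:

    #!/bin/bash
    # cap_user.sh <username> -- illustrative login hook only
    user=$1
    # one cgroup per user under an invented "interactive" hierarchy
    cgcreate -g cpu,memory:interactive/"$user"
    cgset -r cpu.shares=256 interactive/"$user"
    cgset -r memory.limit_in_bytes=$((8 * 1024 * 1024 * 1024)) interactive/"$user"   # 8 GB
    # move the user's current login shells in; their children inherit the cgroup
    cgclassify -g cpu,memory:interactive/"$user" $(pgrep -u "$user" -x bash)

cgexec -g cpu,memory:interactive/$USER some_command is the other half of the same toolset, if wrapping individual commands is preferable to confining whole logins.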
From lindahl at pbm.com Thu Oct 27 01:41:47 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 26 Oct 2011 22:41:47 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> Message-ID: <20111027054147.GB29939@bx9.net> On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote: > If the issue is processes that run for far too long, and are abusing > the system, cgroups or 'pushing' the users to use a batch system seems > to work better than writing scripts to make decisions on killing > processes. What I saw work well was nicing the process after a certain time, including an email, and then killing and emailing after a longer time. The emails can push the batch alternative. Users generally don't become angry if the limits are enforced by a script; they can only be surprised once, and that first time is just nicing the process. If they have a hard time predicting runtime (a common issue, especially for non-hardcore supercomputing types), it's not like they _intentionally_ are exceeding the limits... -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Oct 27 10:49:51 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 27 Oct 2011 10:49:51 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111027054147.GB29939@bx9.net> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> Message-ID: <4EA96F8F.1010207@ias.edu> On 10/27/2011 01:41 AM, Greg Lindahl wrote: > On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote: > >> If the issue is processes that run for far too long, and are abusing >> the system, cgroups or 'pushing' the users to use a batch system seems >> to work better than writing scripts to make decisions on killing >> processes. > What I saw work well was nicing the process after a certain time, > including an email, and then killing and emailing after a longer > time. The emails can push the batch alternative. Users generally don't > become angry if the limits are enforced by a script; they can only be > surprised once, and that first time is just nicing the process. If > they have a hard time predicting runtime (a common issue, especially > for non-hardcore supercomputing types), it's not like they > _intentionally_ are exceeding the limits... Exactly. That's why I don't want to automate killing jobs longer than X days. Honestly, I can't believe how much controversy this discussion has created. I thought my OP would go unnoticed. Next time, I'll just ask which text editor I should use. 
;) -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dnlombar at ichips.intel.com Thu Oct 27 12:04:21 2011 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Thu, 27 Oct 2011 09:04:21 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> Message-ID: <20111027160421.GA28306@nlxcldnl2.cl.intel.com> On Wed, Oct 26, 2011 at 02:55:13PM -0600, Mark Hahn wrote: > > sometime, and I've never seen a comment like yours before. You're out of > > line. > > hah. Greg doesn't post all that much, but he's no stranger to the flame ;) > > seriously, your question seemed to be about a general problem, > but your motive, ulterior or not, seemed to be to get rid of screen. > > IMO, getting rid of screen is BOFHishness of the first order. > it's a tool that has valuable uses. it's not the cause of your problem. Completely agree with this. If you get rid of screen, another tool will be used, perhaps even as simple as a private copy, or nohup and tail as others suggested. My primary use of screen is to do work across home and the office. Nohup only solves one of the potential scenarios. If screen were removed, my productivity would go down. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From glykos at mbg.duth.gr Thu Oct 27 15:19:37 2011 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Thu, 27 Oct 2011 22:19:37 +0300 (EEST) Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA96F8F.1010207@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > Exactly. That's why I don't want to automate killing jobs longer than X > days. Probably irrelevant after so many suggestions, but Caos NSA had this very nice 'pam_slurm' module which allows a user to login only to those nodes on which the said user has active jobs (allocated through slurm). The principal idea ["you are welcome to be bring your allocated node (and, thus, your job) to a halt if that's what you want"], sounds pedagogically attractive ... ;-) Nicholas -- Dr Nicholas M. 
Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Oct 27 15:33:18 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 27 Oct 2011 15:33:18 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: <4EA9B1FE.8090903@ias.edu> On 10/27/2011 03:19 PM, Nicholas M Glykos wrote: > >> Exactly. That's why I don't want to automate killing jobs longer than X >> days. > Probably irrelevant after so many suggestions, but Caos NSA had this very > nice 'pam_slurm' module which allows a user to login only to those nodes > on which the said user has active jobs (allocated through slurm). The > principal idea ["you are welcome to be bring your allocated node (and, > thus, your job) to a halt if that's what you want"], sounds pedagogically > attractive ... ;-) > > This doesn't apply to my case, since access to the systems in question isn't controlled by a queuing system. That alone would fix the problem. I think there's a similar pam module for SGE, too. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Thu Oct 27 15:43:59 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Thu, 27 Oct 2011 21:43:59 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA9B1FE.8090903@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> <4EA9B1FE.8090903@ias.edu> Message-ID: <94F21C03-C8BB-4DB4-AA3A-D1271524E43E@staff.uni-marburg.de> Am 27.10.2011 um 21:33 schrieb Prentice Bisbal: > On 10/27/2011 03:19 PM, Nicholas M Glykos wrote: >> >>> Exactly. That's why I don't want to automate killing jobs longer than X >>> days. >> Probably irrelevant after so many suggestions, but Caos NSA had this very >> nice 'pam_slurm' module which allows a user to login only to those nodes >> on which the said user has active jobs (allocated through slurm). The >> principal idea ["you are welcome to be bring your allocated node (and, >> thus, your job) to a halt if that's what you want"], sounds pedagogically >> attractive ... ;-) They use it in one cluster with Slurm I have access to. 
But it looks like you are never thrown out again once you are in. -- Reuti > This doesn't apply to my case, since access to the systems in question > isn't controlled by a queuing system. That alone would fix the problem. > > I think there's a similar pam module for SGE, too. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Thu Oct 27 19:37:29 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 27 Oct 2011 19:37:29 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > nice 'pam_slurm' module which allows a user to login only to those nodes > on which the said user has active jobs (allocated through slurm). The I think this is slightly BOFHish, too. do people actually have problems with users stealing cycles this way? the issue is actually stealing, and we simply tell our users not to steal. (actually, I don't think we even point it out, since it's so obvious!) that means we don't attempt to control (we had pam_slurm installed and actually removed it.) after all, just because a user's job is done, it doesn't mean the user has no reason to go onto that node (maybe there's a status file in /tmp, or a core dump or something.) if someone persisted in stealing cycles, we'd lock their account. regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From skylar at cs.earlham.edu Thu Oct 27 19:43:24 2011 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Thu, 27 Oct 2011 16:43:24 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: <4EA9EC9C.9090307@cs.earlham.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/27/2011 04:37 PM, Mark Hahn wrote: >> nice 'pam_slurm' module which allows a user to login only to those nodes >> on which the said user has active jobs (allocated through slurm). The > > I think this is slightly BOFHish, too. do people actually have problems > with users stealing cycles this way? the issue is actually stealing, > and we simply tell our users not to steal. 
(actually, I don't think we > even point it out, since it's so obvious!) > > that means we don't attempt to control (we had pam_slurm installed and > actually removed it.) after all, just because a user's job is done, it > doesn't mean the user has no reason to go onto that node (maybe there's a > status file in /tmp, or a core dump or something.) > > if someone persisted in stealing cycles, we'd lock their account. > We do the equivalent with GE it if the end user requests it. We have some clusters that need to support a mix of critical jobs supporting data pipelines, and less-critical academic work. Our default stance, though, is to trust our users to do the right thing. Mostly it works, but sometimes we do need to bring out the LART stick. - -- - -- - -- Skylar Thompson (skylar at cs.earlham.edu) - -- http://www.cs.earlham.edu/~skylar/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6p7JwACgkQsc4yyULgN4aRdgCbB3er3VI9OZEVSWO0GjL15rgU Z0sAoIZBKFsCeaYwA44uQT13JcdMN3dz =ervm -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Fri Oct 28 14:04:02 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 28 Oct 2011 14:04:02 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: On Thu, 27 Oct 2011, Mark Hahn wrote: > if someone persisted in stealing cycles, we'd lock their account. Exactly. Or visit them with a sucker rod. Or have a department chair have a "talk" with them. Human to human interactions and controls work better than installing complex tools or automated constraints. Sure, sucker rods are a joke and no we don't actually bop users on the head or the desk or whomp them upside the head with a manual, but in most cases a stern talking to followed by locking their account unless/until they formally agree to change their ways is more than sufficient. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
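For sites that do want the pam_slurm behaviour Nicholas mentioned (and Mark argues against), the usual recipe is a single account entry in the ssh PAM stack; the file name and the surrounding entries vary by distribution, so treat this as a sketch rather than a drop-in:

    # /etc/pam.d/sshd (location varies by distro)
    account    required     pam_slurm.so

With that in place a user with no job currently allocated on the node is refused an ssh login (root is normally exempt), which is exactly the trade-off Mark describes: no stray logins, but also no popping back in afterwards to collect a core file or a status file from /tmp.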
From sabujp at gmail.com Fri Oct 28 14:22:03 2011 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 28 Oct 2011 13:22:03 -0500 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > Human to human interactions and controls work better than installing > complex tools or automated constraints. ?Sure, sucker rods are a joke > and no we don't actually bop users on the head or the desk or whomp them > upside the head with a manual, but in most cases a stern talking to > followed by locking their account unless/until they formally agree to > change their ways is more than sufficient. Funny you should mentioned that, we've got such a device handy, passed down through the years from previous sysadmins: http://i.imgur.com/G0pjk.jpg It's also got a nice foam layer on the bopping side. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From beckerjes at mail.nih.gov Fri Oct 28 14:27:48 2011 From: beckerjes at mail.nih.gov (Jesse Becker) Date: Fri, 28 Oct 2011 14:27:48 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: <20111028182748.GC41282@mail.nih.gov> On Fri, Oct 28, 2011 at 02:22:03PM -0400, Sabuj Pattanayek wrote: >http://i.imgur.com/G0pjk.jpg > >It's also got a nice foam layer on the bopping side. Then it's just a prop. What's the *real* one look like? -- Jesse Becker NHGRI Linux support (Digicon Contractor) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From sabujp at gmail.com Fri Oct 28 14:33:52 2011 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 28 Oct 2011 13:33:52 -0500 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111028182748.GC41282@mail.nih.gov> References: <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> <20111028182748.GC41282@mail.nih.gov> Message-ID: I don't know, maybe we drop this on their head: http://i.imgur.com/VWxyF.jpg or worse, switch out their linux workstation with it. On Fri, Oct 28, 2011 at 1:27 PM, Jesse Becker wrote: > On Fri, Oct 28, 2011 at 02:22:03PM -0400, Sabuj Pattanayek wrote: >> >> http://i.imgur.com/G0pjk.jpg >> >> It's also got a nice foam layer on the bopping side. > > Then it's just a prop. ?What's the *real* one look like? 
> > -- > Jesse Becker > NHGRI Linux support (Digicon Contractor) > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Fri Oct 28 14:58:33 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 28 Oct 2011 11:58:33 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> <20111028182748.GC41282@mail.nih.gov> Message-ID: Google "Microsoft we share your pain" and look for the WSYP videos on youtube.. The three minute version is probably the one you want. Jim Lux +1(818)354-2075 > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sabuj Pattanayek > Sent: Friday, October 28, 2011 11:34 AM > To: Beowulf Mailing List > Subject: Re: [Beowulf] Users abusing screen > > I don't know, maybe we drop this on their head: > > http://i.imgur.com/VWxyF.jpg > > or worse, switch out their linux workstation with it. > > On Fri, Oct 28, 2011 at 1:27 PM, Jesse Becker wrote: > > On Fri, Oct 28, 2011 at 02:22:03PM -0400, Sabuj Pattanayek wrote: > >> > >> http://i.imgur.com/G0pjk.jpg > >> > >> It's also got a nice foam layer on the bopping side. > > > > Then it's just a prop. ?What's the *real* one look like? > > > > -- > > Jesse Becker > > NHGRI Linux support (Digicon Contractor) > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From glykos at mbg.duth.gr Fri Oct 28 15:10:18 2011 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Fri, 28 Oct 2011 22:10:18 +0300 (EEST) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > > if someone persisted in stealing cycles, we'd lock their account. > > Exactly. Or visit them with a sucker rod. Or have a department chair > have a "talk" with them. > > Human to human interactions and controls work better than installing > complex tools or automated constraints. I can't, of course, even contemplate the possibility of disagreeing with RGB. 
Having said that, we (humans) do install complex tools and automated constraints on each and every technologically advanced piece of equipment, from cars and aircrafts, to computing machines (and we do not assume that proper training and human interaction suffices to guarantee proper operation of the said equipment). In this respect, methods like allocating (in a controlled manner) exclusive rights to compute nodes do appear sensible. I agree that installing restraints is a balancing act between crippling creativity (and making power users mad) and avoiding equipment misuse, but clearly, there are limits in the freedom of use (for example, you wouldn't add all cluster users to your sudo list). My twocents, Nicholas -- Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 28 16:20:41 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 28 Oct 2011 16:20:41 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> <20111028182748.GC41282@mail.nih.gov> Message-ID: <4EAB0E99.10407@ias.edu> I was still supporting those only 4 years ago. Much heavier than a Dell or HP workstation. Will fix 'layer 8' problems in a jiffy. -- Prentice On 10/28/2011 02:33 PM, Sabuj Pattanayek wrote: > I don't know, maybe we drop this on their head: > > http://i.imgur.com/VWxyF.jpg > > or worse, switch out their linux workstation with it. > > On Fri, Oct 28, 2011 at 1:27 PM, Jesse Becker wrote: >> On Fri, Oct 28, 2011 at 02:22:03PM -0400, Sabuj Pattanayek wrote: >>> http://i.imgur.com/G0pjk.jpg >>> >>> It's also got a nice foam layer on the bopping side. >> Then it's just a prop. What's the *real* one look like? >> >> -- >> Jesse Becker >> NHGRI Linux support (Digicon Contractor) >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From peter.st.john at gmail.com Fri Oct 28 16:56:49 2011 From: peter.st.john at gmail.com (Peter St. 
From peter.st.john at gmail.com  Fri Oct 28 16:56:49 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 28 Oct 2011 16:56:49 -0400
Subject: [Beowulf] Users abusing screen
In-Reply-To: <20111027054147.GB29939@bx9.net>
References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net>
Message-ID:

I think Greg is right on the money. Particularly at a place like IAS, where
resources are good and users may be errant but are doing great things, I'd
have a sequence of limits; first, a mail warning ("Your job PID 666 has
consumed one million core hours, and its priority will be decremented in
500,000 CH unless you call the sysadmin at 555-1212"), later nice (with
another email warning), and only then kill (with an email notification).

If they have opportunities to upscale the allocations to really important
jobs, and they are notified about automatic limitations ahead of time, they
have no reason to complain.

Peter

On Thu, Oct 27, 2011 at 1:41 AM, Greg Lindahl wrote:

> On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote:
>
> > If the issue is processes that run for far too long, and are abusing
> > the system, cgroups or 'pushing' the users to use a batch system seems
> > to work better than writing scripts to make decisions on killing
> > processes.
>
> What I saw work well was nicing the process after a certain time,
> including an email, and then killing and emailing after a longer
> time. The emails can push the batch alternative. Users generally don't
> become angry if the limits are enforced by a script; they can only be
> surprised once, and that first time is just nicing the process. If
> they have a hard time predicting runtime (a common issue, especially
> for non-hardcore supercomputing types), it's not like they
> _intentionally_ are exceeding the limits...
>
> -- greg
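The warn-then-nice-then-kill escalation Peter and Greg describe can be sketched in a few dozen lines. This is an illustrative sketch only, not a script anyone on the list posted: the CPU-time thresholds, the localhost mail relay, the exempt-user set and the absence of any batch-scheduler awareness are all assumptions a real site would replace.

#!/usr/bin/env python3
"""Sketch of the warn -> renice -> kill escalation described above.

Everything here is an illustrative assumption, not list-endorsed policy:
the CPU-time thresholds, the localhost mail relay, the exempt users, and
the fact that batch-scheduled jobs are not treated specially. Run as root
from cron. A real version would also remember which PIDs were already
warned so people are not mailed on every pass.
"""
import os
import signal
import smtplib
import subprocess
from email.mime.text import MIMEText

WARN, NICE, KILL = 4 * 3600, 8 * 3600, 24 * 3600   # CPU-seconds thresholds
EXEMPT = {"root"}                                   # accounts never touched

def cpu_seconds(timefield):
    """Convert ps cputime format ([[dd-]hh:]mm:ss) to seconds."""
    days, _, rest = timefield.partition("-") if "-" in timefield else ("0", "", timefield)
    parts = [int(x) for x in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return int(days) * 86400 + h * 3600 + m * 60 + s

def processes():
    """Yield (pid, user, cpu_seconds, command) for every process."""
    out = subprocess.check_output(["ps", "-eo", "pid=,user=,time=,comm="],
                                  universal_newlines=True)
    for line in out.splitlines():
        pid, user, cputime, comm = line.split(None, 3)
        yield int(pid), user, cpu_seconds(cputime), comm

def mail(user, text):
    """Tell the owner what happened; assumes a local MTA and local mailboxes."""
    msg = MIMEText(text)
    msg["Subject"] = "[cluster policy] long-running process"
    msg["From"] = "root@localhost"
    msg["To"] = user + "@localhost"
    smtplib.SMTP("localhost").send_message(msg)

def enforce():
    for pid, user, secs, comm in processes():
        if user in EXEMPT or secs < WARN:
            continue
        try:
            if secs >= KILL:
                os.kill(pid, signal.SIGTERM)
                mail(user, "Killed pid %d (%s) after %d CPU-seconds." % (pid, comm, secs))
            elif secs >= NICE:
                os.setpriority(os.PRIO_PROCESS, pid, 19)   # drop to lowest priority
                mail(user, "Reniced pid %d (%s); it will be killed at %d CPU-seconds."
                           % (pid, comm, KILL))
            else:
                mail(user, "Warning: pid %d (%s) has used %d CPU-seconds." % (pid, comm, secs))
        except (OSError, smtplib.SMTPException):
            continue   # process vanished, permission denied, or mail failed

if __name__ == "__main__":
    enforce()

A production version would also want to skip children of the batch scheduler and keep state between runs, so that the first surprise a user gets is the renice, as Greg suggests.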
From prentice at ias.edu  Fri Oct 28 18:21:50 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 28 Oct 2011 18:21:50 -0400
Subject: [Beowulf] Users abusing screen
In-Reply-To:
References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net>
Message-ID: <4EAB2AFE.7000901@ias.edu>

On 10/28/2011 04:56 PM, Peter St. John wrote:
> I think Greg is right on the money. Particularly at a place like IAS,
> where resources are good and users may be errant but are doing great
> things,

Have you been a visitor, member or staff member at IAS?

--
Prentice

From peter.st.john at gmail.com  Fri Oct 28 19:16:44 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 28 Oct 2011 19:16:44 -0400
Subject: [Beowulf] Users abusing screen
In-Reply-To: <4EAB2AFE.7000901@ias.edu>
References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EAB2AFE.7000901@ias.edu>
Message-ID:

Prentice,
No, I didn't mean to imply anything specific about e.g. your budget, but
IAS has a fantastic reputation. Say hi to Dima for me, he plays Go and is
an algebraic geometer visiting this year.
Peter

On Fri, Oct 28, 2011 at 6:21 PM, Prentice Bisbal wrote:

> On 10/28/2011 04:56 PM, Peter St. John wrote:
> > I think Greg is right on the money. Particularly at a place like IAS,
> > where resources are good and users may be errant but are doing great
> > things,
>
> Have you been a visitor, member or staff member at IAS?
>
> --
> Prentice
From prentice at ias.edu  Mon Oct 3 13:51:06 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Mon, 03 Oct 2011 13:51:06 -0400
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To: <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org>
References: <20110921110239.GR25711@leitl.org> <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org>
Message-ID: <4E89F60A.4070801@ias.edu>

Doug,

Thanks for posting that video. It confirmed what I always suspected
about clouds for HPC.
Prentice

On 10/03/2011 08:25 AM, Douglas Eadline wrote:
> Interesting and pragmatic HPC cloud presentation, worth watching
> (25 minutes)
>
> http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/
>
> --
> Doug
From deadline at eadline.org  Mon Oct 3 14:17:33 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 3 Oct 2011 14:17:33 -0400 (EDT)
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To: <4E89F60A.4070801@ias.edu>
References: <20110921110239.GR25711@leitl.org> <59677.192.168.93.213.1317644706.squirrel@mail.eadline.org> <4E89F60A.4070801@ias.edu>
Message-ID: <58756.192.168.93.213.1317665853.squirrel@mail.eadline.org>

I think everyone has similar thoughts, but the presentation provides
some real data and experiences. BTW, for those interested, I have a new
poll on ClusterMonkey asking about clouds and HPC
(http://www.clustermonkey.net/). The last poll was on GP-GPU use.

--
Doug

> Doug,
>
> Thanks for posting that video. It confirmed what I always suspected
> about clouds for HPC.
>
> Prentice
>
> On 10/03/2011 08:25 AM, Douglas Eadline wrote:
>> Interesting and pragmatic HPC cloud presentation, worth watching
>> (25 minutes)
>>
>> http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/
>>
>> --
>> Doug
From raysonlogin at gmail.com  Mon Oct 3 14:50:22 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Mon, 3 Oct 2011 14:50:22 -0400
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To: <20110921110239.GR25711@leitl.org>
References: <20110921110239.GR25711@leitl.org>
Message-ID:

There's a free & opensource application called StarCluster that can do
most (if not all?) of the EC2 provisioning & cluster setup for a High
Throughput Computing cluster:

http://web.mit.edu/stardev/cluster/

StarCluster sets up NFS, SGE, BLAS library, Open MPI, etc
automatically for the user in around 10-15 mins.
StarCluster is licensed under LGPL, written in Python+Boto, and supports
a lot of the new EC2 features (Cluster Compute Instances, Spot Instances,
Cluster GPU Instances, etc). Support for launching higher node count
(100+ instances) clusters is even better with the new scalability
enhancements in the latest version (0.92).

And there are some tutorials on YouTube:

- "StarCluster 0.91 Demo":
http://www.youtube.com/watch?v=vC3lJcPq1FY

- "Launching a Cluster on Amazon Ec2 Spot Instances Using StarCluster":
http://www.youtube.com/watch?v=2Ym7epCYnSk

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

On Wed, Sep 21, 2011 at 7:02 AM, Eugen Leitl wrote:
>
> http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
>
> $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud

--
Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Raysonho
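For readers curious what the boto layer underneath StarCluster looks like, here is a bare editor's sketch of requesting EC2 spot instances directly with boto, in the spirit of the spot-instance support Rayson mentions. The region, bid price, AMI ID, key pair and security group are placeholders, no error handling or cleanup is shown, and StarCluster itself does considerably more than this.

#!/usr/bin/env python
"""Bare sketch of asking EC2 for spot instances with boto (the library
StarCluster is built on). The region, bid price, AMI ID, key pair and
security group are placeholders; credentials come from the usual boto
environment variables or config file.
"""
import time
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# Ask for 8 spot workers at a maximum bid of $0.10 per instance-hour.
requests = conn.request_spot_instances(
    price="0.10",
    image_id="ami-00000000",        # placeholder AMI (e.g. a CentOS image)
    count=8,
    instance_type="c1.xlarge",
    key_name="mykey",
    security_groups=["mycluster"])

ids = [r.id for r in requests]

# Poll until the requests leave the 'open' state (fulfilled or failed),
# or give up after ten minutes.
for _ in range(60):
    states = [r.state for r in conn.get_all_spot_instance_requests(ids)]
    if "open" not in states:
        break
    time.sleep(10)

print("spot request states: %s" % states)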
From rgb at phy.duke.edu  Mon Oct 3 15:21:44 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 3 Oct 2011 15:21:44 -0400 (EDT)
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To:
References: <20110921110239.GR25711@leitl.org>
Message-ID:

On Mon, 3 Oct 2011, Rayson Ho wrote:

> There's a free & opensource application called StarCluster that can do
> most (if not all?) of the EC2 provisioning & cluster setup for a High
> Throughput Computing cluster:

I will say that if anyone is going to make this work, it is going to be
Amazon and/or Google -- they have the very very big pile of computers
needed to make it work. I would be very interested in seeing the
detailed scaling of "fine grained parallel" applications on cloud
resources -- one point that the talk made that I agree with is that
embarrassingly parallel applications that require minimal I/O or IPCs
will do well in a cloud where all that matters is how many instances you
can run of jobs that don't talk to each other or need much access to
data. But what of jobs that require synchronous high speed
communications? What of jobs that require access to huge datasets?

Ultimately the problem comes down to this. Your choice is to rent time
on somebody else's hardware or buy your own hardware. For many people,
one can scale to infinity and beyond, so using "all" of the
time/resource you have available either way is a given. In which case
no matter how you slice it, Amazon or Google have to make a profit above
and beyond the cost of delivering the service. You don't (or rather,
your "profit" is just the ability to run your jobs and get paid as usual
to do your research either way). This means that it will always be
cheaper to directly provision a lot of computing rather than run it in
the cloud, or for that matter at an HPC center. Not all -- lots of
nonlinearities and thresholds associated with infrastructure and admin
and so on -- but a lot. Enough that I don't see Amazon's Pinky OR the
Brain ever taking over the (HPC) world...

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
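Robert's rent-versus-buy argument, and the utilization caveat that follows in the next message, are easy to put rough numbers on. In the editor's sketch below the only figure taken from this thread is the $1,279/hour peak price for the 30,472-core run; the capital cost, three-year lifetime and operating overhead of the hypothetical owned cluster are pure guesses, so the output shows only the shape of the break-even curve, not a real comparison.

"""Back-of-the-envelope rent-vs-buy comparison.

Only the $1,279/hour peak price and the 30,472-core size come from the
article quoted in this thread; the capital cost, lifetime and operating
overhead of the owned cluster are pure guesses.
"""

CLOUD_RATE = 1279.0        # $/hour for the 30,472-core Nekomata run (peak)
CORES = 30472
cloud_per_core_hour = CLOUD_RATE / CORES          # roughly $0.04

# Hypothetical owned cluster of the same core count -- guesses, not quotes.
CAPEX = 3.0e6              # purchase price, dollars
LIFETIME_HOURS = 3 * 365 * 24                     # three-year depreciation
OPEX_PER_HOUR = 60.0       # power, cooling, admin, space

def owned_per_core_hour(utilization):
    """Cost per *used* core-hour when the machine is busy this fraction of the time."""
    hourly = CAPEX / LIFETIME_HOURS + OPEX_PER_HOUR
    return hourly / (CORES * utilization)

if __name__ == "__main__":
    print("cloud: $%.4f per core-hour" % cloud_per_core_hour)
    for u in (1.0, 0.75, 0.5, 0.25, 0.1):
        print("owned at %3d%% utilization: $%.4f per core-hour"
              % (u * 100, owned_per_core_hour(u)))

On these made-up numbers the owned machine wins once it is kept more than roughly 15% busy, which is exactly the 24x7-loading question raised in the reply below.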
From raysonlogin at gmail.com  Tue Oct 4 10:55:39 2011
From: raysonlogin at gmail.com (Rayson Ho)
Date: Tue, 4 Oct 2011 10:55:39 -0400
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To:
References: <20110921110239.GR25711@leitl.org>
Message-ID:

On Mon, Oct 3, 2011 at 3:21 PM, Robert G. Brown wrote:
> I would be very interested in seeing the
> detailed scaling of "fine grained parallel" applications on cloud
> resources -- one point that the talk made that I agree with is that
> embarrassingly parallel applications that require minimal I/O or IPCs
> will do well in a cloud where all that matters is how many instances you
> can run of jobs that don't talk to each other or need much access to
> data. But what of jobs that require synchronous high speed
> communications?

Amazon (and I believe other cloud providers have something similar?)
introduced Cluster Compute Instances with 10 Gb Ethernet. For
traditional MPI workloads, the real advantage is actually from HVM
(Hardware VM), as it cuts the communication latency by quite a lot.

> What of jobs that require access to huge datasets?

Getting data in & out of the cloud is still a big problem, and the
highest bandwidth way of sending data to AWS is by FedEx. In fact, it
is quite often the fastest way to send data from one data center to
another when the data size is big.

And processing data on the cloud is easier (in terms of setup) with
Amazon Elastic MapReduce (and it recently works with spot instances):

http://aws.amazon.com/elasticmapreduce/

> Ultimately the problem comes down to this. Your choice is to rent time
> on somebody else's hardware or buy your own hardware. For many people,
> one can scale to infinity and beyond, so using "all" of the
> time/resource you have available either way is a given. In which case
> no matter how you slice it, Amazon or Google have to make a profit above
> and beyond the cost of delivering the service. You don't (or rather,
> your "profit" is just the ability to run your jobs and get paid as usual
> to do your research either way). This means that it will always be
> cheaper to directly provision a lot of computing rather than run it in
> the cloud, or for that matter at an HPC center.
Provided that the machines are used 24x7. A lot of enterprise users do
not have enough work to load up the machines. E.g., I worked with a
client that has lots of data & numbers to crunch at night, and during
the day most of the machines are idle. For traditional HPC centers, the
batch queue length is almost never 0, so there, agreed, the cloud
wouldn't help, or might even make the problem worse.

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From james.p.lux at jpl.nasa.gov Tue Oct 4 11:26:55 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Tue, 4 Oct 2011 08:26:55 -0700
Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud
In-Reply-To:
Message-ID:

On 10/4/11 7:55 AM, "Rayson Ho" wrote:

>On Mon, Oct 3, 2011 at 3:21 PM, Robert G.
Brown wrote: >> I would be very interested in seeing the >> detailed scaling of "fine grained parallel" applications on cloud >> resources -- one point that the talk made that I agree with is that >> embarrassingly parallel applications that require minimal I/O or IPCs >> will do well in a cloud where all that matters is how many instances you >> can run of jobs that don't talk to each other or need much access to >> data. But what of jobs that require synchronous high speed >> communications? > >Amazon (and I believe other cloud providers have something similar?) >introduced Cluster Compute Instances with 10 Gb Ethernet. For >traditional MPI workloads, the real advantage is actually from HVM >(Hardware VM), as it cuts the communication latency by quite a lot. > > >> What of jobs that require access to huge datasets? > >Getting data in & out of the cloud is still a big problem, and the >highest bandwidth way of sending data to AWS is by FedEx. In fact, it >is quite often that the fastest way to send data from one data center >to another when the data size is big. The classic: nothing beats a station wagon full of tapes for bandwidth. (today, it's minivan with terabyte hard drives, but that's the idea) > > > >> Ultimately the problem comes down to this. Your choice is to rent time >> on somebody else's hardware or buy your own hardware. For many people, >> one can scale to infinity and beyond, so using "all" of the >> time/resource you have available either way is a given. In which case >> no matter how you slice it, Amazon or Google have to make a profit above >> and beyond the cost of delivering the service. You don't (or rather, >> your "profit" is just the ability to run your jobs and get paid as usual >> to do your research either way). This means that it will always be >> cheaper to directly provision a lot of computing rather than run it in >> the cloud, or for that matter at an HPC center. > >Provided that the machines are used 24x7. A lot of enterprise users do >not have enough work to load up the machines. Eg, I worked with a >client that has lots of data & numbers to crunch at night, and during >day time most of the machines are idle. In a situation where you've got an existing application and data, and you just want to crunch numbers, and you pay either cloud or in-house, then you make the choice based on the incremental cost. However, even at the smallest increment on a cloud/hosted scheme, you have to pay from CPU second #1 (plus the fixed overhead of getting the job ready to go). If you have a cluster in house, there is likely a way to get a test job run essentially for free (perhaps on an older non-production cluster). That test job provides the performance data and preliminary results that you use in preparing the proposal to get real money to pay for real computation. This has been my argument for personal clusters... There's no accounting staff or administrative person watching over you to make sure you are effectively using the capital investment, in the same sense that most places don't care how much idle time there is on your desktop PC. If you've got an idea, and you're willing to put your own time (free?) into it, using the box that happens to be in your office or lab, nobody cares one way or another, as long as your primary job gets done. Notwithstanding that there ARE places that do cycle harvesting from desktop machines, but the management and sysadmin hassles are so extreme (I've written software to DO such harvesting, in pre-Beowulf days).. 
Those kinds of places go to thin clients and hosted VM instances eventually, I think. Where an Amazon could do themselves a favor (maybe they do this already) is to provide a free downloadable version of their environment for your own computer, or some "low priority cycles" for free, to get people hooked. Sort of like IBM providing computers for cheap to universities in the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized cellphones, 10 cent text messages. Give us your child 'til 7, and he's ours for life. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From raysonlogin at gmail.com Tue Oct 4 11:58:12 2011 From: raysonlogin at gmail.com (Rayson Ho) Date: Tue, 4 Oct 2011 11:58:12 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 11:26 AM, Lux, Jim (337C) wrote: > The classic: nothing beats a station wagon full of tapes for bandwidth. > (today, it's minivan with terabyte hard drives, but that's the idea) BTW, I've heard horror stories related to routing errors with this method - truck drivers delivering wrong tapes or losing tapes (hopefully the data is properly encrypted). > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days). The technology part of cycle harvesting is solvable, the accounting part is (IMO) much harder. A few years ago I talked to a University HPC lab about deploying cycle harvesting in the libraries (it's a big University, so we are talking about 1000+ library desktops). The technology was there (BOINC client), but getting the software installed & maintained means extra work, which means an extra IT guy... and means no one wants to pay for this. I wonder how many University labs or Biotech companies are doing organization wide cycle harvesting these days, for example, with technologies like BOINC: http://boinc.berkeley.edu/ > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, AMI is not private (in the end, it is IaaS, so the VM images are open). In fact, StarCluster has AMIs for download & install (mainly for developers who want to code for StarCluster locally): http://web.mit.edu/stardev/cluster/download_amis.html And one can roll a custom StarCluster AMI and upload it to AWS, such that the image settings are optimized to the needs: http://web.mit.edu/stardev/cluster/docs/0.91/create_new_ami.html > or some "low priority cycles" for free, to get people hooked. AWS Free Usage Tier -- (most people just use the free tier as free hosting): http://aws.amazon.com/free/ Rayson ================================= Grid Engine / Open Grid Scheduler http://gridscheduler.sourceforge.net > ?Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. 
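To put rough numbers behind the "station wagon full of tapes" / "minivan with terabyte hard drives" point -- a back-of-the-envelope sketch only; the drive count, capacity, courier time and link speed below are assumed figures, not anyone's actual setup:

# Effective bandwidth of shipping disks vs. pushing the same bits over a WAN.
# All figures are assumptions chosen for illustration.
TB = 10**12                       # bytes
payload_bytes = 50 * 1 * TB       # 50 x 1 TB drives in the minivan
transit_s = 24 * 3600             # overnight courier
sneakernet_gbps = payload_bytes * 8 / transit_s / 1e9
wan_gbps = 1.0                    # a fully utilized 1 Gb/s link
wan_days = payload_bytes * 8 / (wan_gbps * 1e9) / 86400.0
print("sneakernet: about %.1f Gb/s effective" % sneakernet_gbps)   # ~4.6 Gb/s
print("1 Gb/s WAN: about %.1f days for the same data" % wan_days)  # ~4.6 days

Latency is a day either way you ship it, of course, but for bulk ingest the truck keeps winning until the network pipe reaches several gigabits per second.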
> > >> > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Tue Oct 4 13:08:11 2011 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 4 Oct 2011 13:08:11 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <53556.192.168.93.213.1317748091.squirrel@mail.eadline.org> --snip-- > > This has been my argument for personal clusters... There's no accounting > staff or administrative person watching over you to make sure you are > effectively using the capital investment, in the same sense that most > places don't care how much idle time there is on your desktop PC. If > you've got an idea, and you're willing to put your own time (free?) into > it, using the box that happens to be in your office or lab, nobody cares > one way or another, as long as your primary job gets done. > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days).. Those > kinds of places go to thin clients and hosted VM instances eventually, I > think. BTW, very soon prebuilt Limulus systems will be available (http://limulus.basement-supercomputing.com) with 16 cores (four i5-2500S processors), one power plug, cool, quiet, with cool blue lights to impress your co-workers. -- Doug > > > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, or some "low priority cycles" for free, to get people > hooked. Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. > > >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 4 14:39:20 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 14:39:20 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > Notwithstanding that there ARE places that do cycle harvesting from > desktop machines, but the management and sysadmin hassles are so extreme > (I've written software to DO such harvesting, in pre-Beowulf days).. Those > kinds of places go to thin clients and hosted VM instances eventually, I > think. 
Condor (much improved from the old days, I think) actually makes this fairly easy nowadays. The physics department runs condor across lots of the low-rent desktop systems, creating a readily available compute farm for EP jobs. I don't do much of that sort of thing any more, alas. Mostly teaching, working on dieharder when I can, and writing textbooks at a furious pace. I will have a complete first year physics textbook -- the world's best, naturally;-) -- finished by the end of this semester (I'm within about four and a half chapters of finished already, and writing at least a chapter a week at this point). After that is done, and two other books that are partly finished (three if I get really inspired and try to finish the beowulf book) THEN I may have time to do more actual computing. > Where an Amazon could do themselves a favor (maybe they do this already) > is to provide a free downloadable version of their environment for your > own computer, or some "low priority cycles" for free, to get people > hooked. Sort of like IBM providing computers for cheap to universities in > the 60s and 70s. Razors, razor blades. Kindles, e-books. Subsidized > cellphones, 10 cent text messages. Give us your child 'til 7, and he's > ours for life. As I said, ultimately Amazon makes a profit. That is, they provide the cluster and some reasonable subset of cluster management in infrastructure provisioning, where they have to a) recoup the cost of the hardware, the infrastructure, and the management; b) make at LEAST 5-10% or better on the costs of all of this as profit, if not more like 40-50% or even 100% markup. Usually retail is 100% markup, but Amazon has scale efficiencies such that they can get by with less, whether or not they "like" to. So it ultimately comes down to whether or not you can provide similar efficiencies in your own local environment. Suppose it is a University. You have $100,000 for a compute resource that you expect to use over three years. There is typically no indirect cost charged to capital equipment. Often, but not always, housing, cooling, powering, and even managing the hardware is "free" to the researcher, absorbed into the ongoing costs of the server room and management staff already needed to run the department LAN and servers. Thus for your $100,000 you can buy (say) 100 dedicated function systems for $1000 each and everything else is paid out of opportunity cost labor or University provisioning that doesn't cost your grant anything -- out of that $100,000 (although of course your indirect costs elsewhere partly subsidize it). Even network ports may be free, or may not be if you need a higher end "cluster" network. If you rent from ANYBODY, you pay: * Slightly over 1/3 of the $100,000 up front for indirect costs. Duke, for example, would be perfectly happy to charge your grant $1 for every $2 that it pays out to a third party for cloud computing rental. For that fee they do all of the bookkeeping, basically -- most is pure profit, but prenegotiated with all of the granting agencies and that's just the way it is. * Your remaining (say) $63,000 has to pay for (a fraction of) the power, the housing, the cooling, the network. Unless Amazon subsidizes the cluster with different money altogether (e.g. using money from book sales to provide all of this at a loss) it will almost certainly not be as cheap as a University center for modest size clusters. 
When clusters grow to where people have to build new data centers just to house them, of course, this may not be true (but Amazon still doesn't gain much of a relative advantage even in this extreme case, not in the long run). Infrastructure costs are likely ballpark 10% of the cost of the hardware you are running on. * It has to pay for Amazon's sysadmins and management and security. These are humans that your money DIRECTLY supports, not humans that are directly supported to do something else and do admin for you on an opportunity cost basis "for free". Real salaries, (fractionally) paid from this income stream only. Even amortized in the friendliest most favorable way possible, admin cost are probably at least 10% of the hardware costs. * Profit. At least (say) $6300 is profit. Nobody makes a similar profit in the case of the DIY cluster. * The amortized cost of the hardware. The way I see it, you end up with roughly 50% of every dollar lost >>off the top<< of your $100,000. You ultimately buy (an amortized fraction of) the hardware the $100,000 as up-front capital equipment would cost you, and instead of being able to leverage pre-existing University infrastructure, avoid indirect costs, all as on a non-profit basis, you have to pay for infrastructure, indirect costs on the grant, management, AND A PROFIT on top of the hardware. The only real advantage is that -- maybe -- Amazon has market leverage and economy of scale on the hardware. But 50%? That's hard to make back. rgb > > >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dag at sonsorol.org Tue Oct 4 15:29:28 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Tue, 04 Oct 2011 15:29:28 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <4E8B5E98.3090002@sonsorol.org> I'm largely with RGB on this one with the minor caveat that I think he might be undervaluing the insane economies of scale that IaaS providers like Amazon & Google can provide. At the scale that Amazon operates at, they can obtain and run infrastructure far, far more efficiently than most (if not all) of us can ourselves. These folks have exabytes of spinning disk, redundant data-centers (with insane PUE values) all over the world and they know how to manage hundreds of thousands of servers with high efficiency in a very hostile networking environment. Not only can they run bigger and more efficient than we can, they can charge a price that makes them a profit while still being (in many cases) far cheaper than my own costs should I be truly honest about the fully-loaded costs of maintaining HPC or IT services. AWS has a history of lowering prices as their own costs go down. 
You can see this via the EC2 pricing history as well as the now-down-to-zero cost of inbound data transit. AWS Spot market makes this even more interesting. I can currently run an m1.4xlarge 64bit server instance with 15GB RAM for about $.24 per hour - close to 50% cheaper than the published hourly price and that spot price can hold steady for weeks at a time in many cases. The biggest hangup is the economics. Even harder in an academic environment where researchers are used to seeing their funds vanish to "overhead" on their grant or they just assume that datacenters, bandwidth, power and hosting are all "free" to use. It's hard to do true cost comparisons but time and time again I've seen IaaS come out ahead when the fully-loaded costs are actually put down on paper. Here is a cliche example: Amazon S3 Before the S3 object storage service will even *acknowledge* a successful PUT request, your file is already at rest in at least three amazon facilities. So to "really" compare S3 against what you can do locally you at least have to factor in the cost of your organization being able to provide 3x multi-facility replication for whatever object store you choose to deploy... I don't want to be seen as a shill so I'll stop with that example. The results really are surprising once you start down the "true cost of IT services..." road. As for industry trends with HPC and IaaS ... I can assure you that in the super practical & cynical world of biotech and pharma there is already an HPC migration to IaaS platforms that is years old already. It's a lot easier to see where and how your money is being spent inside a biotech startup or pharma and that is (and has) shunted a decent amount of spending towards cloud platforms. The easy stuff is moving to IaaS platforms. The hard stuff, the custom stuff, the tightly bound stuff and the data/IO-bound stuff is staying local of course - but that still means lots of stuff is moving externally. The article that prompted this thread is a great example of this. The client company had a boatload of one-off molecular dynamics simulations to run. So much, in fact, that the problem was computationally infeasable to even consider doing inhouse. So they did it on AWS. 30,000 CPU cores. For ~$9,000 dollars. Amazing. It's a fun time to be in HPC actually. And getting my head around "IaaS" platforms turned me onto things (like opscode chef) that we are now bringing inhouse and integrating into our legacy clusters and grids. Sorry for rambling but I think there are 2 main drivers behind what I see moving HPC users and applications into IaaS cloud platforms ... (1) The economies of scale are real. IaaS providers can run better, bigger and cheaper than we can and they can still make a profit. This is real, not hype or sales BS. (as long as you are honest about your actual costs...) (2) The benefits of "scriptable everything" or "everything has an API". I'm so freaking sick of companies installing VMWare and excreting a press release calling themselves a "cloud provider". Virtual servers and virtual block storage on demand are boring, basic and pedestrian. That was clever in 2004. I need far more "glue" to build useful stuff in a virtual world and IaaS platforms deliver more products/services and "glue" options than anyone else out there. The "scriptable everything" nature of IaaS is enabling a lot of cool system and workflow building, much of which would be hard or almost impossible to do in-house with local resources. 
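To make the "fully-loaded costs" point concrete, here is a toy calculation. It is a sketch only: the node price, cores per node, overhead fraction and utilization levels are assumptions (the $1000/node ballpark comes from earlier in this thread), and the cloud figure is the roughly $0.042 per core-hour implied by the $1,279/hour, 30,472-core run in the article:

# Toy fully-loaded cost per *useful* core-hour for an in-house node,
# compared against the EC2 run described in the article.
node_price = 1000.0   # assumed $ per node (ballpark used earlier in the thread)
cores = 8             # assumed cores per node
years = 3.0           # depreciation period
overhead = 0.20       # assumed power + cooling + admin, as a fraction of hardware
for utilization in (1.0, 0.5, 0.1):          # fraction of 24x7 the node is busy
    busy_hours = years * 8766 * utilization
    cost_per_core_hour = node_price * (1.0 + overhead) / (busy_hours * cores)
    print("%3d%% utilized: $%.3f per core-hour" % (utilization * 100, cost_per_core_hour))
# 100%: ~$0.006, 50%: ~$0.011, 10%: ~$0.057 per core-hour
# EC2 run in the article: $1279 / 30472 cores ~= $0.042 per core-hour
# i.e. a busy in-house node wins easily; a mostly idle one does not.

Indirect costs, floor space and networking are left out of the sketch; fold them into the overhead fraction to taste.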
My $.02 -Chris (corporate hat: chris at bioteam.net) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Tue Oct 4 16:07:21 2011 From: mathog at caltech.edu (mathog) Date: Tue, 04 Oct 2011 13:07:21 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > "Robert G. Brown" wrote: > Often, but not always, housing, cooling, powering, and even managing > the > hardware is "free" to the researcher, absorbed into the ongoing costs > of > the server room and management staff already needed to run the > department LAN and servers. Not always indeed. My little machine room houses a half dozen machines from other biology division people, and they are not charged to keep them there. However, putting a computer in the central campus machine rooms is not free. And new computer rooms, at least those of any size, do not get free power. After geology put in this monster: http://www.gps.caltech.edu/uploads/Image/Facilities/Beowulf.jpg the administration decided that when a computer room pretty much needs its own substation, it is well beyond the incidental overhead costs they are willing to pick up for average research labs. Along similar lines, I would guess that SLAC has to pay for its own power, rather than Stanford covering it out of overhead. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Tue Oct 4 16:39:16 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 16:39:16 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Chi Chan wrote: > On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: >> BTW, I've heard horror stories related to routing errors with this >> method - truck drivers delivering wrong tapes or losing tapes >> (hopefully the data is properly encrypted). > > I just read this on Slashdot today, it is "very hard to encrypt a > backup tape" (really?): > > http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients Not if it is encrypted with a stream cipher -- a stream cipher basically xors the data with a bitstream generated from a suitable key in a cryptographic-strength pseudorandom number generator (although there are variations on this theme). As a result, it can be quite fast -- as fast as generating pseudorandom numbers from the generator -- and it produces a file that is exactly the size of the original message in length. There are encryption schemes that expend extraordinary amounts of computational energy in generating the stream, and there are also block ciphers (which are indeed hard to implement for a streaming tape full of data, as they usually don't work so well for long messages). 
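For concreteness, a toy sketch of that xor-with-a-keystream idea -- emphatically not production crypto. The "keystream" here is just SHA-256 run in counter mode; a real system would use a vetted cipher such as AES in CTR mode:

# Toy stream-cipher-style encryption: xor the data with a keystream
# derived from a key. The same function encrypts and decrypts, and the
# output is exactly the size of the input.
import hashlib
from itertools import count

def keystream(key, nbytes):
    out = bytearray()
    for counter in count():
        if len(out) >= nbytes:
            break
        out += hashlib.sha256(key + str(counter).encode()).digest()
    return bytes(out[:nbytes])

def xor_stream(key, data):
    ks = keystream(key, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

key = b"a suitably long random key"
block = b"patient record 12345 -- do not lose this tape"
enc = xor_stream(key, block)
dec = xor_stream(key, enc)                    # xor twice gets the data back
assert dec == block and len(enc) == len(block)

The speed caveat above applies: throughput is whatever the keystream generator can sustain.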
But in the end no, it isn't that hard to encrypt a backup tape, provided that you are willing to accept the limitation that the speed of encrypting/decrypting the stream being written to the tape is basically limited by the speed of your RNG (which may well be slower than the speed of most fast networks). rgb > > --Chi > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Oct 4 16:43:15 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 4 Oct 2011 13:43:15 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of mathog > Sent: Tuesday, October 04, 2011 1:07 PM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud > > > "Robert G. Brown" wrote: > > > Often, but not always, housing, cooling, powering, and even managing > > the > > hardware is "free" to the researcher, absorbed into the ongoing costs > > of > > the server room and management staff already needed to run the > > department LAN and servers. > > Not always indeed. My little machine room houses a half dozen machines > from other biology division > people, and they are not charged to keep them there. However, putting > a computer in the central > campus machine rooms is not free. And new computer rooms, at least > those of any size, do not > get free power. After geology put in this monster: > > http://www.gps.caltech.edu/uploads/Image/Facilities/Beowulf.jpg > http://citerra.gps.caltech.edu/wiki/Public/Technology A mere 512 nodes, each with 8 cores. 670W power supply is standard, so let's say about 500 nodes at 700 watts each or 350kW... HVAC will add on top of that, but I doubt they're loaded to the max. Call it 400kW.. That's big, but not enormous. (e.g you can rent a trailer mounted generator for that kind of power for about $1000/day.. the bigger generators one sees on a movie set might be 200-300kW)) CalTrans will only pay $123/hr for a 500kW generator (and fuel cost comes out of that) But, if you were paying SoCalEdison for the juice..You'd be on (minimum) the TOU-GS-3 tariff.. On peak you'd be paying 0.02/kWh for delivery and 0.104/kWh for the power. (off peak would be 0.045/kWh) So call it 12c/kWh on peak. At 400kW, that's $48/hr, which isn't bad, operating expenses wise. Let's compare to the EC2.. $1300/hr for 30k cores. 23 core hours/$ The CITerra is $50/hr for 4000 cores. 80 core hours/$ Yes, one had to go out and BUY all those cores for CITerra. $5000/node, all in, including cabling racks, etc.? What's that, about $1.25M. Spread that out over 3 years at 2000 hrs/year (we only consider working in the daytime, etc. and you get about $210/hr for the capital cost (for all 500+ nodes..) So, the EC2 seems like a good solution when you need rapid scalability to huge sizes and you have a big expense budget and a small capital budget. 
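Folding the capital number into the same core-hours-per-dollar metric -- a quick sketch; the dollar and core figures are the ones just above, and the two utilization cases (2000 hours/year vs. running 24x7) are assumptions:

# Core-hours per dollar, EC2 vs. an owned cluster, using the figures above.
ec2_cores, ec2_rate = 30000, 1300.0            # ~$1300/hr for ~30k cores
print("EC2: ~%.0f core-hours per dollar" % (ec2_cores / ec2_rate))        # ~23

cores = 512 * 8                                # CITerra: 512 nodes x 8 cores
power_rate = 48.0                              # $/hr electricity estimate above
capex, years = 1.25e6, 3.0
for hours_per_year in (2000.0, 8766.0):        # "business hours" vs. 24x7
    hourly_capital = capex / (years * hours_per_year)
    total_rate = power_rate + hourly_capital
    print("owned, %4.0f h/yr: ~%.0f core-hours per dollar"
          % (hours_per_year, cores / total_rate))
# ~16 core-hours/$ at 2000 h/yr, ~43 at 24x7 -- the "provided the machines
# are used 24x7" caveat from earlier in the thread, in one number.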
You could call up Amazon this afternoon and run that 30,000 core job tonight. And you'd pay substantially for that flexibility (which is how Amazon makes money, eh?) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlb17 at duke.edu Tue Oct 4 16:47:30 2011 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 4 Oct 2011 16:47:30 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011 at 4:39pm, Robert G. Brown wrote > On Tue, 4 Oct 2011, Chi Chan wrote: > >> On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: >>> BTW, I've heard horror stories related to routing errors with this >>> method - truck drivers delivering wrong tapes or losing tapes >>> (hopefully the data is properly encrypted). >> >> I just read this on Slashdot today, it is "very hard to encrypt a >> backup tape" (really?): >> >> http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients > > Not if it is encrypted with a stream cipher -- a stream cipher basically > xors the data with a bitstream generated from a suitable key in a > cryptographic-strength pseudorandom number generator (although there are > variations on this theme). As a result, it can be quite fast -- as fast > as generating pseudorandom numbers from the generator -- and it produces > a file that is exactly the size of the original message in length. For added "no, it's not hard, they're apparently just not very bright" value, LTO4+ includes hardware AES encryption. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Tue Oct 4 16:48:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 4 Oct 2011 13:48:00 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > Sent: Tuesday, October 04, 2011 1:39 PM > To: Chi Chan > Cc: Rayson Ho; Lux, Jim (337C); tt at postbiota.org; jtriley at mit.edu; Beowulf List > Subject: Re: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud > > On Tue, 4 Oct 2011, Chi Chan wrote: > > > On Tue, Oct 4, 2011 at 11:58 AM, Rayson Ho wrote: > >> BTW, I've heard horror stories related to routing errors with this > >> method - truck drivers delivering wrong tapes or losing tapes > >> (hopefully the data is properly encrypted). 
> > > > I just read this on Slashdot today, it is "very hard to encrypt a > > backup tape" (really?): > > > > http://yro.slashdot.org/story/11/10/04/1815256/saic-loses-data-of-49-million-patients > > Not if it is encrypted with a stream cipher -- a stream cipher basically > xors the data with a bitstream generated from a suitable key in a > cryptographic-strength pseudorandom number generator (although there are > variations on this theme). As a result, it can be quite fast -- as fast > as generating pseudorandom numbers from the generator -- and it produces > a file that is exactly the size of the original message in length. > > There are encryption schemes that expend extraordinary amounts of > computational energy in generating the stream, and there are also block > ciphers (which are indeed hard to implement for a streaming tape full of > data, as they usually don't work so well for long messages). But in the > end no, it isn't that hard to encrypt a backup tape, provided that you > are willing to accept the limitation that the speed of > encrypting/decrypting the stream being written to the tape is basically > limited by the speed of your RNG (which may well be slower than the > speed of most fast networks). > The reason it wasn't encrypted is almost certainly not because it was difficult to do so for technology reasons. When you see a story about "data being lost or stolen from a car" it's because it was an ad hoc situation. Someone got a copy of the data to do some sort of analysis or to take it somewhere on a onetime basis, and "things went wrong". Any sort of regular process would normally deal with encryption or security as a matter of course: it's too easy to do it right. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Tue Oct 4 16:52:13 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 4 Oct 2011 13:52:13 -0700 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <4E8B5E98.3090002@sonsorol.org> References: <4E8B5E98.3090002@sonsorol.org> Message-ID: <20111004205213.GD14057@bx9.net> On Tue, Oct 04, 2011 at 03:29:28PM -0400, Chris Dagdigian wrote: > I'm largely with RGB on this one with the minor caveat that I think he > might be undervaluing the insane economies of scale that IaaS providers > like Amazon & Google can provide. You can rent that economy of scale if you're in the right part of the country. We weren't surprised to recently learn that our Silicon Valley datacenter rent is much lower than Moscow, but I was surprised to learn that we pay 1/3 less here than in Vegas, which allegedly has cheap land and power hence cheap datacenter rents. And with only 750 servers, we are already big enough to reap enough outright economy of scale to make leasing our own servers in a rented datacenter cheaper than renting everything from Amazon. The unique thing Amazon is providing is the ability to grow and shrink your cluster. Your example of a company which wanted to run a bunch of molecular dynamics computations in a short period of time is an illustration of that. BTW, Amazon has lowered prices since AWS was released, but not by as much as their costs have fallen. 
That's no surprise, given their dominant role in that market. -- greg (corporate hat: infrastructure at a search engine) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Tue Oct 4 17:03:46 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 4 Oct 2011 17:03:46 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > > The reason it wasn't encrypted is almost certainly not because it > was difficult to do so for technology reasons. When you see a story > about "data being lost or stolen from a car" it's because it was an ad > hoc situation. Someone got a copy of the data to do some sort of > analysis or to take it somewhere on a onetime basis, and "things went > wrong". > > Any sort of regular process would normally deal with encryption or > security as a matter of course: it's too easy to do it right. The problem being that HIPAA is not amused by incompetence. The standard is pretty much show due diligence or be prepared to pay massive bucks out in lawsuits should the data you protect be compromised. It is really a most annoying standard -- I mean it is good that it is so flexible and makes the responsibility clear, but for most of HIPAA's existence it has provided no F***ing guidelines on how to make protected data secure. Consequently (and I say this as a modest consultant-level expert) your data and mine in the Electronic Medical Record of your choice is typically: a) Stored in flat, unencrypted plaintext or binary image in the base DB. b) Transmitted in flat, unencrypted plaintext between the server and any LAN-connected clients. In other words, it assumes that your local LAN is secure. c) Relies on third party e.g. VPN solutions to provide encryption for use across a WAN. Needless to say, the passwords and authentication schemes used in EMRs are typically a joke -- after all, the users are borderline incompetent users and cannot be expected to remember or quickly type in a user id or password much more complicated than their own initials. Many sites have one completely trivial password in use by all the physicians and nurses who use the system -- just enough to MAYBE keep patients out of the system while waiting in an examining room. I have had to convince the staff of at least one major EMR company that I will refrain from naming that no, I wasn't going to ship them a copy of an entire dataset exported from an old practice management system -- think of it as the names, addresses, SSNs and a few dozen other "protected" pieces of personal information -- to them as an unencrypted zip file over the internet, and had to finally grit my teeth and accept the use of zip's (not terribly good) built in encryption and cross my fingers and pray. Do not underestimate the sheer power of incompetence, in other words, especially incompetence in an environment almost completely lacking meaningful IT-level standards or oversight. It's really shameful, actually -- it would be so very easy to build in nearly bulletproof security schema that would make the need for third party VPNs passe. 
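For what it's worth, the transport-encryption half of that really is a few lines with stock tools. A minimal client-side sketch using Python's standard ssl module follows; the host name and port are placeholders, and a real deployment still needs certificate management, authentication, and encryption at rest:

# Wrap a plain TCP connection in TLS before any patient data crosses the LAN.
# emr.example.org:8443 is a placeholder endpoint, not any real product's API.
import socket, ssl

ctx = ssl.create_default_context()                 # verifies the server cert
raw = socket.create_connection(("emr.example.org", 8443))
conn = ctx.wrap_socket(raw, server_hostname="emr.example.org")
conn.sendall(b"lookup patient 12345\n")            # application traffic, now encrypted
reply = conn.recv(4096)
conn.close()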
I don't know that ALL of the EMRs out there are STILL this bad, but I'd bet that 90% of them are. They certainly were 3-4 years ago, last time I looked in detail. So this is just par for the course. Doctors don't understand IT security. EMR creators should, but security is "expensive" and they don't bother because it isn't mandated. The end result is that everything from the DB to the physician's working screen is so horribly insecure that if any greed-driven cracker out there ever decided to exclusively target the weaknesses, they could compromise HIPAA and SSNs by the millions. Sigh. rgb > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Tue Oct 4 17:21:31 2011 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 4 Oct 2011 17:21:31 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> Several years ago I flippantly proposed what seems to be a simple way to ensure important consumer private data (medical, finance, etc.) was safe. Pass a law that says organization who collects or holds personal data must include the same data for organization's Board of Directors and officers (CEO, COO etc) in the database. At least the CEO might start taking security serious when someone in Bulgaria is buying jet skies with his AMX card. -- Doug > On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: > >> >> The reason it wasn't encrypted is almost certainly not because it >> was difficult to do so for technology reasons. When you see a story >> about "data being lost or stolen from a car" it's because it was an ad >> hoc situation. Someone got a copy of the data to do some sort of >> analysis or to take it somewhere on a onetime basis, and "things went >> wrong". >> >> Any sort of regular process would normally deal with encryption or >> security as a matter of course: it's too easy to do it right. > > The problem being that HIPAA is not amused by incompetence. The > standard is pretty much show due diligence or be prepared to pay massive > bucks out in lawsuits should the data you protect be compromised. It is > really a most annoying standard -- I mean it is good that it is so > flexible and makes the responsibility clear, but for most of HIPAA's > existence it has provided no F***ing guidelines on how to make protected > data secure. > > Consequently (and I say this as a modest consultant-level expert) your > data and mine in the Electronic Medical Record of your choice is > typically: > > a) Stored in flat, unencrypted plaintext or binary image in the base > DB. > > b) Transmitted in flat, unencrypted plaintext between the server and > any LAN-connected clients. In other words, it assumes that your local > LAN is secure. > > c) Relies on third party e.g. 
VPN solutions to provide encryption for > use across a WAN. > > Needless to say, the passwords and authentication schemes used in EMRs > are typically a joke -- after all, the users are borderline incompetent > users and cannot be expected to remember or quickly type in a user id or > password much more complicated than their own initials. Many sites have > one completely trivial password in use by all the physicians and nurses > who use the system -- just enough to MAYBE keep patients out of the > system while waiting in an examining room. > > I have had to convince the staff of at least one major EMR company that > I will refrain from naming that no, I wasn't going to ship them a copy > of an entire dataset exported from an old practice management system -- > think of it as the names, addresses, SSNs and a few dozen other > "protected" pieces of personal information -- to them as an unencrypted > zip file over the internet, and had to finally grit my teeth and accept > the use of zip's (not terribly good) built in encryption and cross my > fingers and pray. > > Do not underestimate the sheer power of incompetence, in other words, > especially incompetence in an environment almost completely lacking > meaningful IT-level standards or oversight. It's really shameful, > actually -- it would be so very easy to build in nearly bulletproof > security schema that would make the need for third party VPNs passe. > > I don't know that ALL of the EMRs out there are STILL this bad, but I'd > bet that 90% of them are. They certainly were 3-4 years ago, last time > I looked in detail. > > So this is just par for the course. Doctors don't understand IT > security. EMR creators should, but security is "expensive" and they > don't bother because it isn't mandated. The end result is that > everything from the DB to the physician's working screen is so horribly > insecure that if any greed-driven cracker out there ever decided to > exclusively target the weaknesses, they could compromise HIPAA and SSNs > by the millions. > > Sigh. > > rgb > >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at caltech.edu Tue Oct 4 17:39:40 2011 From: mathog at caltech.edu (mathog) Date: Tue, 04 Oct 2011 14:39:40 -0700 Subject: [Beowulf] =?utf-8?q?=241=2C_279-per-hour=2C_30=2C=09000-core_clus?= =?utf-8?q?ter_built_on_Amazon_EC2_cloud?= In-Reply-To: References: Message-ID: <1a4e05cecd44d8777737e6994d09b289@saf.bio.caltech.edu> On Tue, 4 Oct 2011 13:43:15 -0700, Lux, Jim (337C) wrote: > So call it 12c/kWh on peak. At 400kW, that's $48/hr, which isn't > bad, operating expenses wise. Well, yes and no. If they only turned it on once and a while it wouldn't be too bad, but I'm pretty sure it runs 100% of the time. At least I have never walked by when the racks were not lit up, so... $48 * 24 * 365 = $420480/year Versus the average lab at (waves hands) $150 in electricity a month = $1800/year? It will of course depend on what kind of work the lab does. The difference is two orders of magnitude. Anyway, last I looked we had around 300 professors, so that one facility used up, order of magnitude, as much juice as all the "normal" labs combined. (Certainly there are some other labs around which also use a lot of electricity.) Cooling water usage was probably also a sore point from the administration's perspective. Pretty much everything here runs AC off chilled water coming from a central plant. Either that cluster used up a whole lot of chilled water capacity at the central plant or they built a a separate chiller somewhere. Dave Kewley who sometimes posts here used to run that system, so he would know. Regards David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jlb17 at duke.edu Tue Oct 4 17:41:02 2011 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 4 Oct 2011 17:41:02 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011 at 5:03pm, Robert G. Brown wrote > Needless to say, the passwords and authentication schemes used in EMRs > are typically a joke -- after all, the users are borderline incompetent > users and cannot be expected to remember or quickly type in a user id or > password much more complicated than their own initials. Many sites have > one completely trivial password in use by all the physicians and nurses > who use the system -- just enough to MAYBE keep patients out of the > system while waiting in an examining room. My wife's experience here was somewhat the opposite of that. Within 2 days of starting her fellowship at UCSF she had acquired over 10 usernames and passwords (and one RSA hardware token) for all the various systems she needed to interact with. Each system, of course, had its own password aging and renewal rules. Determining how physicians manage their passwords in such an environment is left as an exercise for the reader... 
-- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 5 08:40:53 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Oct 2011 08:40:53 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> References: <44854.192.168.93.213.1317763291.squirrel@mail.eadline.org> Message-ID: On Tue, 4 Oct 2011, Douglas Eadline wrote: > > Several years ago I flippantly proposed what seems to be > a simple way to ensure important consumer private data > (medical, finance, etc.) was safe. Pass a law that says > organization who collects or holds personal data must > include the same data for organization's Board of Directors and > officers (CEO, COO etc) in the database. At least > the CEO might start taking security serious when > someone in Bulgaria is buying jet skies with his AMX card. It wouldn't help. Physicians are too clueless to understand or care (mostly, not universally) and besides, what can they do? They don't write software. The companies that provide the software won't have their board's information in the DB under any circumstances, and they are the problem. Or rather, the unregulated nature of the business is the problem. The government is spending all sorts of energy specifying the detailed structure of the DB and ICD codes for every possible illness at a staggering degree of granularity so that they can eventually micro-specify compensation rates for fingering your left gonad during an exam but are leaving HIPAA -- a disaster from day one in so very many ways -- in place as the sole guardian of our medical privacy. HIPAA fails to specify IT security, and obscures precisely who will be held financially responsible for failures of security or what other sanctions might be applied. HIPAA has had the easily predictable side effect of placing enormous physical and financial obstacles in the path of medical research, to the point where I think it is safe to say that HIPAA alone has de fact killed thousands to tens of thousands of people simply by delaying discovery for years to decades (while costing us a modest fortune to perform such research as is now performed, with whole departments in any research setting devoted to managing the permissioning of the data). Finally, HIPAA's fundamental original purpose was to keep e.g. health insurance companies or employers from getting your health care records and using them to deny coverage or employment, and it didn't really succeed even in that because of the appalling state of deregulation in the insurance industry itself. It's really pretty amazing. It's hard to imagine how anyone could have come up with a piece of governance so diabolically well designed to be enormously expensive in money and lives while failing even to accomplish its own primary goals or the related goals that it SHOULD have tried to accomplish (such as mandating a certain -- high -- level of security and complete open-standard interoperability and data portability in emergent EMR/PM systems, at least at the DB level), even if they tried. 
However, we should never be hasty to ascribe to human evil that which can adequately be explained by mere incompetence and stupidity. But this is OT, and I'll return to my muttons now. Soap box out. rgb > > -- > Doug > > > > > >> On Tue, 4 Oct 2011, Lux, Jim (337C) wrote: >> >>> >>> The reason it wasn't encrypted is almost certainly not because it >>> was difficult to do so for technology reasons. When you see a story >>> about "data being lost or stolen from a car" it's because it was an ad >>> hoc situation. Someone got a copy of the data to do some sort of >>> analysis or to take it somewhere on a onetime basis, and "things went >>> wrong". >>> >>> Any sort of regular process would normally deal with encryption or >>> security as a matter of course: it's too easy to do it right. >> >> The problem being that HIPAA is not amused by incompetence. The >> standard is pretty much show due diligence or be prepared to pay massive >> bucks out in lawsuits should the data you protect be compromised. It is >> really a most annoying standard -- I mean it is good that it is so >> flexible and makes the responsibility clear, but for most of HIPAA's >> existence it has provided no F***ing guidelines on how to make protected >> data secure. >> >> Consequently (and I say this as a modest consultant-level expert) your >> data and mine in the Electronic Medical Record of your choice is >> typically: >> >> a) Stored in flat, unencrypted plaintext or binary image in the base >> DB. >> >> b) Transmitted in flat, unencrypted plaintext between the server and >> any LAN-connected clients. In other words, it assumes that your local >> LAN is secure. >> >> c) Relies on third party e.g. VPN solutions to provide encryption for >> use across a WAN. >> >> Needless to say, the passwords and authentication schemes used in EMRs >> are typically a joke -- after all, the users are borderline incompetent >> users and cannot be expected to remember or quickly type in a user id or >> password much more complicated than their own initials. Many sites have >> one completely trivial password in use by all the physicians and nurses >> who use the system -- just enough to MAYBE keep patients out of the >> system while waiting in an examining room. >> >> I have had to convince the staff of at least one major EMR company that >> I will refrain from naming that no, I wasn't going to ship them a copy >> of an entire dataset exported from an old practice management system -- >> think of it as the names, addresses, SSNs and a few dozen other >> "protected" pieces of personal information -- to them as an unencrypted >> zip file over the internet, and had to finally grit my teeth and accept >> the use of zip's (not terribly good) built in encryption and cross my >> fingers and pray. >> >> Do not underestimate the sheer power of incompetence, in other words, >> especially incompetence in an environment almost completely lacking >> meaningful IT-level standards or oversight. It's really shameful, >> actually -- it would be so very easy to build in nearly bulletproof >> security schema that would make the need for third party VPNs passe. >> >> I don't know that ALL of the EMRs out there are STILL this bad, but I'd >> bet that 90% of them are. They certainly were 3-4 years ago, last time >> I looked in detail. >> >> So this is just par for the course. Doctors don't understand IT >> security. EMR creators should, but security is "expensive" and they >> don't bother because it isn't mandated. 
The end result is that >> everything from the DB to the physician's working screen is so horribly >> insecure that if any greed-driven cracker out there ever decided to >> exclusively target the weaknesses, they could compromise HIPAA and SSNs >> by the millions. >> >> Sigh. >> >> rgb >> >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> >> Robert G. Brown http://www.phy.duke.edu/~rgb/ >> Duke University Dept. of Physics, Box 90305 >> Durham, N.C. 27708-0305 >> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. >> > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 5 08:45:02 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 5 Oct 2011 08:45:02 -0400 (EDT) Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Joshua Baker-LePain wrote: > On Tue, 4 Oct 2011 at 5:03pm, Robert G. Brown wrote > >> Needless to say, the passwords and authentication schemes used in EMRs >> are typically a joke -- after all, the users are borderline incompetent >> users and cannot be expected to remember or quickly type in a user id or >> password much more complicated than their own initials. Many sites have >> one completely trivial password in use by all the physicians and nurses >> who use the system -- just enough to MAYBE keep patients out of the >> system while waiting in an examining room. > > My wife's experience here was somewhat the opposite of that. Within 2 > days of starting her fellowship at UCSF she had acquired over 10 usernames > and passwords (and one RSA hardware token) for all the various systems she > needed to interact with. Each system, of course, had its own password > aging and renewal rules. Determining how physicians manage their > passwords in such an environment is left as an exercise for the reader... Ah, yes, excellent. Ten of them AND an RSA e.g. SecureID -- wow, that takes some real brilliance. I know how MY physician wife would manage it... 
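And on the unencrypted-zip hand-off described further up the thread: the least-effort fix is to encrypt the export before it ever leaves the building rather than trusting zip's legacy crypto. A minimal sketch, assuming gpg is installed at both ends, with a made-up file name:

    gpg --symmetric --cipher-algo AES256 pm_export.db       # writes pm_export.db.gpg
    sha256sum pm_export.db.gpg > pm_export.db.gpg.sha256    # lets the receiver verify the copy
    # receiving side, passphrase exchanged out of band (phone, not email):
    sha256sum -c pm_export.db.gpg.sha256
    gpg --output pm_export.db --decrypt pm_export.db.gpg

A shared passphrase is still a weak link, but at least the bytes in transit are AES-256 rather than whatever the zip tool ships with.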
rgb > > -- > Joshua Baker-LePain > QB3 Shared Cluster Sysadmin > UCSF > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Wed Oct 5 09:42:28 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Wed, 05 Oct 2011 09:42:28 -0400 Subject: [Beowulf] $1, 279-per-hour, 30, 000-core cluster built on Amazon EC2 cloud In-Reply-To: <20111004205213.GD14057@bx9.net> References: <4E8B5E98.3090002@sonsorol.org> <20111004205213.GD14057@bx9.net> Message-ID: <4E8C5EC4.9020101@runnersroll.com> On 10/04/11 16:52, Greg Lindahl wrote: > On Tue, Oct 04, 2011 at 03:29:28PM -0400, Chris Dagdigian wrote: >> I'm largely with RGB on this one with the minor caveat that I think he >> might be undervaluing the insane economies of scale that IaaS providers >> like Amazon & Google can provide. > > cheap land and power hence cheap datacenter rents. And with only 750 > servers, we are already big enough to reap enough outright economy of > scale to make leasing our own servers in a rented datacenter cheaper > than renting everything from Amazon. > > The unique thing Amazon is providing is the ability to grow and shrink > your cluster. Your example of a company which wanted to run a bunch of > molecular dynamics computations in a short period of time is an > illustration of that. On this note, does anyone know if there are prior works (either academic or publicly disclosed documentations of a company pursuing such a route) of people splitting their workload up into the "static" and "dynamic" portions and running them respectively on in-house and rented hardware? While I see this discussion time and time again go either one way or the other (google or amazon, if you will), I suspect for many companies if it were possible to "invisibly" extend their infrastructure into the cloud on an as-needed basis, it might be a pretty attractive solution. Put another way, there doesn't seem to be much sense in buying a couple more racks for just a short-term project that will result in those racks going silent afterwards. On the flipside, you probably have some fraction of the compute and data resources you need as it is, you just want it to run a little faster or need a little more scratch space/bandwidth. So renting an entire set of resources wouldn't be optimal either, since that will result in underutilization of the infrastructure at home. So just buy whatever fraction your missing from Amazon from a month and use some hacks to make it look like that hardware is right there next to your other stuff. Obviously this requires an embarrassingly parallel workload due to the locality dichotomy (or completely disjoint workloads). Another idea I had was just like solar energy, what if there was a way for you to build up credits for Amazon in the "day" and use them at "night"? I.E. 
put some Amazon software on your infrastructure that allows you them to use your servers as part of their "cloud" when you're not using your equipment at max, and when you do go peak it will automatically provision more and more Amazon leased resources on an as-needed basis and burn up those earned credits instead of "real money." Just some ideas I figured I'd put through the beo-blender to see if they hold any weight before actually pursuing them as research objectives. ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From jcownie at cantab.net Thu Oct 6 13:33:51 2011 From: jcownie at cantab.net (James Cownie) Date: Thu, 6 Oct 2011 18:33:51 +0100 Subject: [Beowulf] Beowulf Bash at SC11? Message-ID: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> SC approaches fast, but I've seen no mention of a Beowulf Bash. Has it died? Did I just miss an announcement? -- -- Jim -- James Cownie -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From prentice at ias.edu Fri Oct 7 09:45:29 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 07 Oct 2011 09:45:29 -0400 Subject: [Beowulf] Beowulf Bash at SC11? In-Reply-To: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> Message-ID: <4E8F0279.5070809@ias.edu> There's an announcement on beowulf.org for a Beowulf Bash... from 2009! Beowulf Bash: The 11th Annual Beowulf.org Meeting November 16, 2009 Portland OR Location: The Game, One Center Court, The Rose Quarter Sponsors: AMD Cluster Monkey InsideHPC Penguin Computing SiCorp TeraScala XAND Marketing On 10/06/2011 01:33 PM, James Cownie wrote: > SC approaches fast, but I've seen no mention of a Beowulf Bash. > > Has it died? > > Did I just miss an announcement? > > -- > > -- Jim > > -- > > James Cownie > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Glen.Beane at jax.org Fri Oct 7 10:21:41 2011 From: Glen.Beane at jax.org (Glen Beane) Date: Fri, 7 Oct 2011 14:21:41 +0000 Subject: [Beowulf] Beowulf Bash at SC11? 
In-Reply-To: <4E8F0279.5070809@ias.edu> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> <4E8F0279.5070809@ias.edu> Message-ID: <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> I remember not hearing much about it last year in New Orleans until someone I knew from Penguin handed me a card Monday night at the opening gala On Oct 7, 2011, at 9:45 AM, Prentice Bisbal wrote: > There's an announcement on beowulf.org for a Beowulf Bash... from 2009! > > Beowulf Bash: The 11th Annual Beowulf.org Meeting > November 16, 2009 > Portland OR > Location: The Game, One Center Court, The Rose Quarter Sponsors: > AMD Cluster Monkey > InsideHPC > Penguin Computing > SiCorp TeraScala > XAND Marketing > > > On 10/06/2011 01:33 PM, James Cownie wrote: >> SC approaches fast, but I've seen no mention of a Beowulf Bash. >> >> Has it died? >> >> Did I just miss an announcement? >> >> -- >> >> -- Jim >> >> -- >> >> James Cownie > >> >> >> >> >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen L. Beane Senior Software Engineer The Jackson Laboratory (207) 288-6153 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Fri Oct 7 17:19:52 2011 From: deadline at eadline.org (Douglas Eadline) Date: Fri, 7 Oct 2011 17:19:52 -0400 (EDT) Subject: [Beowulf] Beowulf Bash at SC11? In-Reply-To: <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> References: <9DAC8FB2-067E-4F1B-ABBA-1AF995E62A33@cantab.net> <4E8F0279.5070809@ias.edu> <7514EA83-EDED-453C-8901-1C861D36C1B2@jax.org> Message-ID: <47582.192.168.93.213.1318022392.squirrel@mail.eadline.org> I always announce it on this list and on ClusterMonkey, it also will be announced on InsideHPC and some of the sponsor sites. -- Doug > I remember not hearing much about it last year in New Orleans until > someone I knew from Penguin handed me a card Monday night at the opening > gala > > > On Oct 7, 2011, at 9:45 AM, Prentice Bisbal wrote: > >> There's an announcement on beowulf.org for a Beowulf Bash... from 2009! >> >> Beowulf Bash: The 11th Annual Beowulf.org Meeting >> November 16, 2009 >> Portland OR >> Location: The Game, One Center Court, The Rose Quarter Sponsors: >> AMD Cluster Monkey >> InsideHPC >> Penguin Computing >> SiCorp TeraScala >> XAND Marketing >> >> >> On 10/06/2011 01:33 PM, James Cownie wrote: >>> SC approaches fast, but I've seen no mention of a Beowulf Bash. >>> >>> Has it died? >>> >>> Did I just miss an announcement? 
>>> >>> -- >>> >>> -- Jim >>> >>> -- >>> >>> James Cownie > >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>> Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Glen L. Beane > Senior Software Engineer > The Jackson Laboratory > (207) 288-6153 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kilian.cavalotti.work at gmail.com Tue Oct 11 11:21:32 2011 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Tue, 11 Oct 2011 17:21:32 +0200 Subject: [Beowulf] IBM to acquire Platform Computing Message-ID: http://www.platform.com/press-releases/2011/IBMtoAcquireSystemSoftwareCompanyPlatformComputingtoExtendReachofTechnicalComputing and http://www-03.ibm.com/systems/deepcomputing/platform.html Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dag at sonsorol.org Wed Oct 12 10:52:13 2011 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 12 Oct 2011 10:52:13 -0400 Subject: [Beowulf] 10GbE topologies for small-ish clusters? Message-ID: <4E95A99D.9040703@sonsorol.org> First time I'm seriously pondering bringing 10GbE straight to compute nodes ... For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? Regards, Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Wed Oct 12 10:58:58 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 12 Oct 2011 10:58:58 -0400 Subject: [Beowulf] 10GbE topologies for small-ish clusters? 
In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> Message-ID: <4E95AB32.3030804@scalableinformatics.com> On 10/12/2011 10:52 AM, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? What's the use case? Low latency, or simplified high bandwidth connection? 10GbE with 40GbE uplinks won't be cheap. But it would be doable. Gnodal, Mellanox, and others would be able to do this. > > Regards, > Chris > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From i.n.kozin at googlemail.com Wed Oct 12 11:22:52 2011 From: i.n.kozin at googlemail.com (Igor Kozin) Date: Wed, 12 Oct 2011 16:22:52 +0100 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> Message-ID: Gnodal was probably the first to announce a 1U 72 port switch http://www.gnodal.com/docs/Gnodal%20GS7200%20datasheet.pdf Other vendors either have announced or will be probably announcing dense packaging too. On 12 October 2011 15:52, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? > > Regards, > Chris > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From john.hearns at mclaren.com Wed Oct 12 11:28:28 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Wed, 12 Oct 2011 16:28:28 +0100 Subject: [Beowulf] 10GbE topologies for small-ish clusters? References: <4E95A99D.9040703@sonsorol.org> Message-ID: <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> First time I'm seriously pondering bringing 10GbE straight to compute nodes ... 
For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? I was going to suggest two Gnodal rack top switches, linked by a 40Gbps link http://www.gnodal.com/ I see though that their GS7200 switch has 72 x 10Gbps ports - should do you just fine! The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From akshar.bhosale at gmail.com Wed Oct 12 12:28:57 2011 From: akshar.bhosale at gmail.com (akshar bhosale) Date: Wed, 12 Oct 2011 21:58:57 +0530 Subject: [Beowulf] refunding reserved amount in gold Message-ID: Hi, We are using PBS (torque 2.4.8) and gold version 2.1.7.1. One of the jobs went for execution and reserved the equivalent amount. The same job came out of execution and went in queue from execution. This happened 30 times for the same job. Every time job has reserved amount. Now finally there is very huge amount(30*charges for that single job) which is shown in reserved state.Job now does not exist. User can not submit the new job now because of neglegible amount balance in his account. We want to clear reserved amount. How to do that? -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Shainer at Mellanox.com Wed Oct 12 12:30:02 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Wed, 12 Oct 2011 16:30:02 +0000 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <4E95A99D.9040703@sonsorol.org> <207BB2F60743C34496BE41039233A80903FB49D5@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: You can also check the Mellanox products - both for 40GigE and 10GigE switch fabric. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Hearns, John Sent: Wednesday, October 12, 2011 8:31 AM To: dag at sonsorol.org; beowulf at beowulf.org Subject: Re: [Beowulf] 10GbE topologies for small-ish clusters? First time I'm seriously pondering bringing 10GbE straight to compute nodes ... For 64 servers (32 to a cabinet) and an HPC system that spans two racks what would be the common 10 Gig networking topology be today? - One large core switch? - 48 port top-of-rack switches with trunking? - Something else? I was going to suggest two Gnodal rack top switches, linked by a 40Gbps link http://www.gnodal.com/ I see though that their GS7200 switch has 72 x 10Gbps ports - should do you just fine! 
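Whichever vendor wins, it is worth a back-of-envelope check on the inter-rack trunk before settling on two top-of-rack switches. A rough sketch, assuming 32 nodes per rack with one 10GbE port each (the trunk widths below are only examples):

    per_rack_gbps=$((32 * 10))        # 320 Gb/s of edge bandwidth per rack
    echo "1 x 40G inter-rack link : $((per_rack_gbps / 40)):1 oversubscribed"    # 8:1
    echo "4 x 40G trunk           : $((per_rack_gbps / 160)):1 oversubscribed"   # 2:1

For embarrassingly parallel or mostly rack-local traffic 8:1 may be acceptable; a single 72-port switch sidesteps the question entirely for 64 nodes.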
The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at ur.rochester.edu Wed Oct 12 12:33:39 2011 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Wed, 12 Oct 2011 12:33:39 -0400 Subject: [Beowulf] refunding reserved amount in gold In-Reply-To: References: Message-ID: <85631CC6-BFE0-44A2-B69E-42BB660AC632@ur.rochester.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I would suggest you post this to the Gold mailing list with a few more pieces of information: http://www.supercluster.org/mailman/listinfo/gold-users Regardless, you could probably use the grefund command... On Oct 12, 2011, at 12:28 PM, akshar bhosale wrote: > Hi, > > We are using PBS (torque 2.4.8) and gold version 2.1.7.1. One of the > jobs went for execution and reserved the equivalent amount. The same job > came out of execution and went in queue from execution. This happened 30 > times for the same job. Every time job has reserved amount. Now finally > there is very huge amount(30*charges for that single job) which is shown in > reserved state.Job now does not exist. User can not submit the new job now > because of neglegible amount balance in his account. We want to clear > reserved amount. How to do that? > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOlcFoAAoJENS19LGOpgqK1UIIAIFZj6fIZebQt9xQwmVBVxB9 MPwJMlw4C0F8bR/crGBWx7NUHElep1frROYohD15jN/8bFA2/bJ3xFdiH1bMNqHu MdB4EmRbs4nuNeN/ZayV4JXBVD3oPuwESYA65jVj0MfbVbzeRod6ZnNvpZOb/Juc 7dHCNPa2coLGLakGEQperOvOOCqsTbxSUdagXulW/1xH3iG+8UPNPJe7ATvO0tE3 FYOot3a3WgN8dsWUnsOKBnA17FA2zN0ac/QdEd2COSbpOjbpQp7BIlg0f0QIIkU6 pVq1C706jn5Cl4gKXsfC277Rrx3eLl3YPVA6XaL95PSXBH51L7Y3ViqMmVe9Coo= =cSUy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Wed Oct 12 14:04:27 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 12 Oct 2011 11:04:27 -0700 Subject: [Beowulf] 10GbE topologies for small-ish clusters? 
In-Reply-To: <4E95A99D.9040703@sonsorol.org> References: <4E95A99D.9040703@sonsorol.org> <20111012180002.GC5039@bx9.net> Message-ID: <20111012180427.GD5039@bx9.net> We just bought a couple of 64-port 10g switches from Blade, for the middle of our networking infrastructure. They were the winner over all the others, lowest price and appropriate features. We also bought Blade top-of-rack switches. Now that they've been bought up by IBM you have to negotiate harder to get that low price, but you can still get it by threatening them with competing quotes. Gnodal looks very interesting for larger, multi-switch clusters, they were just a bit late to market for us. Arista really believes that their high prices are justified; we didn't. And if anyone would like to buy some used Mellanox 48-port 10ge switches, we have 2 extras we'd like to sell. -- greg On Wed, Oct 12, 2011 at 10:52:13AM -0400, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two racks > what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? > - Something else? > > Regards, > Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Shainer at Mellanox.com Wed Oct 12 14:11:04 2011 From: Shainer at Mellanox.com (Gilad Shainer) Date: Wed, 12 Oct 2011 18:11:04 +0000 Subject: [Beowulf] 10GbE topologies for small-ish clusters? In-Reply-To: <20111012180427.GD5039@bx9.net> References: <4E95A99D.9040703@sonsorol.org> <20111012180002.GC5039@bx9.net> <20111012180427.GD5039@bx9.net> Message-ID: The 48-ports are not Mellanox but previous company that Mellanox acquired, as the Mellanox ones are 36 x 40G or 64 x 10G in 1U (or bigger). But please don't let these small details hold you from re-living your history. Good luck selling. -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Greg Lindahl Sent: Wednesday, October 12, 2011 11:05 AM To: Chris Dagdigian Cc: Beowulf Mailing List Subject: Re: [Beowulf] 10GbE topologies for small-ish clusters? We just bought a couple of 64-port 10g switches from Blade, for the middle of our networking infrastructure. They were the winner over all the others, lowest price and appropriate features. We also bought Blade top-of-rack switches. Now that they've been bought up by IBM you have to negotiate harder to get that low price, but you can still get it by threatening them with competing quotes. Gnodal looks very interesting for larger, multi-switch clusters, they were just a bit late to market for us. Arista really believes that their high prices are justified; we didn't. And if anyone would like to buy some used Mellanox 48-port 10ge switches, we have 2 extras we'd like to sell. -- greg On Wed, Oct 12, 2011 at 10:52:13AM -0400, Chris Dagdigian wrote: > > First time I'm seriously pondering bringing 10GbE straight to compute > nodes ... > > For 64 servers (32 to a cabinet) and an HPC system that spans two > racks what would be the common 10 Gig networking topology be today? > > - One large core switch? > - 48 port top-of-rack switches with trunking? 
> - Something else?
>
> Regards,
> Chris
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From cap at nsc.liu.se Thu Oct 13 07:51:56 2011
From: cap at nsc.liu.se (Peter Kjellström)
Date: Thu, 13 Oct 2011 13:51:56 +0200
Subject: [Beowulf] 10GbE topologies for small-ish clusters?
In-Reply-To: <4E95A99D.9040703@sonsorol.org>
References: <4E95A99D.9040703@sonsorol.org>
Message-ID: <201110131351.59977.cap@nsc.liu.se>

On Wednesday, October 12, 2011 04:52:13 PM Chris Dagdigian wrote:
> First time I'm seriously pondering bringing 10GbE straight to compute
> nodes ...
>
> For 64 servers (32 to a cabinet) and an HPC system that spans two racks
> what would be the common 10 Gig networking topology be today?

Both Arista and Blade (now IBM) have 64-port 1U single-ASIC switches (a few ports will require qsfp to sfp+ break out cables afaict).

/Peter
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL:
-------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From prentice at ias.edu Fri Oct 21 09:10:18 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 21 Oct 2011 09:10:18 -0400
Subject: [Beowulf] Users abusing screen
Message-ID: <4EA16F3A.8080209@ias.edu>

Beowulfers,

I have a question that isn't directly related to clusters, but I suspect it's an issue many of you are dealing with or have dealt with: users using the screen command to stay logged in on systems and running long jobs that they forget about. Have any of you experienced this, and how did you deal with it?

Here's my scenario:

In addition to my cluster, we have a bunch of "computer servers" where users can run their programs. These are "large" boxes with more cores (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a desktop.

Periodically, when I have to shutdown/reboot a system for maintenance, I find a LOT of shells being run through the screen command for users who aren't logged in. The majority are idle shells, but many are running jobs that seem to be forgotten about. For example, I recently found some jobs running since July or August that were running under the account of someone who hasn't even been here for months!

My opinion is that these are shared resources, and if you aren't interactively using them, you should log out to free up resources for others. If you have a job that can be run non-interactively, you should submit it to the cluster.

Has anyone else here dealt with the problem?

I would like to remove screen from my environment entirely to prevent this. My fellow sysadmins here agree. I'm expecting massive backlash from the users.
-- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:07:27 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:07:27 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <4EA198BF.3030002@ias.edu> On 10/21/2011 11:06 AM, Kilian Cavalotti wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) Actually, I can't for reasons I can't get into here. But something like that was part of my original "master plan". -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:10:36 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:10:36 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> References: <4EA16F3A.8080209@ias.edu> <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> Message-ID: <4EA1997C.70103@ias.edu> On 10/21/2011 11:24 AM, Reuti wrote: > Hi, > > Am 21.10.2011 um 15:10 schrieb Prentice Bisbal: > >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? >> >> Here's my scenario: >> >> In addition to my cluster, we have a bunch of "computer servers" where >> users can run the programs. These are "large" boxes with more cores >> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a >> desktop top. >> >> Periodically, when I have to shutdown/reboot a system for maintenance, >> I find a LOT of shells being run through the screen command for users >> who aren't logged in. The majority are idle shells, but many are running >> jobs, that seem to be forgotten about. For example, I recently found >> some jobs running since July or August that were running under the >> account of someone who hasn't even been here for months! >> >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. If you have a job that can be run non-interactively, you should >> submit it to the cluster. >> >> Has anyone else here dealt with the problem? >> >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. 
I'm expecting massive backlash >> from the users. > > I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system to get access to a node granted by GridEngine (for the startup method you can use either the -builtin- or [in case you need X11 forwarding] by a different sshd_config and ssh [GridEngine will start one daemon per task], one additional step is necessary for a tight integration of ssh). > > For users just checking their jobs on a node I have a dedicated queue (where they can login always, but h_cpu limited to 60 seconds, i.e. they can't abuse it). > > -- Reuti > Reuti, That was EXACTLY my original plan, but for reasons I don't want to get into, I can't implement that. In fact, just yesterday I ripped out all the SGE queues I had configured to that. Why? because I was tired of seeing them and being reminded of what a good idea it was. :( -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 12:12:53 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 12:12:53 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19365.4030109@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> Message-ID: <4EA19A05.4000400@ias.edu> On 10/21/2011 11:44 AM, Ellis H. Wilson III wrote: > On 10/21/11 09:10, Prentice Bisbal wrote: >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? > > I think this is strongly tied to what kind of work the users are doing > (i.e. how interactive it is, how long jobs take, how likely failure is > to occur that they must react to). In my personal experience the jobs I > spawn aren't interactive, tend to take a long time, and because of point > 2 require me to react pretty quickly to their failure or I lose out on > valuable compute-time. However, they are cumbersome to execute via a > queuing manager (my work is in systems, so perhaps that area is an > exception). Therefore what I always do is just nohup myself a job, and > tail -f it if I need to watch it. I've adapted my ssh config such that > I don't get booted off after 5 or 10 minutes without any input from me > (I think the limit I set is like 2hours or something), so I can watch > output fly by to my hearts content. > > If I were you, I think the best way to avoid a user-uprising, but to > achieve your goal is to give instructions on how a user can nohup (yes, > just assume they don't know how) and how to configure ssh to not die > after a short time. This way they don't have to worry about getting > disconnected if they aren't constantly interacting (so they can watch > output), but they also aren't staying logged on indefinitely (since > presumably their laptops/desktops aren't on indefinitely). 
> > If you give them an alternative that is well defined with an example > (not just, "Oh you can use such-and-such instead.") I can hardly believe > they'll be all that upset. > Ellis, Using nohup was exactly the advice I gave to one of my users yesterday. Not sure if he'll use it. 'man' is a very difficult program to learn, from what I understand. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Fri Oct 21 11:24:32 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 21 Oct 2011 17:24:32 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <7B82E572-588E-41A4-9B46-8A1A07360A30@staff.uni-marburg.de> Hi, Am 21.10.2011 um 15:10 schrieb Prentice Bisbal: > Beowulfers, > > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? > > Here's my scenario: > > In addition to my cluster, we have a bunch of "computer servers" where > users can run the programs. These are "large" boxes with more cores > (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a > desktop top. > > Periodically, when I have to shutdown/reboot a system for maintenance, > I find a LOT of shells being run through the screen command for users > who aren't logged in. The majority are idle shells, but many are running > jobs, that seem to be forgotten about. For example, I recently found > some jobs running since July or August that were running under the > account of someone who hasn't even been here for months! > > My opinion is these these are shared resources, and if you aren't > interactively using them, you should log out to free up resources for > others. If you have a job that can be run non-interactively, you should > submit it to the cluster. > > Has anyone else here dealt with the problem? > > I would like to remove screen from my environment entirely to prevent > this. My fellow sysadmins here agree. I'm expecting massive backlash > from the users. I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system to get access to a node granted by GridEngine (for the startup method you can use either the -builtin- or [in case you need X11 forwarding] by a different sshd_config and ssh [GridEngine will start one daemon per task], one additional step is necessary for a tight integration of ssh). For users just checking their jobs on a node I have a dedicated queue (where they can login always, but h_cpu limited to 60 seconds, i.e. they can't abuse it). -- Reuti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From bug at sas.upenn.edu Fri Oct 21 11:17:55 2011 From: bug at sas.upenn.edu (Gavin W. Burris) Date: Fri, 21 Oct 2011 11:17:55 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <4EA18D23.4050501@sas.upenn.edu> On 10/21/2011 11:06 AM, Kilian Cavalotti wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) > > Cheers, I think we have a winner. :) -- Gavin W. Burris Senior Systems Programmer Information Security and Unix Systems School of Arts and Sciences University of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Fri Oct 21 11:44:37 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Fri, 21 Oct 2011 11:44:37 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA19365.4030109@runnersroll.com> On 10/21/11 09:10, Prentice Bisbal wrote: > Beowulfers, > > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? I think this is strongly tied to what kind of work the users are doing (i.e. how interactive it is, how long jobs take, how likely failure is to occur that they must react to). In my personal experience the jobs I spawn aren't interactive, tend to take a long time, and because of point 2 require me to react pretty quickly to their failure or I lose out on valuable compute-time. However, they are cumbersome to execute via a queuing manager (my work is in systems, so perhaps that area is an exception). Therefore what I always do is just nohup myself a job, and tail -f it if I need to watch it. I've adapted my ssh config such that I don't get booted off after 5 or 10 minutes without any input from me (I think the limit I set is like 2hours or something), so I can watch output fly by to my hearts content. If I were you, I think the best way to avoid a user-uprising, but to achieve your goal is to give instructions on how a user can nohup (yes, just assume they don't know how) and how to configure ssh to not die after a short time. This way they don't have to worry about getting disconnected if they aren't constantly interacting (so they can watch output), but they also aren't staying logged on indefinitely (since presumably their laptops/desktops aren't on indefinitely). If you give them an alternative that is well defined with an example (not just, "Oh you can use such-and-such instead.") I can hardly believe they'll be all that upset. 
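Something along these lines would do, say (program name, log path and keepalive interval are only placeholders):

    # start the job so it survives logout, remember its PID, watch it on demand
    nohup ./long_job > long_job.log 2>&1 &
    echo $! > long_job.pid
    tail -f long_job.log              # Ctrl-C stops the tail, not the job
    kill "$(cat long_job.pid)"        # when the job is no longer wanted

    # client-side ~/.ssh/config, so idle sessions aren't dropped by firewalls/NAT
    # Host compute*
    #     ServerAliveInterval 60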
Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From ellis at runnersroll.com Fri Oct 21 12:26:09 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Fri, 21 Oct 2011 12:26:09 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> Message-ID: <4EA19D21.3090902@runnersroll.com> On 10/21/11 12:12, Prentice Bisbal wrote: >> If you give them an alternative that is well defined with an example >> (not just, "Oh you can use such-and-such instead.") I can hardly believe >> they'll be all that upset. >> > > Ellis, > > Using nohup was exactly the advice I gave to one of my users yesterday. > Not sure if he'll use it. 'man' is a very difficult program to learn, > from what I understand. Hahaha, I love your cynicism. Right up my alley, however, I think in all seriousness 'man' does fall short for many applications in terms of examples (there are exceptions to this, but most man docs don't have examples from my experience). Many users just want examples of it's use, and can derive their case faster from such than custom-creation of a set of parameters from man. So just take a few moments, cook up an example of 'nohup ./someapp &> out.txt &' usage and associated ways to kill and watch it's output and put it all into an email. Save that email away, and when you're ready just shoot it out to everyone. Or if you have an internal wiki setup, that's much, much better. Just forward a link to some new page on it. If you make even a half-assed effort to show you are providing a viable alternative and a low bar to entry, you'll cut the number of people complaining at least in half. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Fri Oct 21 11:26:57 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 21 Oct 2011 17:26:57 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: <46778F4F-95ED-4FC7-B936-F8221A759916@staff.uni-marburg.de> Am 21.10.2011 um 17:06 schrieb Kilian Cavalotti: > Hi Prentice, > > On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >>> Have you thought about queueing systems like condor or SGE? >> >> Yes, I have cluster that uses SGE, and we allow users to run serial jobs >> (non-MPI, etc.) there, so there is no need for them to use screen to >> execute long-running jobs. Hence my frustration. > > You could alias "screen" to "qlogin". :) Isn't it to late at that point if I get it right? They login by ssh to an exechost and issue thereon screen to reconnect later. But they should already use qlogin to go to the exechost. 
-- Reuti > Cheers, > -- > Kilian > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Fri Oct 21 12:45:38 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 21 Oct 2011 09:45:38 -0700 Subject: [Beowulf] about 'man' Re: Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> Message-ID: On 10/21/11 9:12 AM, "Prentice Bisbal" wrote: > >Ellis, > >Using nohup was exactly the advice I gave to one of my users yesterday. >Not sure if he'll use it. 'man' is a very difficult program to learn, >from what I understand. Well... 'man' is easy, but sometimes, you need decent examples and tutorials. Just knowing what all the switches are and the format is like giving someone a dictionary and saying: now write me a sonnet. This is especially so for the "swiss army knife" type utilities (grep, I'm looking at you!) > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Fri Oct 21 10:44:27 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 21 Oct 2011 10:44:27 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111021134457.GA22748@grml> References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> Message-ID: <4EA1854B.5090506@ias.edu> On 10/21/2011 09:44 AM, Henning Fehrmann wrote: > Hi Prentice, > > On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: >> Beowulfers, >> >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? >> >> Here's my scenario: >> >> In addition to my cluster, we have a bunch of "computer servers" where >> users can run the programs. These are "large" boxes with more cores >> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a >> desktop top. >> >> Periodically, when I have to shutdown/reboot a system for maintenance, >> I find a LOT of shells being run through the screen command for users >> who aren't logged in. The majority are idle shells, but many are running >> jobs, that seem to be forgotten about. For example, I recently found >> some jobs running since July or August that were running under the >> account of someone who hasn't even been here for months! >> >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. If you have a job that can be run non-interactively, you should >> submit it to the cluster. >> >> Has anyone else here dealt with the problem? 
>> >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. I'm expecting massive backlash >> from the users. > > I wouldn't deinstall screen. It is a useful tool for many things and > there are alternatives doing the same. Instead one could enforce a > maximum CPU time a job can take by setting ulimits. > > Have you thought about queueing systems like condor or SGE? Yes, I have cluster that uses SGE, and we allow users to run serial jobs (non-MPI, etc.) there, so there is no need for them to use screen to execute long-running jobs. Hence my frustration. Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From kilian.cavalotti.work at gmail.com Fri Oct 21 11:06:11 2011 From: kilian.cavalotti.work at gmail.com (Kilian Cavalotti) Date: Fri, 21 Oct 2011 17:06:11 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA1854B.5090506@ias.edu> References: <4EA16F3A.8080209@ias.edu> <20111021134457.GA22748@grml> <4EA1854B.5090506@ias.edu> Message-ID: Hi Prentice, On Fri, Oct 21, 2011 at 4:44 PM, Prentice Bisbal wrote: >> Have you thought about queueing systems like condor or SGE? > > Yes, I have cluster that uses SGE, and we allow users to run serial jobs > (non-MPI, etc.) there, so there is no need for them to use screen to > execute long-running jobs. Hence my frustration. You could alias "screen" to "qlogin". :) Cheers, -- Kilian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From atp at piskorski.com Fri Oct 21 15:14:01 2011 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 21 Oct 2011 15:14:01 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <20111021191401.GA87390@piskorski.com> On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: > My opinion is these these are shared resources, and if you aren't > interactively using them, you should log out to free up resources for > others. "running under screen" != "non-interactive". > I would like to remove screen from my environment entirely to prevent > this. My fellow sysadmins here agree. I'm expecting massive backlash > from the users. No shit. If you allow users to login at all, then (IMNSHO) removing screen is insane. That's not a solution to your problem, that's creating a totally new problem and pretending it's a solution. I essentially always use screen whenever I ssh to any Linux box for any reason. If my sysadmin arbitrarily disabled screen because some other user was doing something dumb, I'd be pretty upset too. (Annoyed enough to maybe just build screen myself on that box.) 
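Which is maybe a ten-minute job anyway; roughly, assuming a compiler and the ncurses headers are on the box (the version number is just whatever happens to be current):

    wget https://ftp.gnu.org/gnu/screen/screen-4.0.3.tar.gz
    tar xzf screen-4.0.3.tar.gz && cd screen-4.0.3
    ./configure --prefix=$HOME/local && make && make install
    export PATH=$HOME/local/bin:$PATH    # and "screen" is back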
-- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From peter.st.john at gmail.com Fri Oct 21 22:18:19 2011 From: peter.st.john at gmail.com (Peter St. John) Date: Fri, 21 Oct 2011 22:18:19 -0400 Subject: [Beowulf] about 'man' Re: Users abusing screen In-Reply-To: References: <4EA19A05.4000400@ias.edu> Message-ID: I'm not a sysadmin, but I thought these days we were supposed to point [end]users at "help" or "doc" instead of man? Man is like sdb, it's great but not for everyone, you need context to appreciate it. I think in System V type derivatives it's usually "help"? peter On Fri, Oct 21, 2011 at 12:45 PM, Lux, Jim (337C) wrote: > > > On 10/21/11 9:12 AM, "Prentice Bisbal" wrote: > > > >Ellis, > > > >Using nohup was exactly the advice I gave to one of my users yesterday. > >Not sure if he'll use it. 'man' is a very difficult program to learn, > >from what I understand. > > Well... 'man' is easy, but sometimes, you need decent examples and > tutorials. Just knowing what all the switches are and the format is like > giving someone a dictionary and saying: now write me a sonnet. This is > especially so for the "swiss army knife" type utilities (grep, I'm looking > at you!) > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ellis at runnersroll.com Sat Oct 22 08:02:35 2011 From: ellis at runnersroll.com (Ellis H. Wilson III) Date: Sat, 22 Oct 2011 08:02:35 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111021191401.GA87390@piskorski.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> Message-ID: <4EA2B0DB.3040702@runnersroll.com> On 10/21/11 15:14, Andrew Piskorski wrote: > On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: > >> My opinion is these these are shared resources, and if you aren't >> interactively using them, you should log out to free up resources for >> others. > > "running under screen" != "non-interactive". What I think Prentice was pointing out here was more along the lines of: "non-interactive" >= "running under screen" <= interactive Where interactivity is more of a spectrum than a != or =. More pointedly, he stated his users are acting in a non-interactive manner, in some cases even after they leave, which is irresponsible at all levels. Obviously he has to balance a rule-set between the good users and the bad users, such that abuse isn't quite as easy. >> I would like to remove screen from my environment entirely to prevent >> this. My fellow sysadmins here agree. I'm expecting massive backlash >> from the users. 
> > No shit. If you allow users to login at all, then (IMNSHO) removing > screen is insane. That's not a solution to your problem, that's > creating a totally new problem and pretending it's a solution. Insane? I mean, I do a lot of work on a bunch of different distros and hardware types, and have found little use for screen /unless/ I was on a really, really poor internet connection that cut out on the minutes level. Can you give some examples regarding something you can do with screen you cannot do with nohup and tail? > I essentially always use screen whenever I ssh to any Linux box for > any reason. But why? Just leave a terminal open if you want interactivity, otherwise nohup something. Perhaps I've understated screen's usefulness, but I'm glad to be corrected/educated on it's efficacy in this area. Best, ellis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From skylar at cs.earlham.edu Sat Oct 22 13:24:02 2011 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Sat, 22 Oct 2011 10:24:02 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA2B0DB.3040702@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> <4EA2B0DB.3040702@runnersroll.com> Message-ID: <4EA2FC32.9000605@cs.earlham.edu> On 10/22/11 05:02, Ellis H. Wilson III wrote: > > Insane? I mean, I do a lot of work on a bunch of different distros and > hardware types, and have found little use for screen /unless/ I was on a > really, really poor internet connection that cut out on the minutes > level. Can you give some examples regarding something you can do with > screen you cannot do with nohup and tail? > > Here's a few I can think of: * Multiple shells off one login * Scroll buffer * Copy&paste w/o needing a mouse * Start session logging at any time, w/o needing to remember to use script or nohup I guess I'm with Andrew, where the first thing I do upon logging in is either connecting to an existing screen session or starting a fresh one. -- -- Skylar Thompson (skylar at cs.earlham.edu) -- http://www.cs.earlham.edu/~skylar/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.wender at science-computing.de Mon Oct 24 02:30:12 2011 From: j.wender at science-computing.de (Jan Wender) Date: Mon, 24 Oct 2011 08:30:12 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA505F4.7080007@science-computing.de> On 10/21/2011 03:10 PM, Prentice Bisbal wrote: > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? 
How about killing long-running (either elapsed or used time) processes not started through the batch system? You should be able to identify them by looking at the process tree. At least one cluster I know kills all user processes which have not been started from the queueing system. Cheerio, Jan -- ---- Company Information ---- Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- A non-text attachment was scrubbed... Name: j_wender.vcf Type: text/x-vcard Size: 338 bytes Desc: not available URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From greg.matthews at diamond.ac.uk Mon Oct 24 07:00:19 2011 From: greg.matthews at diamond.ac.uk (Gregory Matthews) Date: Mon, 24 Oct 2011 12:00:19 +0100 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA19A05.4000400@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> Message-ID: <4EA54543.5090908@diamond.ac.uk> Prentice Bisbal wrote: > Using nohup was exactly the advice I gave to one of my users yesterday. > Not sure if he'll use it. 'man' is a very difficult program to learn, > from what I understand. our experience of ppl using nohup without really thinking it through is eventually filling the partition with an enormous nohup.out file. GREG > > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Mon Oct 24 07:20:02 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 24 Oct 2011 13:20:02 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA54543.5090908@diamond.ac.uk> References: <4EA16F3A.8080209@ias.edu> <4EA19365.4030109@runnersroll.com> <4EA19A05.4000400@ias.edu> <4EA54543.5090908@diamond.ac.uk> Message-ID: <9DA6F2A5-6736-457F-AE89-C5EC56735C09@staff.uni-marburg.de> Am 24.10.2011 um 13:00 schrieb Gregory Matthews: > Prentice Bisbal wrote: >> Using nohup was exactly the advice I gave to one of my users yesterday. >> Not sure if he'll use it. 'man' is a very difficult program to learn, >> from what I understand. > > our experience of ppl using nohup without really thinking it through is > eventually filling the partition with an enormous nohup.out file. It's possible to make an alias, so that "nohup" reads "nohup > /dev/null" The redirection doesn't need to be at the end of the command. Depends whether they need the output, and/or any output file is created by the application on its own anyway. -- Reuti > GREG > >> >> Prentice >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > > -- > Greg Matthews 01235 778658 > Senior Computer Systems Administrator > Diamond Light Source, Oxfordshire, UK > > -- > This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
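[A note on Reuti's suggestion above: the alias works because a redirection may appear anywhere in a simple command, so the expanded "nohup > /dev/null command args &" is still valid shell. A minimal sketch of two alternative forms, assuming bash; "runbg" is an illustrative name, not an existing tool, and keeping stderr in a file is a variation on the suggestion rather than part of it.]

    # Reuti's alias: after expansion the redirection simply precedes the
    # command word, which the shell accepts, so stdout is discarded and
    # no nohup.out is ever written.
    alias nohup='nohup > /dev/null'

    # Function variant (illustrative): stdout is discarded, but stderr is
    # kept in a small per-command file so genuine failures stay visible.
    runbg () {
        local name
        name=$(basename "$1")
        nohup "$@" > /dev/null 2> "$HOME/${name}.err" &
        echo "started ${name} as PID $!"
    }

Typical use would be "runbg ./solver --input run1.dat" followed by an occasional "tail ~/solver.err".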
From prentice at ias.edu Mon Oct 24 09:42:23 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 09:42:23 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA2B0DB.3040702@runnersroll.com> References: <4EA16F3A.8080209@ias.edu> <20111021191401.GA87390@piskorski.com> <4EA2B0DB.3040702@runnersroll.com> Message-ID: <4EA56B3F.3060404@ias.edu> On 10/22/2011 08:02 AM, Ellis H. Wilson III wrote: > On 10/21/11 15:14, Andrew Piskorski wrote: >> On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote: >> >>> My opinion is these these are shared resources, and if you aren't >>> interactively using them, you should log out to free up resources for >>> others. >> "running under screen" != "non-interactive". > What I think Prentice was pointing out here was more along the lines of: > "non-interactive" >= "running under screen" <= interactive > Where interactivity is more of a spectrum than a != or =. More > pointedly, he stated his users are acting in a non-interactive manner, > in some cases even after they leave, which is irresponsible at all > levels. Obviously he has to balance a rule-set between the good users > and the bad users, such that abuse isn't quite as easy. Thanks for coming to my defense, Ellis. I don't think I could have explained it better myself. >>> I would like to remove screen from my environment entirely to prevent >>> this. My fellow sysadmins here agree. I'm expecting massive backlash >>> from the users. >> No shit. If you allow users to login at all, then (IMNSHO) removing >> screen is insane. That's not a solution to your problem, that's >> creating a totally new problem and pretending it's a solution. > Insane? I mean, I do a lot of work on a bunch of different distros and > hardware types, and have found little use for screen /unless/ I was on a > really, really poor internet connection that cut out on the minutes > level. Can you give some examples regarding something you can do with > screen you cannot do with nohup and tail? I agree. I've been a professional sys admin using Unix/Linux day in and day out for well over 10 years, and not one days has gone by where I saw a need for screen. >> I essentially always use screen whenever I ssh to any Linux box for >> any reason. > But why? Just leave a terminal open if you want interactivity, > otherwise nohup something. Perhaps I've understated screen's > usefulness, but I'm glad to be corrected/educated on it's efficacy in > this area. > > Best, > > ellis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
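[For readers weighing Ellis's question against Skylar's list earlier in the thread, the two competing idioms look roughly like this. A minimal sketch, assuming bash and a stock GNU screen, with "longjob.sh" standing in for whatever is actually being run.]

    # Idiom 1: fire and forget with nohup, then watch the log.
    nohup ./longjob.sh > longjob.log 2>&1 &
    tail -f longjob.log          # Ctrl-C stops the tail, not the job

    # Idiom 2: a named, detachable session with screen.
    screen -S longjob            # start a session called "longjob"
    ./longjob.sh                 # run it interactively inside the session
    # ... detach with Ctrl-a d; the job keeps running ...
    screen -ls                   # the session shows up as "Detached"
    screen -r longjob            # reattach later, possibly from another machine

Skylar's other points (scrollback, copy and paste, multiple windows per login) come with idiom 2 for free, which is the crux of the disagreement.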
From prentice at ias.edu Mon Oct 24 09:46:49 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 09:46:49 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA505F4.7080007@science-computing.de> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> Message-ID: <4EA56C49.9060204@ias.edu> On 10/24/2011 02:30 AM, Jan Wender wrote: > On 10/21/2011 03:10 PM, Prentice Bisbal wrote: >> I have a question that isn't directly related to clusters, but I suspect >> it's an issue many of you are dealing with are dealt with: users using >> the screen command to stay logged in on systems and running long jobs >> that they forget about. Have any of you experienced this, and how did >> you deal with it? > How about killing long-running (either elapsed or used time) processes not > started through the batch system? You should be able to identify them by looking > at the process tree. > At least one cluster I know kills all user processes which have not been started > from the queueing system. The systems where screen is being abused are not part of the batch system, and they will not /can not be for reasons I don't want to get into here. The problem with killing long-running programs is that there are often long running programs that are legitimate in my evironment. I can quickly scan 'ps' output and determine which is which, but I doubt that kind of intelligence could ever be built into a shell script. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Mon Oct 24 10:22:50 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 24 Oct 2011 10:22:50 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA574BA.2050304@ias.edu> Anything is possible if you're a good enough programmer. Like I said earlier, there are some users legitimately running long jobs on the systems in question. Instead of developing a clever program to automatically kill long running screen jobs, I think it would be better to be up front with my users and remove screen, rather than let them use it, only to surprise them later by killing their jobs. On 10/24/2011 09:55 AM, geert geurts wrote: > > Hello Prentice, > > Screen is a essential app, for sure. > But as an answer to the initial question... > I'm not much of a programmer, but can't you replace the binary with a > custom compiled version which runs two threads? One with the initial > program, and one which sleeps for the maximum amount of time you're > willing to allow screen sessions to last, and kills the session when > the time runs out... > > Or maybe build some script around the actual binary to do the same.. > > > Regards, > Geert > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
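[Geert's wrapper idea is harder to implement literally than it sounds, because the detached session manager separates itself from the invoking process, so a simple sleep-and-kill thread around the client would miss it. A variation that combines the wrapper with Henning's earlier ulimit suggestion is sketched below; it assumes the distribution binary has been renamed to /usr/bin/screen.real, and the 24-CPU-hour ceiling is an arbitrary example of site policy, not a recommendation.]

    #!/bin/bash
    # Illustrative wrapper installed as /usr/bin/screen, with the real
    # binary assumed to have been renamed to /usr/bin/screen.real.
    # Instead of killing sessions by wall-clock age, it lowers the
    # CPU-time limit that every process started inside the session will
    # inherit.
    REAL=/usr/bin/screen.real

    # 24 CPU-hours per process. RLIMIT_CPU is inherited across fork/exec
    # and a non-root user cannot raise it above this hard limit again.
    # Idle shells are untouched because they accumulate almost no CPU
    # time; forgotten compute jobs eventually receive SIGXCPU and die.
    ulimit -t $(( 24 * 3600 ))

    exec "$REAL" "$@"

Setting the same limit system-wide in /etc/security/limits.conf (as discussed further down the thread) achieves a similar effect without touching the screen binary at all.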
From samuel at unimelb.edu.au Mon Oct 24 18:48:44 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 25 Oct 2011 09:48:44 +1100 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA5EB4C.3000809@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/10/11 00:10, Prentice Bisbal wrote: > I have a question that isn't directly related to clusters, but I suspect > it's an issue many of you are dealing with are dealt with: users using > the screen command to stay logged in on systems and running long jobs > that they forget about. Have any of you experienced this, and how did > you deal with it? Hmm, any way of making a local version of screen which puts all the processes into a cpuset or control group so you can easily distinguish between ones in screen and outside of it ? Perhaps even doing it with a wrapper if you didn't want to build a modified version ? That way you get to restrict the number of cores they can monopolise.. Of course a user could get around it by building their own copy, but at least then you'd be able to see that.. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6l60wACgkQO2KABBYQAh/YtwCfegBzvEpH/s4PtHnFlEwSqQLK UO8An3DK20lEVrT9WM8qln0wM7alKoU6 =oInQ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From lindahl at pbm.com Tue Oct 25 19:13:05 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 25 Oct 2011 16:13:05 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA56C49.9060204@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> Message-ID: <20111025231305.GC9493@bx9.net> On Mon, Oct 24, 2011 at 09:46:49AM -0400, Prentice Bisbal wrote: > The systems where screen is being abused are not part of the batch > system, and they will not /can not be for reasons I don't want to get > into here. The problem with killing long-running programs is that there > are often long running programs that are legitimate in my evironment. I > can quickly scan 'ps' output and determine which is which, but I doubt > that kind of intelligence could ever be built into a shell script. I see that you didn't bother to check out the software proposed soon after you asked your question. If you don't check out potential answers because you doubt they will work, why should anyone bother to reply to you? The problem you have is a common issue in university environments, and the common solution is a script that accurately figures out long-running cpu-intensive programs and nices/kills them. I first ran into such a thing in, oh, 1992? It's not rocket science. 
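[The reaper Greg alludes to can indeed be small. The sketch below is not the script he is referring to, only an illustration of the nice-first, kill-later shape. It assumes a procps whose ps supports the "etimes" (elapsed seconds) output field, acts on wall-clock age rather than CPU consumed, omits the warning e-mail a production version would send, and uses arbitrary thresholds and whitelist entries.]

    #!/bin/bash
    # Illustrative reaper: renice long-running user processes first,
    # kill only much later. Run from root's cron on the shared machines.
    NICE_AFTER=$(( 24 * 3600 ))      # renice after 1 day of wall clock
    KILL_AFTER=$(( 7 * 24 * 3600 ))  # kill after 7 days
    WHITELIST='^(sshd|screen|SCREEN|bash|tcsh|tmux)$'

    ps -eo pid=,user=,etimes=,comm= | while read pid user elapsed comm; do
        [[ $user == root ]] && continue
        [[ $comm =~ $WHITELIST ]] && continue
        if (( elapsed > KILL_AFTER )); then
            logger -t reaper "killing $comm ($pid) of $user after ${elapsed}s"
            kill "$pid"
        elif (( elapsed > NICE_AFTER )); then
            renice -n 19 -p "$pid" > /dev/null
        fi
    done

A real version would also look at CPU actually consumed (so legitimately long-lived but idle daemons are spared) and mail the owner before touching anything, as suggested elsewhere in the thread.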
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Wed Oct 26 10:31:56 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 26 Oct 2011 10:31:56 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111025231305.GC9493@bx9.net> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> Message-ID: <4EA819DC.9090106@ias.edu> On 10/25/2011 07:13 PM, Greg Lindahl wrote: > On Mon, Oct 24, 2011 at 09:46:49AM -0400, Prentice Bisbal wrote: > >> The systems where screen is being abused are not part of the batch >> system, and they will not /can not be for reasons I don't want to get >> into here. The problem with killing long-running programs is that there >> are often long running programs that are legitimate in my evironment. I >> can quickly scan 'ps' output and determine which is which, but I doubt >> that kind of intelligence could ever be built into a shell script. > I see that you didn't bother to check out the software proposed soon > after you asked your question. If you don't check out potential > answers because you doubt they will work, why should anyone bother to > reply to you? Greg, I didn't realize I needed to log a detailed response to every suggestion made to me on this list. I've been a member of this list for quite sometime, and I've never seen a comment like yours before. You're out of line. People should bother to reply to me because I've been a participating member of this list for 4 years now, and often assist others when I can. I don't expect a response to every suggestion I provide to others. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From bcostescu at gmail.com Wed Oct 26 11:41:50 2011 From: bcostescu at gmail.com (Bogdan Costescu) Date: Wed, 26 Oct 2011 17:41:50 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA16F3A.8080209@ias.edu> References: <4EA16F3A.8080209@ias.edu> Message-ID: On Fri, Oct 21, 2011 at 15:10, Prentice Bisbal wrote: > Periodically, when I have to shutdown/reboot a system for maintenance, > I find a LOT of shells being run through the screen command for users > who aren't logged in. The majority are idle shells, but many are running > jobs, that seem to be forgotten about. > ... > I would like to remove screen from my environment entirely to prevent > this. >From what I understand from your message, it's not screen per-se which upsets you, it's the way it is (ab)used by some users to start long running memory hogging jobs; you seem to be OK with idle shells found at maintenance time which are still started through screen. So why the backlash against screen ? Starting jobs in the background can be done directly through the shell, with no screen; if the job can be split in smaller pieces time-wise, they can be started by at/cron; screen can be installed by a user, possible under a different name... 
so many and surely other possibilities to still upset you even if you uninstall screen, because you focus on the wrong subject. To deal with forgotten long running jobs, you have various administrative (f.e. bill users/groups, even if in some kind of symbolic way) or technical (f.e. only allow 24h CPU time through system-wide limits or install a daemon which watches and warns and/or takes measures) means - some of these have been discussed on this very list in the past or have been mentioned earlier in this thread. Each situation is different (f.e. some legitimate jobs could run for more than 24h), so you should check all suggestions and apply the one(s) which fit(s) best. I know from my own experience that it's not easy to be on this side of the fence :-) Good luck! Bogdan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Oct 26 12:22:31 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 26 Oct 2011 12:22:31 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: OK, OK, I haven't participated in this discussion so far -- way too busy. But since it keeps on going, and going, and going, and since nobody has mentioned the obvious and permanent solution, I'm going to have to bring it up: >From "man 8 syslogd", which alas seems to no longer exist save in our hearts and memories, when confronted with any sort of persistent system abuse: 5. Use step 4 and if the problem persists and is not secondary to a rogue program/daemon get a 3.5 ft (approx. 1 meter) length of sucker rod* and have a chat with the user in question. * Sucker rod def. ? 3/4, 7/8 or 1in. hardened steel rod, male threaded on each end. Primary use in the oil industry in West- ern North Dakota and other locations to pump 'suck' oil from oil wells. Secondary uses are for the construction of cattle feed lots and for dealing with the occasional recalcitrant or bel- ligerent individual. I've found that the "sucker rod solution" is really the only one that ultimately works. Even if it is merely present when discussing the problem with the worst offenders, it marvelously focusses the mind on the severity of the issue. Otherwise (as has been pointed out repeatedly) it is rather trivial to write an e.g. cron script that reaps/kills ANYTHING undesireable on a public server. Invariably they will sooner or later kill something that shouldn't be killed in the sense that it is doing some sort of useful work, but screen isn't likely to be something in that category. Myself, I like the sucker rod approach. BANG down on the desk with it and say something ominous like "So, you've been cluttering up my server with unattended and abandoned sessions. Would you be so kind as to CEASE (bam) and DESIST (bam) from this antisocial activity?" Then mutter something about too much Jolt Cola and back away slowly. Don't worry too much about the divots you leave in the desk or the coffee mug that somehow got shattered. They'll be useful reminders the next time he or she considers walking way from a multiplexed screen session. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Wed Oct 26 12:42:50 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 26 Oct 2011 12:42:50 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> Message-ID: <4EA8388A.6060704@scalableinformatics.com> On 10/26/2011 12:22 PM, Robert G. Brown wrote: > Myself, I like the sucker rod approach. BANG down on the desk with it > and say something ominous like "So, you've been cluttering up my server > with unattended and abandoned sessions. Would you be so kind as to > CEASE (bam) and DESIST (bam) from this antisocial activity?" Then > mutter something about too much Jolt Cola and back away slowly. [donning his old New Yawk accent ... "Hey, we don't gots no accent ... you'se got an accent..."] "Thats a nice computer model you have there perfesser ... be a shame to have to run it over ... TCP over SLIP (serial line IP) ..." "So you like that 64 bit math, eh? Lets see how well you compute with a few less bits ..." [back to your regularly scheduled supercomputer cluster] -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Wed Oct 26 16:55:13 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 26 Oct 2011 16:55:13 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA819DC.9090106@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> Message-ID: > sometime, and I've never seen a comment like yours before. You're out of > line. hah. Greg doesn't post all that much, but he's no stranger to the flame ;) seriously, your question seemed to be about a general problem, but your motive, ulterior or not, seemed to be to get rid of screen. IMO, getting rid of screen is BOFHishness of the first order. it's a tool that has valuable uses. it's not the cause of your problem. on our login nodes, we have some basic limits (/etc/security/limit.conf) that prevent large or long processes or numerous processes. * hard as 3000000 * hard cpu 60 * hard nproc 100 * hard maxlogins 20 these are very arguable, and actually pretty loose. our login nodes are intended for editing/compiling/submitting, maybe the occasional gnuplot/etc. there doesn't seem to be much resistance to the 3G as (vsz) limit, and it does definitely cut down on OOM problems. 60 cpu-minutes covers any possible compile/etc (though it has caused problems with people trying to do very large scp operations.) 
nproc could probably be much lower (20?) and maxlogins ought to be more like 5. we don't currently have an idle-process killer, though have thought of it. we only recently put a default TMOUT in place to cause a bit of gc on forgotten login sessions. we do have screen installed (I never use it myself.) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From scrusan at ur.rochester.edu Wed Oct 26 17:14:13 2011 From: scrusan at ur.rochester.edu (Steve Crusan) Date: Wed, 26 Oct 2011 17:14:13 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 26, 2011, at 4:55 PM, Mark Hahn wrote: >> sometime, and I've never seen a comment like yours before. You're out of >> line. > > hah. Greg doesn't post all that much, but he's no stranger to the flame ;) > > seriously, your question seemed to be about a general problem, > but your motive, ulterior or not, seemed to be to get rid of screen. > > IMO, getting rid of screen is BOFHishness of the first order. > it's a tool that has valuable uses. it's not the cause of your problem. I agree. - From reading this thread, the original machine(s) in question seem to be some sort of interactive or login node(s). If these nodes were large memory or SMP machines, we'd have our resource manager take care of long running processes or other abuses. > > on our login nodes, we have some basic limits (/etc/security/limit.conf) > that prevent large or long processes or numerous processes. > > * hard as 3000000 > * hard cpu 60 > * hard nproc 100 > * hard maxlogins 20 > > these are very arguable, and actually pretty loose. our login nodes are > intended for editing/compiling/submitting, maybe the occasional gnuplot/etc. > there doesn't seem to be much resistance to the 3G as (vsz) limit, and > it does definitely cut down on OOM problems. 60 cpu-minutes covers any > possible compile/etc (though it has caused problems with people trying to > do very large scp operations.) nproc could probably be much lower (20?) > and maxlogins ought to be more like 5. We actually just spinned up a graphical login node for our less saavy users whom are more apt to run matlab, comsol, gnuplot, and other 'EZ button' graphically based scientific software. This graphical login software (http://code.google.com/p/neatx/) has helped us a lot with novice users. It has session resumption, client software for any platforms, it's faster than xforwarding, and it's wrapped around SSH. The node itself is 'fairly' heavy (8 procs, 72GB of RAM), but we've implemented cgroups to stop abuses. Upon login (through SSH or NX) each user is added to his own control group, which has processor and memory limits. 
Since the user's processes are kept inside of control group process spaces, it's easy to work directly with their processes/process trees, whether it be dynamic throttling, or just killing processes. On our login nodes that don't use control groups, we just kill any heavy computational processes after a certain period of time, depending on whether or not it's a compilation step, gzip, etc. We state this in our documentation, and usually give the user a warning+grace period. We don't see this type of abuse anymore because the few users whom have done this quickly learned (and apologized, imagine that!), or they were using our cgroup setup login node, so their abuse didn't affect the system enough. If the issue is processes that run for far too long, and are abusing the system, cgroups or 'pushing' the users to use a batch system seems to work better than writing scripts to make decisions on killing processes. Most ISVs have methods to run computation in batch mode, so it's not necessary for matlab type users to have their applications running for 3 weeks in a screen session when they could be using the cluster. Either that, or using some sort of cpu/memory limits that were listed above, or cgroups. So a process can run forever, but it won't have enough CPU/memory shares to make a difference. Just my .02 > > we don't currently have an idle-process killer, though have thought of it. > we only recently put a default TMOUT in place to cause a bit of gc on > forgotten login sessions. > > we do have screen installed (I never use it myself.) > > regards, mark hahn. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ---------------------- Steve Crusan System Administrator Center for Research Computing University of Rochester https://www.crc.rochester.edu/ -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOqHgzAAoJENS19LGOpgqKDHQH/AqfAefrt3nusElS/OBnxgBK Pf8tFuyjoJvLgt+3KX19ZL18r1b/BhdW3/1GZgSVVjQZcYkV6dtUq6VI545jqDag lRY9kvyIhudKfVhFwGa87DbXSzYv5oDImf3UejsIiJvo20Bzxf7mdpToT+AGJ4gA J2HzrZwjdZk/DYEJ7CpG9lfthDDq5mrTQTbzVCnFHvEiWpeoBvfd3gJOP94age0F 0ZQGLCgheRSJXLsOlq0y0vqr+7nzupSrLUk5A1YcUysSpk4Dc4mvUVJFE+QbStN6 dSiYHhKMxF5qJTXYOSAF4QDmIObyzlbFFmHCeTTWrCG7KeWtOZU4zUfN7TL3sO4= =M5Pw -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
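[Steve's per-user control groups can be reproduced with the stock libcgroup utilities. The sketch below is a minimal illustration, assuming cgroup v1 with the cpu and memory controllers mounted and assuming the script is invoked at login (for instance from a pam_exec hook) with the user name and shell PID as arguments; the group path and limits are examples, not his site's actual values.]

    #!/bin/bash
    # Illustrative per-user cgroup setup for a shared login node.
    # Arguments (assumed to be supplied by whatever login hook calls this):
    #   $1  login name      $2  PID of the user's login shell
    user=$1
    pid=$2

    # Create the group under both controllers, cap it, and move the login
    # shell into it; everything the user starts afterwards inherits it.
    cgcreate -g cpu,memory:/users/$user
    # Relative CPU weight (the default group weight is 1024).
    cgset -r cpu.shares=256 /users/$user
    # 8 GiB hard memory cap.
    cgset -r memory.limit_in_bytes=$(( 8 * 1024 * 1024 * 1024 )) /users/$user
    cgclassify -g cpu,memory:/users/$user $pid

Alternatively, cgrulesengd can do the classification automatically from /etc/cgrules.conf instead of a login hook, which may be closer to what an NX-based setup needs.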
From lindahl at pbm.com Thu Oct 27 01:41:47 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 26 Oct 2011 22:41:47 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> Message-ID: <20111027054147.GB29939@bx9.net> On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote: > If the issue is processes that run for far too long, and are abusing > the system, cgroups or 'pushing' the users to use a batch system seems > to work better than writing scripts to make decisions on killing > processes. What I saw work well was nicing the process after a certain time, including an email, and then killing and emailing after a longer time. The emails can push the batch alternative. Users generally don't become angry if the limits are enforced by a script; they can only be surprised once, and that first time is just nicing the process. If they have a hard time predicting runtime (a common issue, especially for non-hardcore supercomputing types), it's not like they _intentionally_ are exceeding the limits... -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Oct 27 10:49:51 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 27 Oct 2011 10:49:51 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: <20111027054147.GB29939@bx9.net> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> Message-ID: <4EA96F8F.1010207@ias.edu> On 10/27/2011 01:41 AM, Greg Lindahl wrote: > On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote: > >> If the issue is processes that run for far too long, and are abusing >> the system, cgroups or 'pushing' the users to use a batch system seems >> to work better than writing scripts to make decisions on killing >> processes. > What I saw work well was nicing the process after a certain time, > including an email, and then killing and emailing after a longer > time. The emails can push the batch alternative. Users generally don't > become angry if the limits are enforced by a script; they can only be > surprised once, and that first time is just nicing the process. If > they have a hard time predicting runtime (a common issue, especially > for non-hardcore supercomputing types), it's not like they > _intentionally_ are exceeding the limits... Exactly. That's why I don't want to automate killing jobs longer than X days. Honestly, I can't believe how much controversy this discussion has created. I thought my OP would go unnoticed. Next time, I'll just ask which text editor I should use. 
;) -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From dnlombar at ichips.intel.com Thu Oct 27 12:04:21 2011 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Thu, 27 Oct 2011 09:04:21 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> Message-ID: <20111027160421.GA28306@nlxcldnl2.cl.intel.com> On Wed, Oct 26, 2011 at 02:55:13PM -0600, Mark Hahn wrote: > > sometime, and I've never seen a comment like yours before. You're out of > > line. > > hah. Greg doesn't post all that much, but he's no stranger to the flame ;) > > seriously, your question seemed to be about a general problem, > but your motive, ulterior or not, seemed to be to get rid of screen. > > IMO, getting rid of screen is BOFHishness of the first order. > it's a tool that has valuable uses. it's not the cause of your problem. Completely agree with this. If you get rid of screen, another tool will be used, perhaps even as simple as a private copy, or nohup and tail as others suggested. My primary use of screen is to do work across home and the office. Nohup only solves one of the potential scenarios. If screen were removed, my productivity would go down. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From glykos at mbg.duth.gr Thu Oct 27 15:19:37 2011 From: glykos at mbg.duth.gr (Nicholas M Glykos) Date: Thu, 27 Oct 2011 22:19:37 +0300 (EEST) Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA96F8F.1010207@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > Exactly. That's why I don't want to automate killing jobs longer than X > days. Probably irrelevant after so many suggestions, but Caos NSA had this very nice 'pam_slurm' module which allows a user to login only to those nodes on which the said user has active jobs (allocated through slurm). The principal idea ["you are welcome to be bring your allocated node (and, thus, your job) to a halt if that's what you want"], sounds pedagogically attractive ... ;-) Nicholas -- Dr Nicholas M. 
Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From prentice at ias.edu Thu Oct 27 15:33:18 2011 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 27 Oct 2011 15:33:18 -0400 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: <4EA9B1FE.8090903@ias.edu> On 10/27/2011 03:19 PM, Nicholas M Glykos wrote: > >> Exactly. That's why I don't want to automate killing jobs longer than X >> days. > Probably irrelevant after so many suggestions, but Caos NSA had this very > nice 'pam_slurm' module which allows a user to login only to those nodes > on which the said user has active jobs (allocated through slurm). The > principal idea ["you are welcome to be bring your allocated node (and, > thus, your job) to a halt if that's what you want"], sounds pedagogically > attractive ... ;-) > > This doesn't apply to my case, since access to the systems in question isn't controlled by a queuing system. That alone would fix the problem. I think there's a similar pam module for SGE, too. -- Prentice _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From reuti at staff.uni-marburg.de Thu Oct 27 15:43:59 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Thu, 27 Oct 2011 21:43:59 +0200 Subject: [Beowulf] Users abusing screen In-Reply-To: <4EA9B1FE.8090903@ias.edu> References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> <4EA9B1FE.8090903@ias.edu> Message-ID: <94F21C03-C8BB-4DB4-AA3A-D1271524E43E@staff.uni-marburg.de> Am 27.10.2011 um 21:33 schrieb Prentice Bisbal: > On 10/27/2011 03:19 PM, Nicholas M Glykos wrote: >> >>> Exactly. That's why I don't want to automate killing jobs longer than X >>> days. >> Probably irrelevant after so many suggestions, but Caos NSA had this very >> nice 'pam_slurm' module which allows a user to login only to those nodes >> on which the said user has active jobs (allocated through slurm). The >> principal idea ["you are welcome to be bring your allocated node (and, >> thus, your job) to a halt if that's what you want"], sounds pedagogically >> attractive ... ;-) They use it in one cluster with Slurm I have access to. 
But it looks like you are never thrown out again once you are in. -- Reuti > This doesn't apply to my case, since access to the systems in question > isn't controlled by a queuing system. That alone would fix the problem. > > I think there's a similar pam module for SGE, too. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Thu Oct 27 19:37:29 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 27 Oct 2011 19:37:29 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: > nice 'pam_slurm' module which allows a user to login only to those nodes > on which the said user has active jobs (allocated through slurm). The I think this is slightly BOFHish, too. do people actually have problems with users stealing cycles this way? the issue is actually stealing, and we simply tell our users not to steal. (actually, I don't think we even point it out, since it's so obvious!) that means we don't attempt to control (we had pam_slurm installed and actually removed it.) after all, just because a user's job is done, it doesn't mean the user has no reason to go onto that node (maybe there's a status file in /tmp, or a core dump or something.) if someone persisted in stealing cycles, we'd lock their account. regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From skylar at cs.earlham.edu Thu Oct 27 19:43:24 2011 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Thu, 27 Oct 2011 16:43:24 -0700 Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: <4EA9EC9C.9090307@cs.earlham.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/27/2011 04:37 PM, Mark Hahn wrote: >> nice 'pam_slurm' module which allows a user to login only to those nodes >> on which the said user has active jobs (allocated through slurm). The > > I think this is slightly BOFHish, too. do people actually have problems > with users stealing cycles this way? the issue is actually stealing, > and we simply tell our users not to steal. 
(actually, I don't think we > even point it out, since it's so obvious!) > > that means we don't attempt to control (we had pam_slurm installed and > actually removed it.) after all, just because a user's job is done, it > doesn't mean the user has no reason to go onto that node (maybe there's a > status file in /tmp, or a core dump or something.) > > if someone persisted in stealing cycles, we'd lock their account. > We do the equivalent with GE it if the end user requests it. We have some clusters that need to support a mix of critical jobs supporting data pipelines, and less-critical academic work. Our default stance, though, is to trust our users to do the right thing. Mostly it works, but sometimes we do need to bring out the LART stick. - -- - -- - -- Skylar Thompson (skylar at cs.earlham.edu) - -- http://www.cs.earlham.edu/~skylar/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6p7JwACgkQsc4yyULgN4aRdgCbB3er3VI9OZEVSWO0GjL15rgU Z0sAoIZBKFsCeaYwA44uQT13JcdMN3dz =ervm -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Fri Oct 28 14:04:02 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 28 Oct 2011 14:04:02 -0400 (EDT) Subject: [Beowulf] Users abusing screen In-Reply-To: References: <4EA16F3A.8080209@ias.edu> <4EA505F4.7080007@science-computing.de> <4EA56C49.9060204@ias.edu> <20111025231305.GC9493@bx9.net> <4EA819DC.9090106@ias.edu> <774_1319662643_4EA87433_774_78911_1_alpine.LFD.2.02.1110261647170.7933@coffee.psychology.mcmaster.ca> <20111027054147.GB29939@bx9.net> <4EA96F8F.1010207@ias.edu> Message-ID: On Thu, 27 Oct 2011, Mark Hahn wrote: > if someone persisted in stealing cycles, we'd lock their account. Exactly. Or visit them with a sucker rod. Or have a department chair have a "talk" with them. Human to human interactions and controls work better than installing complex tools or automated constraints. Sure, sucker rods are a joke and no we don't actually bop users on the head or the desk or whomp them upside the head with a manual, but in most cases a stern talking to followed by locking their account unless/until they formally agree to change their ways is more than sufficient. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From sabujp at gmail.com  Fri Oct 28 14:22:03 2011
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Fri, 28 Oct 2011 13:22:03 -0500
Subject: [Beowulf] Users abusing screen

> Human to human interactions and controls work better than installing
> complex tools or automated constraints. Sure, sucker rods are a joke
> and no we don't actually bop users on the head or the desk or whomp them
> upside the head with a manual, but in most cases a stern talking to
> followed by locking their account unless/until they formally agree to
> change their ways is more than sufficient.

Funny you should mention that, we've got such a device handy, passed
down through the years from previous sysadmins:

http://i.imgur.com/G0pjk.jpg

It's also got a nice foam layer on the bopping side.

From beckerjes at mail.nih.gov  Fri Oct 28 14:27:48 2011
From: beckerjes at mail.nih.gov (Jesse Becker)
Date: Fri, 28 Oct 2011 14:27:48 -0400
Subject: [Beowulf] Users abusing screen

On Fri, Oct 28, 2011 at 02:22:03PM -0400, Sabuj Pattanayek wrote:
> http://i.imgur.com/G0pjk.jpg
>
> It's also got a nice foam layer on the bopping side.

Then it's just a prop. What's the *real* one look like?

--
Jesse Becker
NHGRI Linux support (Digicon Contractor)

From sabujp at gmail.com  Fri Oct 28 14:33:52 2011
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Fri, 28 Oct 2011 13:33:52 -0500
Subject: [Beowulf] Users abusing screen

I don't know, maybe we drop this on their head:

http://i.imgur.com/VWxyF.jpg

or worse, switch out their linux workstation with it.

On Fri, Oct 28, 2011 at 1:27 PM, Jesse Becker wrote:
> Then it's just a prop. What's the *real* one look like?
From james.p.lux at jpl.nasa.gov  Fri Oct 28 14:58:33 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 28 Oct 2011 11:58:33 -0700
Subject: [Beowulf] Users abusing screen

Google "Microsoft we share your pain" and look for the WSYP videos on
YouTube. The three-minute version is probably the one you want.

Jim Lux
+1(818)354-2075

> -----Original Message-----
> From: Sabuj Pattanayek
> Sent: Friday, October 28, 2011 11:34 AM
> Subject: Re: [Beowulf] Users abusing screen
>
> I don't know, maybe we drop this on their head:
>
> http://i.imgur.com/VWxyF.jpg
>
> or worse, switch out their linux workstation with it.

From glykos at mbg.duth.gr  Fri Oct 28 15:10:18 2011
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Fri, 28 Oct 2011 22:10:18 +0300 (EEST)
Subject: [Beowulf] Users abusing screen

> > if someone persisted in stealing cycles, we'd lock their account.
>
> Exactly. Or visit them with a sucker rod. Or have a department chair
> have a "talk" with them.
>
> Human to human interactions and controls work better than installing
> complex tools or automated constraints.

I can't, of course, even contemplate the possibility of disagreeing with
RGB.
Having said that, we (humans) do install complex tools and automated
constraints on each and every technologically advanced piece of
equipment, from cars and aircraft to computing machines (and we do not
assume that proper training and human interaction suffice to guarantee
proper operation of the said equipment). In this respect, methods like
allocating (in a controlled manner) exclusive rights to compute nodes do
appear sensible. I agree that installing restraints is a balancing act
between crippling creativity (and making power users mad) and avoiding
equipment misuse, but clearly there are limits to the freedom of use
(for example, you wouldn't add all cluster users to your sudo list).

My two cents,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics,
Democritus University of Thrace, University Campus, Dragana, 68100
Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620,
Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

From prentice at ias.edu  Fri Oct 28 16:20:41 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 28 Oct 2011 16:20:41 -0400
Subject: [Beowulf] Users abusing screen

I was still supporting those only 4 years ago. Much heavier than a Dell
or HP workstation. Will fix 'layer 8' problems in a jiffy.

--
Prentice

On 10/28/2011 02:33 PM, Sabuj Pattanayek wrote:
> I don't know, maybe we drop this on their head:
>
> http://i.imgur.com/VWxyF.jpg
>
> or worse, switch out their linux workstation with it.
From peter.st.john at gmail.com  Fri Oct 28 16:56:49 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 28 Oct 2011 16:56:49 -0400
Subject: [Beowulf] Users abusing screen

I think Greg is right on the money. Particularly at a place like IAS,
where resources are good and users may be errant but are doing great
things, I'd have a sequence of limits: first, a mail warning ("Your job
PID 666 has consumed one million core-hours, and its priority will be
decremented in 500,000 CH unless you call the sysadmin at 555-1212"),
later a nice (with another email warning), and only then a kill (with an
email notification). If they have opportunities to upscale the
allocations for really important jobs, and they are notified about
automatic limitations ahead of time, they have no reason to complain.

Peter

On Thu, Oct 27, 2011 at 1:41 AM, Greg Lindahl wrote:
> On Wed, Oct 26, 2011 at 05:14:13PM -0400, Steve Crusan wrote:
>
> > If the issue is processes that run for far too long, and are abusing
> > the system, cgroups or 'pushing' the users to use a batch system seems
> > to work better than writing scripts to make decisions on killing
> > processes.
>
> What I saw work well was nicing the process after a certain time,
> including an email, and then killing and emailing after a longer
> time. The emails can push the batch alternative. Users generally don't
> become angry if the limits are enforced by a script; they can only be
> surprised once, and that first time is just nicing the process. If
> they have a hard time predicting runtime (a common issue, especially
> for non-hardcore supercomputing types), it's not like they
> _intentionally_ are exceeding the limits...
>
> -- greg
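The nice-then-kill policy described in Greg's quoted message is easy to script. The sketch below is a minimal illustration, not anything the posters above actually ran: it assumes the third-party psutil module, a local SMTP server, and purely illustrative thresholds, usernames and mail addresses, and it is meant to run from cron on an interactive or login node.

    #!/usr/bin/env python
    # Minimal sketch of a "warn and renice, then kill" watchdog.
    # All thresholds, exempt accounts and mail settings are assumptions;
    # psutil is a third-party module (pip install psutil).

    import smtplib
    import socket
    from email.mime.text import MIMEText

    import psutil

    NICE_AFTER = 4 * 3600        # CPU-seconds before renicing and warning
    KILL_AFTER = 24 * 3600       # CPU-seconds before killing
    EXEMPT = {"root", "daemon"}  # accounts the watchdog never touches
    MAIL_HOST = "localhost"
    MAIL_DOMAIN = "example.org"  # hypothetical; owners are mailed as user@MAIL_DOMAIN

    def mail(user, subject, body):
        """Send the process owner a short note about what just happened."""
        msg = MIMEText(body)
        msg["Subject"] = subject
        msg["From"] = "hpc-admin@" + MAIL_DOMAIN
        msg["To"] = "%s@%s" % (user, MAIL_DOMAIN)
        server = smtplib.SMTP(MAIL_HOST)
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
        server.quit()

    def sweep():
        host = socket.gethostname()
        for proc in psutil.process_iter():
            try:
                user = proc.username()
                if user in EXEMPT:
                    continue
                cpu = sum(proc.cpu_times()[:2])           # user + system CPU-seconds
                cmd = " ".join(proc.cmdline()[:3]) or proc.name()
                if cpu > KILL_AFTER:
                    proc.kill()
                    mail(user, "process killed on " + host,
                         "%s (pid %d) used %.1f CPU-hours outside the batch "
                         "system and has been killed." % (cmd, proc.pid, cpu / 3600.0))
                elif cpu > NICE_AFTER and proc.nice() < 19:
                    proc.nice(19)
                    mail(user, "process reniced on " + host,
                         "%s (pid %d) has used %.1f CPU-hours; it has been "
                         "reniced and will be killed at %.0f CPU-hours unless "
                         "you move it into the queueing system."
                         % (cmd, proc.pid, cpu / 3600.0, KILL_AFTER / 3600.0))
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue

    if __name__ == "__main__":
        sweep()    # run from cron every few minutes

Run frequently from cron, this gives users the one "surprise" Greg mentions (the renice and its warning mail) well before anything is killed.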
From prentice at ias.edu  Fri Oct 28 18:21:50 2011
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 28 Oct 2011 18:21:50 -0400
Subject: [Beowulf] Users abusing screen

On 10/28/2011 04:56 PM, Peter St. John wrote:
> I think Greg is right on the money. Particularly at a place like IAS,
> where resources are good and users may be errant but are doing great
> things,

Have you been a visitor, member or staff member at IAS?

--
Prentice

From peter.st.john at gmail.com  Fri Oct 28 19:16:44 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 28 Oct 2011 19:16:44 -0400
Subject: [Beowulf] Users abusing screen

Prentice,

No, I didn't mean to imply anything specific about e.g. your budget, but
IAS has a fantastic reputation. Say hi to Dima for me; he plays Go and
is an algebraic geometer visiting this year.

Peter

On Fri, Oct 28, 2011 at 6:21 PM, Prentice Bisbal wrote:
> On 10/28/2011 04:56 PM, Peter St. John wrote:
> > I think Greg is right on the money. Particularly at a place like IAS,
> > where resources are good and users may be errant but are doing great
> > things,
>
> Have you been a visitor, member or staff member at IAS?
>
> --
> Prentice