From reuti at staff.uni-marburg.de  Mon Aug  1 13:38:30 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 1 Aug 2011 19:38:30 +0200
Subject: [Beowulf] Fwd: H8DMR-82 ECC error
References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk>
Message-ID: <B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>

Hi all,

on behalf of Jörg I am forwarding this to the list, as his account seems to be blocked from posting to it.

-- Reuti


> #############
> Dear all,
> 
> as I cannot post directly to the list although I am subscribing to it, I have 
> asked a friend of mine to post that for me.
> I am currently having severe problems with one of the clusters I am 
> maintaining. Around 50% of these nodes are crashing when we are running cp2k 
> on it. Although they are IB nodes, even without the IB card installed the test 
> jobs crash the node as well. So I can rule out an IB related problem. Memtest 
> was ok; I ran 9 cycles without any problems. Unfortunately I cannot swap the 
> memory as I don't have any spare modules at all and hence I have to rely on Memtest 
> here. The nodes which are causing the problems show other symptoms as well: I 
> had problems getting 3 of them to boot again after a normal shutdown procedure 
> (the fans come on, and die after a short period and I don't even get to the 
> POST stage at all). So they are offline as well. Two of the remaining nodes were 
> exceedingly hot after a reboot. When I took them out the fans were spinning 
> and now they appear to be ok. These are AMD Opteron 2220 dual core processors 
> with 2 CPUs per node. The mother board is a H8DMR-82 with the BIOS version 
> 080014 (release date 07/13/2007). It appears that almost always the same nodes 
> are crashing with this error message:
> 
> Hardware Error
> CPU0 Machine Check Exception  4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorredted error, report: yes MisV: invalid
> CPU context corrupt: yes UECC Error
> Bud Unit Error: prefetch/ECC error in data read from NB: local node originated 
> (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic. 
> Participating Processors: local node originated (SRC)
> 
> Judging from this I would guess there is a memory related problem.
> Given there are a number of people on the list here and they probably have 
> seen similar hardware before, do I simply have a bad batch of hardware which 
> is known to cause problems or do I have a different issue here? What I am after 
> is some kind of idea of where to look next. It is not the compiled program as 
> taking out the disc and placing it in a different node (same motherboard, same 
> Opteron but slightly different flags) does not cause any problems at all.
> Given the large number of nodes which are causing problems, before I propose 
> to write off these nodes I would like to make sure it is not a subtle issue 
> like a BIOS upgrade which could cure the problem.
> 
> Many thanks for your help and all the best from London
> 
> Jörg
> 
> ##############
> 
> 
> 
> -- 
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ 
> 
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
> 
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> 
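
For readers who want to cross-check the decoded report quoted above, the raw status word (b200200000000863) can be split into its fields by hand. The following is a minimal sketch, not the decoder that produced the original output. The bit positions (VAL, UC, PCC, UECC and so on) and the bus error code layout are assumptions based on AMD's K8-era MCi_STATUS register description and should be checked against the BKDG for the CPU family in question; the function and table names are made up for illustration.

```python
# Sketch only: bit positions follow the K8-era MCi_STATUS layout as an
# assumption; verify them against the vendor manual for your CPU family.

STATUS_FLAGS = [
    (63, "VAL"),    # register contains a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # reporting was enabled for this error
    (59, "MISCV"),  # MISC register holds valid data
    (58, "ADDRV"),  # ADDR register holds a valid address
    (57, "PCC"),    # processor context may be corrupt
    (46, "CECC"),   # correctable ECC error
    (45, "UECC"),   # uncorrectable ECC error
]

# Bus error code layout (low 16 bits): 0000 1PPT RRRR IILL
PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]
REQUEST = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
MEM_IO = ["MEM", "RES", "IO", "GEN"]
LEVEL = ["L0", "L1", "L2", "LG"]


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and bus-error fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF
    bus = None
    if (code >> 11) == 1:  # pattern 0000 1xxx xxxx xxxx marks a bus error
        rrrr = (code >> 4) & 0xF
        bus = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return flags, bus


flags, bus = decode_status(0xB200200000000863)
print(flags)
print(bus)
```

For the value in the report this yields the flags VAL, UC, EN, PCC and UECC, and a bus error with participation SRC, no timeout, request type PREFETCH, a memory access, and the generic cache level. That agrees with the decoded text in the report (uncorrected error, context corrupt, UECC, prefetch from memory, local node originated).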

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Wed Aug  3 00:28:10 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Wed, 03 Aug 2011 14:28:10 +1000
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com><26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2><BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com><4DA5E85D.4010801@ats.ucla.edu><BANLkTinMFKDr7t6oARV5vYxkgj1iq1gYKQ@mail.gmail.com><CAHwLALNaCj2toQrJPK_YCnhZmmUDKRGEZNmT9Uef=QUDO5CbKA@mail.gmail.com><Pine.LNX.4.64.1107112248160.8112@coffee.psychology.mcmaster.ca>	<CAHwLALMuQ3M6EGzRLeOwnfkE7gJXMm5Wd_qDiXuaFKBF5dCYqQ@mail.gmail.com>
	<207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4E38CE5A.5080506@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/07/11 18:47, Hearns, John wrote:

>> I think a lot of this will apply to non-SGE batch schedulers -- in
>> fact Torque will support hwloc in a future release.
>
> That sounds good to me!
> 
> (Hint - if anyone from Altair is listening in it would be useful...)

There's already been Carl Smith from pbspro.com on the hwloc
mailing list finding configure problems with AIX (which
have been fixed)...

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk44zloACgkQO2KABBYQAh8KUACfd5r45HcKBQdxRdRm3rb42fO1
VbgAoINM9lQ2rCIsa6G9Yv0b2qWii2aC
=F/Jm
-----END PGP SIGNATURE-----


From samuel at unimelb.edu.au  Mon Aug  8 21:45:38 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 09 Aug 2011 11:45:38 +1000
Subject: [Beowulf] IBM terminates Blue Waters contract
Message-ID: <4E409142.8060900@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

NCSA is now looking for a new hardware supplier..

http://www.ncsa.illinois.edu/BlueWaters/system.html

# Effective August 6, 2011, IBM terminated its contract
# with the University of Illinois to provide the supercomputer
# for the National Center for Supercomputing Applications'
# Blue Waters project.

More info at El Reg:

http://www.theregister.co.uk/2011/08/08/ibm_kills_blue_waters_super/

# To date, IBM had shipped three racks of the Blue Waters
# supers to NCSA, and these will be returned. IBM has to
# give back $30m to NCSA.

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5AkUIACgkQO2KABBYQAh8icQCeL9PM2FW6ZAMLKz9Wg55oePGY
/FcAoJQGuHMOTNZ0bNddHIAy40ZCe5oB
=fID2
-----END PGP SIGNATURE-----


From mdidomenico4 at gmail.com  Tue Aug  9 08:46:13 2011
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Tue, 9 Aug 2011 08:46:13 -0400
Subject: [Beowulf] Memory Testing?
Message-ID: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>

The last discussion on the list about faulty memory revolved around
using software such as memtest or HPL to trigger single-bit errors (SBE).

I'm curious if anyone has experience with uncorrectable ECC errors:
specifically, not detecting that they occur, but identifying which
DIMM in the chassis the error points to.

mcelog on Linux doesn't seem to report the DIMM slot correctly on
my Supermicro boards.

The only way I know to narrow it down is to pull all the DIMMs
and then test them one at a time in the system.

I'm curious if there is a better way, or if anyone has any opinions on
the piece of hardware below (or a similar one) that might do the
same:

http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm
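
One possible software-side approach to the DIMM question above, assuming the kernel's EDAC driver for the memory controller is loaded, is to read the per-row error counters that EDAC exposes in sysfs. The sketch below assumes the legacy csrow layout (`mc*/csrow*/ce_count` and `ue_count` under `/sys/devices/system/edac/mc`); the function name is made up for illustration. The mapping from a csrow to a physical slot is board-specific and is not something this code can work out, so it would still have to be established once per board type.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): (corrected, uncorrected)} from EDAC sysfs.

    Assumes the legacy csrow layout; rows whose counters are missing or
    unreadable are skipped.
    """
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text())
            ue = int((csrow / "ue_count").read_text())
        except (OSError, ValueError):
            continue
        counts[(csrow.parent.name, csrow.name)] = (ce, ue)
    return counts


if __name__ == "__main__":
    for (mc, csrow), (ce, ue) in read_edac_counts().items():
        flag = "  <-- errors logged" if ce or ue else ""
        print(f"{mc}/{csrow}: corrected={ce} uncorrected={ue}{flag}")
```

Run periodically on each node, this narrows a fault to a memory controller and chip-select row rather than a labelled slot, which may already be enough to cut down the number of DIMMs that have to be pulled and tested individually.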


From a.travis at abdn.ac.uk  Tue Aug  9 08:54:14 2011
From: a.travis at abdn.ac.uk (Tony Travis)
Date: Tue, 09 Aug 2011 13:54:14 +0100
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <4E412DF6.1080204@abdn.ac.uk>

On 09/08/11 13:46, Michael Di Domenico wrote:
> [...]
> I'm curious if there is a better way, or if anyone has any opinions on
> the below (or another similar) piece of hardware that might do the
> same
>
> http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm

Hi, Michael.

We had a RAM tester back in the day, but memory that it passed still 
gave errors in the real systems we were using. I screen memory in the 
system it is installed in using Memtest86+ then run Charles Cazabon's 
user-mode "Memtester" on the running system to assess its reliability:

   http://pyropus.ca/software/memtester/

HTH,

   Tony.
-- 
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk


From deadline at eadline.org  Thu Aug 11 08:04:58 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 11 Aug 2011 08:04:58 -0400 (EDT)
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E412DF6.1080204@abdn.ac.uk>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>
Message-ID: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>


Most of you are probably not aware of this story
about trade secrets and Bash scripts on HPC clusters
(I was not until a few months ago)

  http://www.clustermonkey.net//content/view/308/33/


-- 
Doug



From james.p.lux at jpl.nasa.gov  Thu Aug 11 10:05:00 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 07:05:00 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>

Interesting.. You wrote:
There is a general understanding that unless explicitly marked in the contents of the script (the text file that is the Bash program), a Bash script is freely available for use and modification by anyone. In some cases there is a copyright notice or a license that allows (or disallows) sharing or modification. These are always explicitly stated at the beginning of the script and obvious to anyone who reads or modifies the script. 

This is, of course, not correct under current law: marking is not required for copyright protection.  Pretty much everything is born copyrighted.  Putting markings on it helps you claim willful infringement (i.e. the recipient can't claim "I didn't know"), which helps on the damages situation.  And, under the Berne convention, marking is required to assert your rights in some countries ("All Rights Reserved" is also required in some places).  Likewise, under current law, registration of copyright isn't required.  Registration allows you to collect statutory damages for infringement, though.

For trade secrets, it's a bit trickier.  The recipient has to know that it's a trade secret, but that can be done by marking on the delivery media, by a separate document, or even by verbal communication ("here, this is proprietary, don't disclose it").  And you have to take some means to protect it: claiming trade secret protection for something that is printed on bus stop benches won't fly.  In any case, just because scripts aren't obfuscated doesn't mean they're not subject to trade secret protection, provided the owner of the secret takes some precautions to prevent wide disclosure (e.g. warning the recipient of its proprietary nature).  This is the aspect that will surely be the core of litigation: would a "reasonable person" have known that the material was subject to trade secret protection?  As we all know, reasonable people differ, and the attorneys on both sides will trot out examples of marking and disclosure practices: good, bad, and indifferent.  As Doug noted, "special measures" need to be taken, but there's no bright line standard for those measures, and, in practice, they can be pretty lax (and would be expected to be proportionate to the value of the secret: the secret formula for Coke is probably more protected than the schedule for sweeping the floor in the manufacturing plant.  Both provide competitive advantage to Coke, but one is probably more important).

Something that a lot of tech people  in industry (particularly those coming from academia and working with open source) probably don't really fully understand is that pretty much everything you do for your employer is probably proprietary in some sense, and there is probably a written policy to that effect, which you, as an employee, are expected to be aware of. Or your supervisor told you, or the nice personnel person told you when you hired in 20 years ago, etc.  Mundane operational details of the business might be claimed to provide competitive advantage, especially if they're not "industry standard"  (humorously, if the employer has some really lame practice that's horrible, that might make it protectable.. then you could argue in court about whether it had any value). This is why there are "document review" departments and periodic training:  It helps reduce the problem of "inadvertent disclosure" and "I didn't know".  


This is the really tricky thing about trade secret: inadvertent disclosure can ruin the protection.  There have been cases of deliberately (and nefariously) "losing" trade secret info to spoil the protection.  And then, there is a somewhat notorious case of documents from Intel(?) that were in an envelope at a hotel desk or convention(?) with a person's name on it. Turns out there was a competitor (AMD?) with an employee of the same name, who accidentally got the documents handed to them (Hi, I'm John Smith, I think you have something for me.), opened the envelope, realized the problem, handed them right back, but in later action, it was alleged that this was sufficient to break the protection.  I don't recall all the details, and it probably settled out of court.  It's really complex.. "the bell, having been rung, cannot be unrung" (the phrase shows up in tons of legal writings), but in reality, if the inadvertent disclosure wasn't too big, etc.


Important things:
1) The language it's written in or obfuscation or not makes no difference.
2) the size of the work makes no difference.  "Candy/Is dandy/But liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair use, and anyway, the copyright may have expired)
3) the intellectual effort in the work makes no difference (unlike patents, there's no requirement of novelty) (unless you're trying to claim trade secret protection on something that's already public knowledge.. the thing might be public, but the fact that you selected that particular one might be trade secret.)


Jim

I am not a lawyer, but I spent all too many (hundreds) of hours in depositions and meetings and court where one of the main issues was the "was there adequate notice of the trade secret status of the information" as well as "did they steal it", not to mention the always popular "can you describe the secret with specificity and particularity".  If the bad guy steals the trade secret and then keeps it secret, it's fairly hard to show that they actually have it.  There are also folks who have developed techniques to evade the restrictions of an NDA ("Sure, I signed it, but that exceeded the scope of my corporate authority, so it's invalid. "  "Technically, I wasn't an employee that afternoon, even though I was in the morning, and I was the next week, but hey, for that afternoon, I wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry about giving you that business card with the company name on it, but it was what I happened to have in my wallet")




From cbergstrom at pathscale.com  Thu Aug 11 10:35:01 2011
From: cbergstrom at pathscale.com ("C. Bergström")
Date: Thu, 11 Aug 2011 21:35:01 +0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E43E895.6070803@pathscale.com>

  On 08/11/11 07:04 PM, Douglas Eadline wrote:
>
> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters
> (I was not until a few months ago)
>
>    http://www.clustermonkey.net//content/view/308/33/
IANAL and this shouldn't be taken as legal advice -

Bret Stouder, if you haven't done so already, contact SFLC immediately.  
They provide legal services to open source projects and may be able to 
help.  (I can help put you in touch with them or other very good open 
source legal counsel.)


./C


/* Armchair lawyers are generally not helpful and in many cases it's 
counterproductive for them to express their own personal views.  I hope 
this discussion dies immediately without further comment */


From deadline at eadline.org  Thu Aug 11 12:58:47 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 11 Aug 2011 12:58:47 -0400 (EDT)
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>


I had a chance to read some of the depositions, really interesting
and even embarrassing stuff. My guess is Atipa got angry when
Bret and the other employees left to form a new company. They
may have searched for ways to stop them and decided
to go after them for what Atipa considered "trade secrets."
A more or less traditional method to prevent ex-employees from
stealing your secret sauce (as you explain below).

The only problem was that many of the "secrets" were developed
and shared in an open environment. This may have been a
surprise to those in charge and makes their claims
a bit harder to swallow. (i.e. a fundamental misunderstanding
of how trade secrets can be protected in an open source ecosystem).
And, what I try to point out in the article, is that this
open source ecosystem is what allowed hardware vendors to
sell clusters in the first place.

There is of course more to this case than I describe in the article.
I'll post more as it progresses.

--
Doug

> Interesting.. You wrote:
> There is a general understanding that unless explicitly marked in the
> contents of the script (the text file that is the Bash program), a Bash
> script is freely available for use and modification by anyone. In some
> cases there is a copyright notice or a license that allows (or disallows)
> sharing or modification. These are always explicitly stated at the
> beginning of the script and obvious to anyone who reads or modifies the
> script.
>
> This is, of course, not correct under current law, marking is not required
> for copyright protection.  pretty much everything is born copyrighted.
> Putting markings on it helps you claim for willful infringement (i.e. the
> recipient can't claim "I didn't know") which helps on the damages
> situation.  And, under the Berne convention, marking is required to assert
> your rights in some countries (All Rights Reserved is also required in
> some places)  Likewise, under current law, registration of copyright isn't
> required.  Registration allows you to collect statutory damages for
> infringement, though.
>
> For trade secrets, it's a bit trickier.  The recipient has to know that
> it's trade secret, but that can be done by marking on the delivery media,
> by a separate document, or even by verbal communication (here, this is
> proprietary, don't disclose it).  And you have to take some means to
> protect it: claiming something that is trade secret that is printed on bus
> stop  benches won't fly.  In any case, just because scripts aren't
> obfuscated doesn't mean they're not subject to trade secret protection.
> If the owner of the secret takes some precautions to prevent wide
> disclosure (e.g. warning the recipient of its proprietary nature).  This
> is the aspect that will surely be the core of litigation:  would a
> "reasonable person" have known that the material was subject to trade
> secret protection.  As we all know, reasonable people differ, and the
> attorneys on both sides will trot out examples of marking and disclosure
> practices: good, bad, and indifferent.  As Doug noted, "special measures"
> need to be taken, but there's no bright line standard for those measures,
> and, in practice, they can be pretty lax (and would be expected to be
> proportionate to the value of the secret.. the secret formula for Coke is
> probably more protected than the schedule for sweeping the floor in the
> manufacturing plant... both provide competitive advantage to Coke, but one
> is probably more important)
>
> Something that a lot of tech people  in industry (particularly those
> coming from academia and working with open source) probably don't really
> fully understand is that pretty much everything you do for your employer
> is probably proprietary in some sense, and there is probably a written
> policy to that effect, which you, as an employee, are expected to be aware
> of. Or your supervisor told you, or the nice personnel person told you
> when you hired in 20 years ago, etc.  Mundane operational details of the
> business might be claimed to provide competitive advantage, especially if
> they're not "industry standard"  (humorously, if the employer has some
> really lame practice that's horrible, that might make it protectable..
> then you could argue in court about whether it had any value). This is why
> there are "document review" departments and periodic training:  It helps
> reduce the problem of "inadvertent disclosure" and "I didn't know".
>
>
> This is the really tricky thing about trade secret: inadvertent disclosure
> can ruin the protection.  There have been cases of deliberately (and
> nefariously) "losing" trade secret info to spoil the protection.  And
> then, there is a somewhat notorious case of documents from Intel(?) that
> were in an envelope at a hotel desk or convention(?) with a person's name
> on it. Turns out there was a competitor (AMD?) with an employee of the
> same name, who accidentally got the documents handed to them (Hi, I'm John
> Smith, I think you have something for me.), opened the envelope, realized
> the problem, handed them right back, but in later action, it was alleged
> that this was sufficient to break the protection.  I don't recall all the
> details, and it probably settled out of court.  It's really complex.. "the
> bell, having been rung, cannot be unrung" (the phrase shows up in tons of
> legal writings), but in reality, if the inadvertent disclosure wasn't too
> big, etc.
>
>
> Important things:
> 1) The language it's written in or obfuscation or not makes no difference.
> 2) the size of the work makes no difference.  "Candy/Is dandy/But
> liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair
> use, and anyway, the copyright may have expired)
> 3) the intellectual effort in the work makes no difference (unlike
> patents, there's no requirement of novelty) (unless you're trying to claim
> trade secret protection on something that's already public knowledge.. the
> thing might be public, but the fact that you selected that particular one
> might be trade secret.)
>
>
> Jim
>
> I am not a lawyer, but I spent all too many (hundreds) of hours in
> depositions and meetings and court where one of the main issues was the
> "was there adequate notice of the trade secret status of the information"
> as well as "did they steal it", not to mention the always popular "can you
> describe the secret with specificity and particularity".  If the bad guy
> steals the trade secret and then keeps it secret, it's fairly hard to show
> that they actually have it.  There are also folks who have developed
> techniques to evade the restrictions of an NDA ("Sure, I signed it, but
> that exceeded the scope of my corporate authority, so it's invalid. "
> "Technically, I wasn't an employee that afternoon, even though I was in
> the morning, and I was the next week, but hey, for that afternoon, I
> wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry
> about giving you that business card with the company name on it, but it
> was what I happened to have in my wallet")


--
Doug



From james.p.lux at jpl.nasa.gov  Thu Aug 11 13:40:35 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 10:40:35 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>

> -----Original Message-----
> From: Douglas Eadline [mailto:deadline at eadline.org]
> Sent: Thursday, August 11, 2011 9:59 AM
> To: Lux, Jim (337C)
> Cc: beowulf at beowulf.org
> Subject: RE: [Beowulf] All Your BASH Are Belong To Us
> 
> 
> I had a chance to read some of the depositions, really interesting
> and even embarrassing stuff. My guess is Atipa got angry when
> Bret and the other employees left to form a new company. They
> may have searched for ways to stop them and decided
> to go after them for what Atipa considered "trade secrets."
> A more or less traditional method to prevent ex-employees from
> stealing your secret sauce (as you explain below).
> 
> The only problem was much of the "secrets" were developed
> and shared in an open environment. This may have been a
> surprise to those in charge and makes their claims
> a bit harder to swallow. (i.e. a fundamental misunderstanding
> of how trade secrets can be protected in an open source ecosystem).
> And, what I try to point out in the article, is that this
> open source ecosystem is what allowed hardware vendors to
> sell clusters in the first place.
> 
> There is of course more to this case than I describe in the article.
> I'll post more as it progresses.
> 
> --


Yes.. and a standard way to attempt to do a "non-compete" (which are typically illegal in California) is for the former employer to threaten the new employer (or customers of the spin-off) with the "theft of trade secrets" allegation.  Even if the allegation is unfounded, you have to spend time and money dealing with it (if you're the ex-employee) or it creates sufficient fear, uncertainty, and doubt (on the part of the customers of the ex-employee spin off).

I'm also not so naïve as to think that employees don't actually take trade secrets with them and use them, so it's not entirely improbable.

But, in a perfect world, there would be substantial sanctions for doing this kind of thing as a competitive maneuver.


Legal niceties aside, Doug brings up an interesting point about "trade secrets" or intellectual property in general...

You work at a job and become experienced and knowledgeable in a particular line of business.  How much of that is "general knowledge" (not protectable) and how much is "peculiar to the employer" (protectable)?  This is a pretty fuzzy thing.

A for instance.. say you leaned over to the next cube and asked someone for help formulating a particularly complex command line to grep a file.  The exact, character for character version of that command line probably belongs to the employer, but what about the knowledge you now have of how to do those kinds of searches?  What if your coworker had actually done the command line (in its exact form) at some other place and brought it with them?

Then, there are the practical details of getting approval from a (conservative) power-that-is.  Sure, you might have gotten it from open source, but will your corporate reviewer agree? Or will they use the default "it's all proprietary unless proven otherwise, and we don't have time to look at your proof, and you don't have time to be gathering the proof"?

It really depends on a corporate/organizational commitment to open source to institute processes to keep all this stuff straight.  (and we won't even get into "open source" vs "able to redistribute")
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From landman at scalableinformatics.com  Thu Aug 11 13:53:55 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 13:53:55 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E441733.80701@scalableinformatics.com>

On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:

> It really depends on a corporate/organizational commitment to open
> source to institute processes to keep all this stuff straight.  (and
> we won't even get into "open source" vs "able to redistribute")

There are profoundly incorrect views running around out there, as to 
what "open source" means.  I had someone tell me that GPLv2 prevented 
distribution of binaries (it doesn't).  I've watched people slap 
additional legal bits in conflict with GPL onto GPL source.

I don't want to say "it's a mess" but I do want to say that "there is a
profound need for a very simple statement of what is and isn't allowed
by each license."  Including what is involved in altering licensing.

While these are more or less amusing and some won't really result in
court cases and precedents, there is at least one effort that has some
nice potential to test the GPL.  See the ZFS on Linux project, cf.
http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

I can't imagine this will end well for any company shipping this, in 
source, build script, or binary form.  CDDL aside, Oracle's got some IP 
claims they could file, as well as other things.  I can't believe that 
shipping NetBSD binaries with Oracle IP inside would end well either.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


From james.p.lux at jpl.nasa.gov  Thu Aug 11 14:19:00 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 11:19:00 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E441733.80701@scalableinformatics.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E441733.80701@scalableinformatics.com>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>

 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman
> Sent: Thursday, August 11, 2011 10:54 AM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] All Your BASH Are Belong To Us
> 
> On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:
> 
> > It really depends on a corporate/organizational commitment to open
> > source to institute processes to keep all this stuff straight.  (and
> > we won't even get into "open source" vs "able to redistribute")
> 
> There are profoundly incorrect views running around out there, as to
> what "open source" means.  I had someone tell me that GPLv2 prevented
> distribution of binaries (it doesn't).  I've watched people slap
> additional legal bits in conflict with GPL onto GPL source.
> 
> I don't want to say "it's a mess" but I do want to say that "there is a
> profound need for a very simple statement of what is and isn't allowed
> by each license."  Including what is involved in altering licensing.
> 

Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses.  They had a "How do we encourage Open Source use at NASA" symposium a few months back, hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.  It's been a royal pain for me trying to release stuff to the general public in a useful form.  It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run.  But no, that .iso would be a derived work composed of a multitude of components with all sorts of different license agreements.  What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same ones we used, and have at it.
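
The "fetch the files yourself and check their MD5" step can be scripted.  What follows is only a minimal sketch in Python, not anything NASA or JPL actually distributes: the manifest format (a mapping of file name to expected hex digest) and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest, directory="."):
    """Check downloaded files against their expected digests.

    `manifest` is a hypothetical dict of file name -> expected hex MD5.
    Returns the names that are missing or whose digest does not match.
    """
    bad = []
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != expected.lower():
            bad.append(name)
    return bad
```

An empty list from verify() means every listed file is present and matches.  MD5 only guards against accidental differences between downloads; it is not a defence against deliberate tampering.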

The complication is that in general, work funded by NASA and performed by government employees is a "government work not subject to copyright", whereas work funded by NASA and performed by an educational institution (e.g. JPL, which is part of Caltech) is subject to Bayh-Dole, and is presumed to be owned by the educational institution, with a fully paid, non-exclusive license granted to the government for government purposes.  (There is, of course, litigation about what those "government purposes" might happen to be.)

The incompatibility arises because NASA is legally obligated to distribute their products with no downstream restrictions on use, which is not the same as, for instance, GPL, which imposes restrictions on downstream use.   NASA (and the government in general) doesn't care if someone takes their product and uses it to make a subsequent closed source product which is totally proprietary. (and in fact, NASTRAN would be a fine example of this)


From gus at ldeo.columbia.edu  Thu Aug 11 15:28:28 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 11 Aug 2011 15:28:28 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E442D5C.1030902@ldeo.columbia.edu>

Lux, Jim (337C) wrote:
>> -----Original Message-----
>> From: Douglas Eadline [mailto:deadline at eadline.org]
>> Sent: Thursday, August 11, 2011 9:59 AM
>> To: Lux, Jim (337C)
>> Cc: beowulf at beowulf.org
>> Subject: RE: [Beowulf] All Your BASH Are Belong To Us
>>
>>
>> I had a chance to read some of the depositions, really interesting
>> and even embarrassing stuff. My guess is Atipa got angry when
>> Bret and the other employees left to form a new company. They
>> may have searched for ways to stop them and decided
>> to go after them for what Atipa considered "trade secrets."
>> A more or less traditional method to prevent ex-employees from
>> stealing your secret sauce (as you explain below).
>>
>> The only problem was much of the "secrets" were developed
>> and shared in an open environment. This may have been a
>> surprise to those in charge and makes their claims
>> a bit harder to swallow. (i.e. a fundamental misunderstanding
>> of how trade secrets can be protected in an open source ecosystem).
>> And, what I try to point out in the article, is that this
>> open source ecosystem is what allowed hardware vendors to
>> sell clusters in the first place.
>>
>> There is of course more to this case than I describe in the article.
>> I'll post more as it progresses.
>>
>> --
> 
> 
> Yes.. and a standard way to attempt to do a "non-compete" 
> (which are typically illegal in California) is for the former employer 
> to threaten the new employer (or customers of the spin-off) with the 
> "theft of trade secrets" allegation.  Even if the allegation is unfounded, 
> you have to spend time and money dealing with it 
> (if you're the ex-employee) or it creates sufficient fear,
> uncertainty, and doubt (on the part of the customers
> of the ex-employee spin off).
> 

Very true, and in the arena of intimidating former employees and
their current employers/competitors, there is nothing special
about the privatization of shell scripts or of nifty
regular expressions to grep files.

Recent examples include fields perhaps more lucrative than HPC,
such as English muffins
(Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):

http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

and high frequency trading (isn't it HPC also?) (Goldman Sachs vs. 
Sergey Aleynikov):

http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html

What is interesting is that across the board
the thing that free entrepreneurs seem
to hate the most is their competitors' free entrepreneurship.

> I'm also not so naïve as to think that employees don't actually take 
> trade secrets with them and use them, so it's not entirely improbable.
> 
> But, in a perfect world, there would be substantial sanctions for 
> doing this kind of thing as a competitive maneuver.
> 
> 
> Legal niceties aside, Doug brings up an interesting point about 
> "trade secrets" or intellectual property in general...
> 
> You work at a job and become experienced and knowledgeable in a 
> particular line of business.  How much of that is "general knowledge" 
> (not protectable) and how much is "peculiar to the employer" (protectable)?  This is a pretty fuzzy thing.
> 
> A for instance.. say you leaned over to the next cube and asked 
> someone for help formulating a particularly complex command line to 
> grep a file.  The exact, character for character version of that 
> command line probably belongs to the employer, but what about the 
> knowledge you now have of how to do those kinds of searches?  
> What if your coworker had actually done the command line 
> (in its exact form) at some other place and brought it with them?
> 
> Then, there's the practical details of getting approval from a 
> (conservative) power-that-is.  Sure, you might have gotten it from 
> open source, but will your corporate reviewer agree? Or, will they 
> use the default "it's all proprietary unless proven otherwise, and 
> we don't have time to look at your proof, and you don't have time 
> to be gathering the proof".
> 
> It really depends on a corporate/organizational commitment to 
> open source to institute processes to keep all this stuff straight.  
> (and we won't even get into "open source" vs "able to redistribute")


From landman at scalableinformatics.com  Thu Aug 11 15:57:43 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 15:57:43 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E442D5C.1030902@ldeo.columbia.edu>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E442D5C.1030902@ldeo.columbia.edu>
Message-ID: <4E443437.9070705@scalableinformatics.com>

On 08/11/2011 03:28 PM, Gus Correa wrote:

> Recent examples include fields perhaps more lucrative than HPC,
> such as English muffins
> (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):
>
> http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

That muffin just got real ...

>
> and high frequency trading (isn't it HPC also?) (Goldman Sachs vs.
> Sergey Aleynikov):
>
> http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html
>
> What is interesting is that across the board
> the thing that free entrepreneurs seem
> to hate the most is their competitors' free entrepreneurship.

I am running into an internal parser error in attempting to understand 
this last sentence.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


From mathog at caltech.edu  Thu Aug 11 19:56:02 2011
From: mathog at caltech.edu (David Mathog)
Date: Thu, 11 Aug 2011 16:56:02 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>

Since this is very OT, I'll try to keep it short.

Here is the problem - imagine a group of people who neither know nor
trust each other, yet must agree on the fairness of a single random
number.  Basically they are going to have a lottery.  They aren't
organized enough to generate such a number themselves - it must be found
from some process already active on the web, and be so obviously "fair"
that they won't argue about that.  Everybody must be able to obtain it
freely from a web connection.  

Can any of you think of a source on the web for a set of small files
with these properties:

1.  from a trusted source (here this mostly means the data is generated
    for some other innocuous purpose)
2.  represents a largely random process (temperature readings,
    stock market values, etc.) with a set generated at known intervals,
    preferably daily (at least M-F)
3.  are never, ever, revised
4.  are distributed reliably (for instance, signed files)
5.  are publicly and freely available
6.  can be obtained reliably (is available from many sites)

So far I have looked at stock market values and weather data - without
much luck.

You would think the S&P 500 is the S&P 500 and one could look it up on
any site and get the same data.  Not so! Check the Yahoo and Google
financial sites for the first few weeks of Jan. 2011 and you will find
digits that differ between the two sites in every single column.  Not
every day mind you, but often enough that it isn't reliable.  Heck, the
volume numbers differ by large factors between the two sites.  So just
choose one site and go with that?  Not so fast - if the single source
goes down the data is unavailable, and there is no guarantee that the
site (which is not party to this particular use of their data) might not
revise the page or choose to block it entirely.

Or weather data, right?  Lots of random bits there and we trust NOAA. 
But good luck with criteria 3-6.  In particular, they don't give data
out for free.  In theory no US Government site should, since they are
supposed to charge to recover distribution costs.

Criteria 4-6 are typical of software distributed on mirror sites, but so
far I have not found any physical measurements which are distributed in
a similar manner.
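
Once a group has agreed on a public file, turning its bytes into a lottery number is the easy part.  The sketch below is only one possible approach: the use of SHA-256 and the function name are assumptions, not something proposed in the message above, and it does nothing to solve the sourcing problem in criteria 1-6.

```python
import hashlib


def lottery_number(data: bytes, n_tickets: int) -> int:
    """Map agreed-upon bytes to a ticket index in range(n_tickets)."""
    if n_tickets <= 0:
        raise ValueError("n_tickets must be positive")
    digest = hashlib.sha256(data).digest()
    # A 256-bit integer is so much larger than any realistic ticket count
    # that the bias introduced by the modulo reduction is negligible.
    return int.from_bytes(digest, "big") % n_tickets
```

Anyone holding the same bytes gets the same number, so each participant can check the draw independently.  Hashing the whole file also means the result does not hinge on the statistical quality of any single digit in the data.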

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mdidomenico4 at gmail.com  Thu Aug 11 20:28:11 2011
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Thu, 11 Aug 2011 20:28:11 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <CABOsP2PEffq=40LbJohMcFHbeQxHQNtLuqfKSd_7MGYeX1svqw@mail.gmail.com>

How many random numbers per day are you expecting?
If everyone checks at exactly 1pm, should they all see the same
"random" number or should they each get their own "random" number?
What kind of entropy are you expecting on "random"?



From peter.st.john at gmail.com  Thu Aug 11 20:44:15 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 11 Aug 2011 20:44:15 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>

David,
I was thinking the National Weather Service, instead of NOAA; it's a vital
public service that such information is recorded and disseminated for
airfields and the like, e.g.:
http://www.weather.gov/climate/getclimate.php?wfo=bou
So I would write a script to scrape least significant digits from that, for
agreed times, dates, and locations. Whoever writes the script and wherever
it is run, anyone can check its results manually.
However, that item has a disclaimer that the data is subject to review :) So
it may matter how far back in time you need to be able to go, and how long
into the future you need the data to be available at the same place. But
nobody promises their website will stay unchanged indefinitely, they can't.
But at any given time, a group can agree on (say) the lowest significant
digits of the temperatures at time T in cities X, Y, and Z as reported at
time T2 by the NWS.
Peter
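
The "least significant digits" idea can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the readings are taken to be already copied by hand from the agreed page as strings, the function names are hypothetical, and no scraping of weather.gov is attempted.

```python
def last_digit(reading: str) -> int:
    """Return the last decimal digit of a reading such as '71.3' or '-4'."""
    digits = [ch for ch in reading if ch.isdigit()]
    if not digits:
        raise ValueError(f"no digits in reading: {reading!r}")
    return int(digits[-1])


def combine(readings):
    """Concatenate the last digits of the readings, in the agreed order."""
    value = 0
    for reading in readings:
        value = value * 10 + last_digit(reading)
    return value


# Hypothetical readings for cities X, Y and Z at the agreed time.
print(combine(["71.3", "68.0", "-4.7"]))  # prints 307
```

Because the inputs are plain published numbers, anyone can redo the arithmetic by hand, which is the property the scheme relies on.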


From james.p.lux at jpl.nasa.gov  Thu Aug 11 20:55:30 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 17:55:30 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
	<CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD217@ALTPHYEMBEVSP20.RES.AD.JPL>

Low order digits from weather stations are not likely to be random.
They're almost certainly converted from some quantized converter, and may actually have a double conversion (Celsius to Fahrenheit).
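
The effect of such a double conversion is easy to demonstrate.  The sketch below assumes, purely for illustration, a sensor that reports whole degrees Celsius which are then converted to Fahrenheit and rounded to whole degrees; real stations may quantize differently.

```python
def reachable_fahrenheit(c_min, c_max):
    """Whole-degree Fahrenheit values produced by whole-degree Celsius readings."""
    return [round(c * 9 / 5 + 32) for c in range(c_min, c_max + 1)]


values = reachable_fahrenheit(0, 99)
span = max(values) - min(values) + 1    # Fahrenheit values in the covered range
missing = span - len(set(values))       # values that can never be reported
print(f"{len(set(values))} of {span} Fahrenheit values occur; {missing} never do")
```

Under that assumption 33 F, for example, can never appear, because 0 C maps to 32 F and 1 C maps to 34 F.  Digits taken from such readings are constrained by the conversion as well as by the weather.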

NWS and NOAA are actually part of the same organization, aren't they?  (The NWS web page at weather.gov is titled "NOAA's National Weather Service".)

Jim Lux
+1(818)354-2075
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Peter St. John
Sent: Thursday, August 11, 2011 5:44 PM
To: David Mathog
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] OT: public random numbers?

David,
I was thinking the National Weather Service, instead of NOAA; it's a vital public service that such information is recorded and diseminated for airfields and the like, e.g.:
http://www.weather.gov/climate/getclimate.php?wfo=bou
So I would write a script to scrape least significant digits from that, for agreed times, dates, and locations. Whoever writes the script and wherever it is run, anyone can check its results manually.
However, that item has a disclaimer that the data is subject to review :) So it may matter how far back in time you need to be able to go, and how long into the future you need the data to be available at the same place. But nobody promises their website will stay unchanged indefinitely, they can't. But at any given time, a group can agree on (say) the lowest significant digits of the temperatures at time T in cities X, Y, and Z as reported at time T2 by the NWS.
Peter

On Thu, Aug 11, 2011 at 7:56 PM, David Mathog <mathog at caltech.edu<mailto:mathog at caltech.edu>> wrote:
Since this is very OT, I'll try to keep it short.

Here is the problem - imagine a group of people who neither know nor
trust each other, yet must agree on the fairness of a single random
number.  Basically they are going to have a lottery.  They aren't
organized enough to generate such a number themselves - it must be found
from some process already active on the web, and be so obviously "fair"
that they won't argue about that.  Everybody must be able to obtain it
freely from a web connection.

Can any of you think of a source on the web for a set of small files
with these properties:

1.  from a trusted source (here this mostly means the data is generated
   for some other innocuous purpose)
2.  represents a largely random process (temperature readings,
   stock market values, etc.) with a set generated at known intervals,
   preferably daily (at least M-F)
3.  are never, ever, revised
4.  are distributed reliably (for instance, signed files)
5.  are publicly and freely available
6.  can be obtained reliably (is available from many sites)

So far I have looked at stock market values and weather data - without
much luck.

You would think the S&P 500 is the S&P 500 and one could look it up on
any site and get the same data.  Not so! Check the Yahoo and Google
financial sites for the first few weeks of Jan. 2011 and you will find
digits that differ between the two sites in every single column.  Not
every day mind you, but often enough that it isn't reliable.  Heck, the
volume numbers differ by large factors between the two sites.  So just
choose one site and go with that?  Not so fast - if the single source
goes down the data is unavailable, and there is no guarantee that the
site (which is not party to this particular use of their data) might not
revise the page or choose to block it entirely.

Or weather data, right?  Lots of random bits there and we trust NOAA.
But good luck with criteria 3-6.  In particular, they don't give data
out for free.  In theory no US Government site should, since they are
supposed to charge to recover distribution costs.

Criteria 4-6 are typical of software distributed on mirror sites, but so
far I have not found any physical measurements which are distributed in
a similar manner.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From samuel at unimelb.edu.au  Thu Aug 11 21:54:11 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 11:54:11 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E4487C3.60605@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/08/11 22:04, Douglas Eadline wrote:

> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters

On the copyright side of things (not the trade secret stuff),
my understanding (IANAL, etc) is that anything you create you[0]
hold copyright on[1], and for someone else to copy it they must
have some agreement (license) to be able to do so.

Thus a shell script with no license attached or embedded
is copyrighted and you should get explicit permission to
use it..

cheers,
Chris

[0] - where "you" is the entity that is the copyright holder,
      not necessarily the creator.

[1] - yes, I know there are some entities that aren't allowed
      to hold copyright.. :-)
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5Eh8MACgkQO2KABBYQAh8sOgCePl6n4UTNZGMAePc8Kb+kmK4a
DHwAoJeVgYKUMDpJe78/2mQqbL2ryJ4M
=UAan
-----END PGP SIGNATURE-----


From cbergstrom at pathscale.com  Thu Aug 11 23:22:34 2011
From: cbergstrom at pathscale.com ("C. Bergström")
Date: Fri, 12 Aug 2011 10:22:34 +0700
Subject: [Beowulf] Open source @NASA - WAS: OT
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E449C7A.1030102@pathscale.com>

  On 08/12/11 01:19 AM, Lux, Jim (337C) wrote:
>
> Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses.  They had a "How do we encourage Open Source use at NASA" symposium a few months back hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.  It's been a royal pain for me trying to release stuff to the general public in a useful form.  It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run.  But no, that .iso will be a derived work composed of a multitude of components with all sorts of different license agreements.  What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same one we used, and have at it.

Hi Jim,

For the exact problem you've described, an ebuild could be a very good 
solution.  (I personally abandoned Gentoo a long time ago.)

By solution I mean a bash script that explicitly checks the hashes, 
resolves the deps and pulls the source to build everything from the 
eleventy-seven URLs and FTP sites.
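
The hash-checking step described above can be sketched roughly as
follows.  This is an illustration only, not an ebuild and not anyone's
actual release tooling: the function names, the manifest layout, and the
file name in the usage note are all hypothetical, and MD5 is used only
because that is the checksum mentioned in the quoted message.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def verify_sources(manifest, fetched):
    """Compare fetched sources against a manifest of expected checksums.

    manifest: dict mapping a file name to its expected MD5 hex digest
              (hypothetical layout, chosen for this sketch).
    fetched:  dict mapping a file name to the bytes actually downloaded.

    Returns the list of names that are missing or whose checksum differs.
    """
    bad = []
    for name, expected in manifest.items():
        data = fetched.get(name)
        if data is None or md5_hex(data) != expected.lower():
            bad.append(name)
    return bad
```

A caller would build the manifest once from the known-good files, fetch
each file from its URL, and refuse to build if verify_sources() returns
anything, e.g. verify_sources({"pkg.tar.gz": "..."}, {"pkg.tar.gz": data}).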

The people working with gentoo-science would likely appreciate it a 
lot.  (The learning curve is fairly low if you already know bash.)
--------
With regard to open source license proliferation and 
incompatibilities, I think most people in the community are working 
towards streamlining, but changes after the fact can be 
difficult or impossible.  I sympathize with your situation, and I'd say 
work towards getting your projects merged with something like Gentoo to 
start, and then maybe something like the openSUSE Build Service.  This 
would cover a very large percentage of the packaging/distribution 
problem and get it into the hands of users easily.




From samuel at unimelb.edu.au  Thu Aug 11 23:51:12 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 13:51:12 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E44A330.7090503@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/08/11 04:19, Lux, Jim (337C) wrote:

> The incompatibility arises because NASA is legally
> obligated to distribute their products with no
> downstream restrictions on use,

Actually no - the NASA license is incompatible with
the GPL (at least) because:

http://www.gnu.org/licenses/license-list.html

# The NASA Open Source Agreement, version 1.3, is not
# a free software license because it includes a provision
# requiring changes to be your "original creation". Free
# software development depends on combining code from
# third parties, and the NASA license doesn't permit this.

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5EozAACgkQO2KABBYQAh8ESQCfa3VfRt5Y1FxllDapHpqTrev9
+iAAn3TWi9YHq6yaAc6BMWCbeJZaQBFT
=GL6d
-----END PGP SIGNATURE-----


From james.p.lux at jpl.nasa.gov  Thu Aug 11 23:57:03 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 20:57:03 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E44A330.7090503@unimelb.edu.au>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>,
	<4E44A330.7090503@unimelb.edu.au>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D69@ALTPHYEMBEVSP20.RES.AD.JPL>

Yes, that too...



From rgb at phy.duke.edu  Fri Aug 12 00:31:30 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 00:31:30 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108120030370.5818@lilith>

On Thu, 11 Aug 2011, David Mathog wrote:

> Since this is very OT, I'll try to keep it short.
>
> Here is the problem - imagine a group of people who neither know nor
> trust each other, yet must agree on the fairness of a single random
> number.  Basically they are going to have a lottery.  They aren't
> organized enough to generate such a number themselves - it must be found
> from some process already active on the web, and be so obviously "fair"
> that they won't argue about that.  Everybody must be able to obtain it
> freely from a web connection.

   http://www.random.org/

sincerely,

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From mathog at caltech.edu  Fri Aug 12 11:21:37 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:21:37 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>

Robert G. Brown wrote:

>  Everybody must be able to obtain it
> > freely from a web connection.
> 
>    http://www.random.org/
> 

Nice site.  They have something that is very close, the pregenerated
random files, from which a small set of digits may be extracted, and the
files themselves have MD5 checksums (but are not signed).
They also support https.  It comes up a little short on criteria 1 (we
really don't know what is going on behind the scenes) and 6 (it is a
single site.)

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From landman at scalableinformatics.com  Fri Aug 12 11:26:05 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 12 Aug 2011 11:26:05 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <4E45460D.9040505@scalableinformatics.com>

On 08/12/2011 11:21 AM, David Mathog wrote:
> Robert G. Brown wrote:
>
>>   Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>     http://www.random.org/

And from SGI days ... http://www.lavarnd.org/

> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


From mathog at caltech.edu  Fri Aug 12 11:58:28 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:58:28 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>

Peter St. John wrote:
> But at any given time, a group can agree on (say) the lowest significant
> digits of the temperatures at time T in cities X, Y, and Z as reported at
> time T2 by the NWS.

Actually we don't know that, at least not reliably enough for this
purpose.  It may be that the one web address is actually multiple
servers, and if the NWS pushes out data revisions these could return
different results for T:X,Y,Z at T2 if the servers were not strictly
synchronized.  Never mind the caching problems that revisions like this
would create on browsers.  I have no idea if the NWS revises their data
files, but it would not be surprising if they did.

After posting I thought of one other source of more or less random
verifiable numbers - the scores of sporting events.  These are not
always generated every day, and are seasonal for the various sports. 
They are however highly verifiable and when multiple events are grouped,
pretty much impossible to "fix" to preselected digits.  For instance:

  http://www.nfl.com/scores
  http://mlb.mlb.com/mlb/scoreboard
  http://scores.espn.go.com/nba/scoreboard?date=20110304

These sites maintain historical records.  Even if they didn't the scores
are widely published, and there are tens of thousands of witnesses to
the original event, so it would be pretty much impossible to
intentionally change a final score.  There could still be copying/typo
errors from site to site though, but if such an error was discovered it
would be easy enough to resolve.  There is no intrinsic order to the
scores, and some scheduled games might be canceled, so it would have to
be something like "sort the scores from all NBA teams who played on
4/4/11 into ascending order and concatenate the digits".

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From james.p.lux at jpl.nasa.gov  Fri Aug 12 12:09:46 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 12 Aug 2011 09:09:46 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>

All nice suggestions, but I wonder if they're truly random.

Scores of games have underlying patterns from the "rules of the game"  (e.g., American football scores tend to be built up from increments of 6, 7, 8, or 3 points; basketball goals are worth 2 or 3 points, etc.)

I'm sure someone has analyzed this.

I suppose one could sum a large number of scores, which would give you something with an approximately Gaussian distribution, and then you could transform it into something with a uniform distribution (sort of an inverse Box-Muller).
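
As a rough sketch of that transformation: if the sum really were close to Gaussian with a known mean and standard deviation, pushing it through the normal CDF would give a value that is approximately uniform on (0, 1).  The names below are hypothetical, and the mean and standard deviation are assumptions that would have to be estimated from historical scores; the result is only as uniform as the Gaussian approximation is good.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sum_to_uniform(scores, mu, sigma):
    """Map the sum of many scores to a value in (0, 1).

    mu and sigma are the assumed mean and standard deviation of the sum,
    not of the individual scores.
    """
    return normal_cdf(float(sum(scores)), mu, sigma)
```

A sum that lands exactly on the assumed mean maps to 0.5, and sums further below or above the mean map towards 0 or 1 respectively.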

What about using random.org and it being backed-up on archive.org?  Does that give you the "multiple independent sites" desired?




From mathog at caltech.edu  Fri Aug 12 13:04:43 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:04:43 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrvA3-0004lV-FR@mendel.bio.caltech.edu>

> All nice suggestions, but I wonder if they're truly random.

Random enough in this case - as they are only used to form a seed for a
random number generator, and a seed is only needed "rarely".  So even
though pro basketball scores have definite trends and often look like
(101,95),(103,87),(98,76), these can still create a decent seed value
once sorted (here in descending order) and concatenated:  10310198958776
(Let's assume the seed need not be odd.)
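
A minimal sketch of that sort-and-concatenate step, for illustration
only.  The function name and the descending flag are hypothetical; the
sort direction is just a convention the participants would have to agree
on in advance.

```python
def seed_from_scores(games, descending=True):
    """Build an integer seed from a collection of (score, score) pairs.

    All individual scores are pooled, sorted, and their decimal digits
    concatenated, so the result does not depend on the order in which
    the games were listed.
    """
    scores = [score for game in games for score in game]
    scores.sort(reverse=descending)
    return int("".join(str(score) for score in scores))
```

With the three games above, seed_from_scores([(101, 95), (103, 87),
(98, 76)]) reproduces 10310198958776; sorting in ascending order instead
would give 76879598101103.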

> What about using random.org and it being backed-up on archive.org?
> Does that give you the "multiple independent sites" desired?

To some degree, but not as much as the large number of sites that
distribute game scores and stock values.  I originally favored using
stock values until it turned out that those numbers are squishier than
one might have expected, particularly so for indices like the S&P 500
and Dow Jones.  A fellow who works at S&P told me that the opening
prices are prone to timing problems, since at T=0+delta some of the
issues in the index will have traded, and some will not, with the
untraded stock values being filled in with stale values.  I think
similar timing issues affect all the other index values too
(high/low/close).  In these cases, since the index is derived from
formulas, some sites may be independently calculating the values, and
tiny differences in the times the stock values are measured result in
different numbers.  All it takes is one trade difference between the
sample points to change some digits.  When I get some time I still need
to look and see if the high/low/close values for individual stocks are
also variable from web site to web site.  These numbers might be more
reliable for single stocks since they might all trace back to the data
feed from the exchange where the issue trades.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mathog at caltech.edu  Fri Aug 12 13:22:46 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:22:46 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrvRW-0004lo-I1@mendel.bio.caltech.edu>

Michael Di Domenico wrote:

> How many random numbers per day are you expecting?

One would be sufficient.

> If everyone checks at exactly 1pm, should they all see the same
> "random" number or should they each get their own "random" number?

They should all see the same number.

Example: a random number based on physical events which occurred on
8/10/11 would become available on or shortly after that day.  Starting
from the time it first becomes available, and going forward ideally
forever, everybody who wants to should be able to retrieve that same
random number.  

That is, nobody should be able to predict the number beforehand, and
everybody should be able to verify it later.  So the number must be both
random and etched in stone.

> What kind of entropy are you expecting on "random"?

In practice relatively little is needed, 16 bits should be plenty.
(More wouldn't hurt, of course.)
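
One way to picture the "same number for everybody" requirement: once the
public data for a day is fixed, anyone can reduce it to a 16-bit value
with a deterministic function.  The sketch below uses SHA-256 for that
reduction, which is an assumption made for illustration and not part of
the scheme described in this thread; the function name is hypothetical.

```python
import hashlib


def lottery_value(public_digits: str, bits: int = 16) -> int:
    """Reduce an agreed-upon digit string to a fixed number of bits.

    public_digits: the published data for the day, as an ASCII string.
    bits:          how many leading bits of the SHA-256 digest to keep.
    """
    digest = hashlib.sha256(public_digits.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)
```

Everyone who feeds in the same digit string gets the same value, today or
years later, so the only thing left to argue about is the source data.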

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From rgb at phy.duke.edu  Fri Aug 12 14:35:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:35:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108121416030.3145@lilith>

On Fri, 12 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote:
>
>>  Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>    http://www.random.org/
>>
>
> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)

What goes on behind the scenes is documented pretty well on the site,
and the guy who runs it is a human being; you can communicate with him
to learn even more.  I already know him a bit, as he and I have
collaborated on applying dieharder to test random.org datasets -- even
"the" random.org dataset as of some time ago (I have a few hundred MB of
random numbers from the site in my dieharder directory).  IIRC, the numbers are
generated continuously and fairly slowly by grabbing and filtering and
transforming atmospheric noise.  As a source of entropy, that is
probably excellent if (as noted) slow, but many good sources of entropy
seem to be fairly slow.  He has good reason to think that his numbers
are theoretically "true random numbers" -- both unpredictable and
flat/decorrelated at all orders, and even though there aren't really
enough of them for my purposes, I've used them as one of the (small)
"gold standard" sources for testing dieharder even as I test them.  For
all practical purposes threefish or aes are truly random as well and
they are a lot faster and easier to use as gold standard generators,
though.

I don't quite understand why the single site restriction is important --
this site has been up for years and I don't expect it to go away soon;
it is quite reliable.  I don't think there is anything secret about how
the numbers are generated, and I'll certify that the numbers it produces
don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
your part; 6 I don't really understand but the guy who runs the site is
clearly willing to construct a custom feed for cash customers, if there
is enough value in whatever it is you are trying to do to pay for
access.  If it's just a lottery, well, lord, I can think of a dozen ways
to make numbers so random that they'd be unimpeachable for any sort of
lottery, both unpredictable and uncorrelated, and they don't any of them
require any significant amount of entropy to get started.

I will add one warning -- "randomness" is a rather stringent
mathematical criterion, and is generally tested against the null
hypothesis.  Amateurs who want to make random number generators out of
supposedly "random" data streams or fancy algorithms almost invariably
fail, sometimes spectacularly so.  There are a half dozen or more
really, really good pseudorandom number generators out there and it is
easy to hotwire them together into an xor-based high entropy stream that
basically never repeats (feeding it a bit of real entropy now and then
as it operates).  I would strongly counsel you against trying to take
e.g. weather data and make something "random" out of it.  Unless you
really know what you are doing, you will probably make something that
isn't at all random and may not even be unpredictable.  Even most
sources of "quantum" randomness (which is at least possibly "truly
random", although I doubt it) aren't flat, so that they carry the
signature of their generation process unless/until you manage to
transform them into something flat (difficult unless you KNOW the
distribution they are producing).  Pseudorandom number generators have
the serious advantage of being amenable to at least some theoretical
analysis (so you can "guarantee" flatness out to some high
dimensionality, say) as well as empirical testing with e.g. dieharder.
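
To make the "flatness" point concrete, here is a minimal sketch of one
classic flattening step, the von Neumann extractor.  It is swapped in
purely as an illustration and is not something this thread prescribes.
It assumes the input bits are independent with a fixed bias; a
correlated source (weather data, say) breaks that assumption, which is
exactly why the warning above applies.  The biased source below is a
seeded stand-in, not real hardware.

```python
import random

def von_neumann_debias(bits):
    """Turn independent but biased bits into unbiased bits.

    Reads the input in non-overlapping pairs: (0, 1) emits 0,
    (1, 0) emits 1, and (0, 0) or (1, 1) are discarded.
    """
    out = []
    it = iter(bits)
    for a, b in zip(it, it):  # consume two bits at a time
        if a != b:
            out.append(a)
    return out

if __name__ == "__main__":
    rng = random.Random(12345)  # stand-in for a biased physical source
    biased = [1 if rng.random() < 0.8 else 0 for _ in range(100_000)]
    flat = von_neumann_debias(biased)
    print("input fraction of 1s: ", sum(biased) / len(biased))
    print("output fraction of 1s:", sum(flat) / len(flat))
    print("bits kept:", len(flat), "of", len(biased))
```

Note how much of the input is thrown away to get there, and that the
output should still be run through something like dieharder before
anyone trusts it.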

HTH,

     rgb

>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From rgb at phy.duke.edu  Fri Aug 12 14:40:36 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:40:36 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <4E45460D.9040505@scalableinformatics.com>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<4E45460D.9040505@scalableinformatics.com>
Message-ID: <alpine.LFD.2.02.1108121438180.3145@lilith>

On Fri, 12 Aug 2011, Joe Landman wrote:

> On 08/12/2011 11:21 AM, David Mathog wrote:
>> Robert G. Brown wrote:
>>
>>>   Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>>     http://www.random.org/
>
> And from SGI days ... http://www.lavarnd.org/

Yeah, like that.  Notice the work they have to do to make a
not-really-random or only partially-random source flat, unpredictable,
random.  What they do is probably overkill -- nobody on earth could
detect a deviation from randomness if they did only half of their
folding and retransformation with crypto grade prngs, but it is still a
pretty reliable scheme.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From rgb at phy.duke.edu  Fri Aug 12 14:59:59 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:59:59 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108121441140.3145@lilith>

On Fri, 12 Aug 2011, Lux, Jim (337C) wrote:

> All nice suggestions, but I wonder if they're truly random.
>
> Scores of games have underlying patterns from the "rules of the game" (e.g. American football scores tend to be built from increments of 6, 7, 8 or 3 points; basketball goals are 2 or 3 points, etc.)
>
> I'm sure someone has analyzed this.
>
> I suppose one could sum a large number of scores, which would give you something with a Gaussian distribution, and then you could transform it into something with a uniform distribution (sort of an inverse Box-Muller).
>
> What about using random.org and it being backed-up on archive.org?  Does that give you the "multiple independent sites" desired?

As I said and repeat, nothing like this is at all random.  Random is
stuff like thermal noise, shot noise, quantum noise, and even all of
those things are distributed and not flat and require massaging to make
into uniform deviates or random bits.  Unpredictable is easy, of course
-- flip a coin, roll some dice -- until you need to make it
>>rigorously<< unpredictable and >>rigorously<< uncorrelated, at which
point you need to not screw around with weather, scores, or market
closing values.  Even "randomly sampled" ticks of a nanosecond clock
aren't that random without some work to make them so.

I liked the lavarnd site, and I like random.org.  Hell, tap into both of
their streams, they're both practically perfect as sources of random
numbers go, and it gives you your redundancy and you can xor their
streams together to get yet another irrelevant and probably unnecessary
degree of lack of correlation.  Even if one stream is subtly correlated
and the other is too, the chances of the correlations "matching" and
persisting through an xor process are astronomically small.  But then, finding
correlations in the output of a properly seeded crypto prng is pretty
astronomically unlikely BEFORE you xor-fold it stream-wise a few dozen
times into a source of real entropy like atmospheric noise or
electro-optical noise.
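
A minimal sketch of the xor-combination idea.  The two feeds are
represented here by plain byte strings, since the thread does not
describe how either site is actually queried; the SHA-256 counter-mode
generator is a hypothetical stand-in for "a crypto prng" and is an
assumption of this example, not what either site uses.

```python
import hashlib

def xor_streams(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def prng_stream(seed: bytes, nbytes: int) -> bytes:
    """Hypothetical stand-in generator: SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:nbytes])

if __name__ == "__main__":
    feed_a = prng_stream(b"stand-in for site A", 32)
    feed_b = prng_stream(b"stand-in for site B", 32)
    print(xor_streams(feed_a, feed_b).hex())
```

The useful property is that the xor of two independent streams is at
least as unpredictable as the better of the two, so a flaw in one feed
does not carry over into the combined output.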

If you want something better, you'll probably have to explain your
application in a bit more detail.  Do you need rigorously random and
flat numbers, or just something unpredictable?  The latter is cheap and
easy and can be done in the privacy of your own home by reading from
/dev/random or /dev/urandom (or perhaps from Intel's new on-CPU rngs).
The former requires theory and some work and some heavy duty empirical
testing.
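
For the cheap-and-easy "unpredictable" case, a minimal sketch in
Python.  On Linux, os.urandom draws from the same kernel pool that
backs /dev/urandom, and secrets.randbelow avoids modulo bias when
reducing to a range.  The function names are made up for this example.

```python
import os
import secrets

def unpredictable_bytes(n: int) -> bytes:
    """Return n bytes from the operating system's entropy pool."""
    return os.urandom(n)

def unpredictable_below(upper: int) -> int:
    """Return an integer in [0, upper) with no modulo bias."""
    return secrets.randbelow(upper)

if __name__ == "__main__":
    print(unpredictable_bytes(16).hex())
    print(unpredictable_below(1000))
```

This gives unpredictability only.  Nobody else can reproduce or verify
the draw, which is the part the original question cares about.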

Just remember, numbers are not random.  Numbers are numbers.  The
number 7 could be "random" or not, not by its nature but by how the 7
was generated.

Processes, in other words, are (approximately, oxymoronically) random.
If you want random numbers, find a (mathematically provably) "random"
process, at least to some order and for some purposes...

    rgb

>
>
> ________________________________________
> From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of David Mathog [mathog at caltech.edu]
> Sent: Friday, August 12, 2011 08:58
> To: Peter St. John; beowulf at beowulf.org
> Subject: Re: [Beowulf] OT: public random numbers?
>
> Peter St. John wrote:
>> But at any given time, a group can agree on (say) the lowest significant
>> digits of the temperatures at time T in cities X, Y, and Z as reported at
>> time T2 by the NWS.
>
> Actually we don't know that, at least not reliably enough for this
> purpose.  It may be that the one web address is actually multiple
> servers, and if the NWS pushes out data revisions these could return
> different results for T:X,Y,Z at T2 if the servers were not strictly
> synchronized.  Never mind the caching problems that revisions like this
> would create on browsers.  I have no idea if the NWS revises their data
> files, but it would not be surprising if they did.
>
> After posting I thought of one other source of more or less random
> verifiable numbers - the scores of sporting events.  These are not
> always generated every day, and are seasonal for the various sports.
> They are however highly verifiable and when multiple events are grouped,
> pretty much impossible to "fix" to preselected digits.  For instance:
>
>  http://www.nfl.com/scores
>  http://mlb.mlb.com/mlb/scoreboard
>  http://scores.espn.go.com/nba/scoreboard?date=20110304
>
> These sites maintain historical records.  Even if they didn't the scores
> are widely published, and there are tens of thousands of witnesses to
> the original event, so it would be pretty much impossible to
> intentionally change a final score.  There could still be copying/typo
> errors from site to site though, but if such an error was discovered it
> would be easy enough to resolve.  There is no intrinsic order to the
> scores, and some scheduled games might be canceled, so it would have to
> be something like "sort the scores from all NBA teams who played on
> 4/4/11 into ascending order and concatenate the digits".
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From nixon at nsc.liu.se  Fri Aug 12 16:46:21 2011
From: nixon at nsc.liu.se (Leif Nixon)
Date: Fri, 12 Aug 2011 22:46:21 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
Message-ID: <CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>

On 12 August 2011 17:58, David Mathog <mathog at caltech.edu> wrote:

> After posting I thought of one other source of more or less random
> verifiable numbers - the scores of sporting events.  These are not
> always generated every day, and are seasonal for the various sports.
> They are however highly verifiable and when multiple events are grouped,
> pretty much impossible to "fix" to preselected digits.

Have you looked at RFC3797? Not sure if it has any solutions for you, but it
at least discusses the same problems.


-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From rgb at phy.duke.edu  Sat Aug 13 13:51:46 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 13 Aug 2011 13:51:46 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
Message-ID: <alpine.LFD.2.02.1108131338240.18847@lilith>

On Fri, 12 Aug 2011, Leif Nixon wrote:

> On 12 August 2011 17:58, David Mathog <mathog at caltech.edu> wrote:
>
>> After posting I thought of one other source of more or less random
>> verifiable numbers - the scores of sporting events.  These are not
>> always generated every day, and are seasonal for the various sports.
>> They are however highly verifiable and when multiple events are grouped,
>> pretty much impossible to "fix" to preselected digits.
>
> Have you looked at RFC3797? Not sure if it has any solutions for you, but it
> at least discusses the same problems.

If people know how you are going to pick the seed of your rng, and know
the rng, and know (or measure) the distribution function from which your
seed is being drawn, they can easily transform the game into a non-zero
sum game with advantage over all of those that don't do all of that.

The only way to avoid this sort of thing is to pick your seed from a
flat, unpredictable distribution.  Unpredictable (in its purest sense)
includes flat, but the score distribution of almost any sporting event
is, I'm pretty sure, not flat.

That's why I really don't like the idea of running a lottery off of data
like this.  No state lottery could ever be certified on top of this sort
of data.

I'll tell you what.  Piggy back your lottery to theirs.  Powerball games
occur every day all over the US.  Pick your seed from the last 10 digits
of one of those games.  They are announced, publicly available on
websites (I'm pretty sure), and if they aren't certifiably random,
something is seriously wrong.  In any event they are usually generated
from an easily understandable random physical process that is almost
certainly flat as well as unpredictable.

Then pop it into your favorite AES-based or threefish based RNG, or cook
up something yourself with even more rotors, spin it a while, and out
comes your lottery winner -- basically a transmogrification of public
state lottery number, but that's an ADVANTAGE, not a disadvantage...
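
A minimal sketch of that last step, under stated assumptions: SHA-256
from the standard library is swapped in for the AES- or threefish-based
generator (so this is not the construction named above), and the
function name and the digit string in the demo are made up.

```python
import hashlib

def pick_winner(public_digits: str, n_entrants: int) -> int:
    """Map a published digit string to an index in [0, n_entrants).

    Hashes "digits:counter" with SHA-256 and uses rejection sampling,
    so every index is equally likely when the seed itself is flat.
    """
    if n_entrants <= 0:
        raise ValueError("n_entrants must be positive")
    # Largest multiple of n_entrants that fits in 256 bits; values at
    # or above it are rejected to avoid modulo bias.
    limit = (1 << 256) - ((1 << 256) % n_entrants)
    counter = 0
    while True:
        data = f"{public_digits}:{counter}".encode("ascii")
        value = int.from_bytes(hashlib.sha256(data).digest(), "big")
        if value < limit:
            return value % n_entrants
        counter += 1

if __name__ == "__main__":
    # Hypothetical published digits, for illustration only.
    print(pick_winner("0123456789", 500))
```

Anyone holding the same published digits and the same code gets the
same winner, which is the verifiability being asked for.  The hash adds
no unpredictability of its own; that all has to come from the seed.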

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



From hahn at mcmaster.ca  Sat Aug 13 20:06:04 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat, 13 Aug 2011 20:06:04 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108131338240.18847@lilith>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
Message-ID: <Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>

>>> After posting I thought of one other source of more or less random
>>> verifiable numbers - the scores of sporting events.  These are not

I immediately thought of another widely published stream of immutable noise: 
the congressional record.  sorry, no smiley ;)

> Then pop it into your favorite AES-based or threefish based RNG, or cook
> up something yourself with even more rotors, spin it a while, and out
> comes your lottery winner

sorry, I don't understand your emphasis on flatness.  why does the
distribution of the seed (entropy source) matter, as long as it's 
reasonably large and not predictable before publication date?
the crypto hash takes care of whitening, doesn't it?

thanks, mark hahn.

From hahn at mcmaster.ca  Sat Aug 13 22:22:52 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat, 13 Aug 2011 22:22:52 -0400 (EDT)
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1108132209420.6167@coffee.psychology.mcmaster.ca>

> I'm curious if anyone has any experience with ECC uncorrectable errors
> (specifically not the identification of), but which specific dimm in
> the chassis it's pointing to.

we've had good luck using EDAC to pin down bad dimms -
at least those that cause _correctable_ errors.
our uncorrectable errors trigger panics.  I suppose that's selectable,
though: I guess you could turn that off (/sys/module/edac_mc/panic_on_ue)

> The mcelog in linux doesn't seem to report the dimm slot correctly on
> my supermicro boards.

I prefer the hardware-topology-based naming that edac uses
(controller, channel, chipselect).  I guess recent versions of edac
have a user-space tool that will translate that for you (but of course,
you have to verify the topo-to-label mapping yourself anyway.)
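
A minimal sketch of reading the per-chipselect error counters that
EDAC exposes, assuming the csrow-style sysfs layout under
/sys/devices/system/edac/mc (newer kernels may also offer a dimm-based
layout, and this tree is absent if no EDAC driver is loaded).  The
function name is made up for this example, and mapping a csrow back to
a physical slot is still a manual step.

```python
from pathlib import Path

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): (ce_count, ue_count)} from sysfs."""
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text().strip())
            ue = int((csrow / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # row missing or unreadable on this kernel
        counts[(csrow.parent.name, csrow.name)] = (ce, ue)
    return counts

if __name__ == "__main__":
    for (mc, row), (ce, ue) in read_edac_counts().items():
        flag = "  <-- check this DIMM" if ce or ue else ""
        print(f"{mc}/{row}: correctable={ce} uncorrectable={ue}{flag}")
```

Run periodically across the cluster, a rising correctable count on one
controller/csrow pair is usually the early warning worth acting on.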

regards, mark hahn.


From rgb at phy.duke.edu  Sun Aug 14 18:05:31 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 14 Aug 2011 18:05:31 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>
Message-ID: <alpine.LFD.2.02.1108141420140.18847@lilith>

On Sat, 13 Aug 2011, Mark Hahn wrote:

>>>> After posting I thought of one other source of more or less random
>>>> verifiable numbers - the scores of sporting events.  These are not
>
> I immediately thought of another widely published stream of immutable noise: 
> the congressional record.  sorry, no smiley ;)
>
>> Then pop it into your favorite AES-based or threefish based RNG, or cook
>> up something yourself with even more rotors, spin it a while, and out
>> comes your lottery winner
>
> sorry, I don't understand your emphasis on flatness.  why does the
> distribution of the seed (entropy source) matter, as long as it's reasonably 
> large and not predictable before publication date?
> the crypto hash takes care of whitening, doesn't it?

Bayes' theorem.  Suppose one knows that the distribution of digits in
sports scores is (say, and not unreasonably) 70% 1s, 2s, 3s and 30% all
the other digits -- because e.g. football games rarely get 4-9 in the
second digit slot (note that this is an example only).  Then one can
gain a near 2-1 advantage over everybody else playing by picking seeds
with the right frequencies and using only those seeds to select a set of
numbers, if (as it sounds) there is an openly published unique map
between the seed and the lottery outcome so "anybody can check that it
is fair".  In that case you aren't trying to guess the white "random"
outcome, you are trying to guess the seed, and if the seed is drawn from
a non-flat space you'll beat the pants off of anyone playing blind by
using that space to generate your seeds/guesses.
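
A worked version of that example, as a rough sketch.  It uses the
illustrative 70/30 split from the paragraph above (not measured data)
and additionally assumes the digit slots are independent, which real
scores certainly are not.  Function names are made up for the example.

```python
import math

def digit_distribution(favored=(1, 2, 3), favored_mass=0.7):
    """Per-digit probabilities for the illustrative 70/30 split."""
    others = [d for d in range(10) if d not in favored]
    p = {d: favored_mass / len(favored) for d in favored}
    p.update({d: (1.0 - favored_mass) / len(others) for d in others})
    return p

def entropy_bits(p):
    """Shannon entropy of a digit distribution, in bits per digit."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def advantage(p, k=1):
    """Informed versus blind hit rate for guessing a k-digit seed.

    The informed guesser picks the most likely digit in every slot;
    the blind guesser picks uniformly among the 10**k possible seeds.
    """
    best = max(p.values())
    return (best ** k) / (10.0 ** -k)

if __name__ == "__main__":
    p = digit_distribution()
    print("entropy per digit:", entropy_bits(p), "bits")
    print("flat would be:    ", math.log2(10), "bits")
    for k in (1, 3, 6):
        print(f"advantage over blind guessing, {k} digits:", advantage(p, k))
```

Per digit the informed guesser is right with probability 0.7/3 against
0.1 for a blind one, and under the independence assumption that edge
compounds with every extra digit in the seed.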

Basically you take the lottery from being a lottery with all numbers
equally represented in the outcome space to being the moral equivalent
of predicting the actual point outcome of N football or basketball
games.  The size of the space of likely outcomes is MUCH smaller than
the size of the space of all possible scores, right?  In fact, it is
"small" compared to the full space.

So, sorry, I think that a lottery (especially one with e.g. a cash
payout and deep pocketed people capable of speculatively gambling to win
based on expectation value based on an openly published hash and seeding
method) needs to use a true random, true white seed, since you might
just as well use the seed as the lottery number in this case and in no
other case is it fair.

Of course, if the lottery is for cakes at a bake sale, who cares.  Just
don't underestimate the cleverness of would-be attackers if the lottery
has an openly published method of generating the result and/or
potentially large payout.  Plenty of people would tackle the project of
cracking the lottery just for the thrill, even if the payout wasn't that
great.  If the payout was large enough, you'd have deep-pocketed
smart people covering the entire most-likely-point spread generated by
Vegas bookies, week after week, through proxies, and making a bundle
from it.

    rgb

>
> thanks, mark hahn.

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From james.p.lux at jpl.nasa.gov  Sun Aug 14 22:59:25 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sun, 14 Aug 2011 19:59:25 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108141420140.18847@lilith>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>,
	<alpine.LFD.2.02.1108141420140.18847@lilith>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>

Given the discussion about lotteries, etc.

This is the classic thing of "numbers games" as run by the mob.  You pick a 3 digit number, and the winning number is determined by some readily available public source (stock market, sports games, racetrack winners, etc.).  There's probably a fair amount of literature (aside from the works of M. Puzo) describing it.

Payoff was something like 600:1 or 750:1, against a nominal 1000:1, so the numbers bank makes their money on the differential (the vig).

Just looked up wikipedia..
" later led to the use of the last three numbers in the published daily balance of the United States Treasury."

A moderately well known mathematician named Claude Shannon probably analyzed it.  He collaborated with E. Thorp on some other interesting work on games.



From rgb at phy.duke.edu  Mon Aug 15 07:57:26 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 15 Aug 2011 07:57:26 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>,
	<alpine.LFD.2.02.1108141420140.18847@lilith>
	<ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108150753570.14474@lilith>

On Sun, 14 Aug 2011, Lux, Jim (337C) wrote:

> A moderately well known mathematician named Claude Shannon probably
> analyzed it.  He collaborated with E. Thorp on some other interesting
> work on games.

Shannon?  Shannon?  The name almost rings a Bell.  For your information,
I think he's a few bits short of a byte, if you know what I mean.  The
guy practically Bayes at the moon.

Sorry... feeling a bit, well, random this morning.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From mathog at caltech.edu  Mon Aug 15 13:08:59 2011
From: mathog at caltech.edu (David Mathog)
Date: Mon, 15 Aug 2011 10:08:59 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qt0ep-0005vm-5A@mendel.bio.caltech.edu>

Leif Nixon <nixon at nsc.liu.se> wrote:

> Have you looked at RFC3797? Not sure if it has any solutions for you,
> but it at least discusses the same problems.

Good reference, I was not aware of that.  

It gives the same sorts of sources for random numbers as we have come up
with here: stock market, sports, lottery.  It discusses how stock market
data may not be reliable due to market splits and other accounting
issues.  However, I have determined that the raw data from the exchanges
is a terrible choice because it is not available for free, and the
values that are freely available, which are posted on web finance sites,
are not reliably identical in all digits.

Lottery results are a good source except for the black box / black
helicopter factors.  We don't generally know where those numbers are
coming from, and even in those cases where they do tell us, there is no
way to verify that any particular lottery drawing wasn't rigged.

We have not discussed election results (votes per candidate), but those
are, ironically, really unsuitable for this, even though statistically
the final set of digits should have a lot of entropy.  Mostly election
numbers are a problem because they may be revised for long periods after
the election, and the numbers could almost always be forced to shift by
a challenge by one of the candidates.  Every recount will come up with a
slightly different result.  Examples: the Coleman vs. Franken senatorial
contest in Minnesota, or Bush vs. Gore in Florida.

So I'm leaning towards sports scores, as those are generated in full
view of a multitude of witnesses (often numbering in the millions).  It
would be extremely difficult to rig the absolute final score.  It might
be possible to rig the winner, or even the point spread, but to rig the
absolute score in a high scoring game like basketball, would be
exceedingly difficult, and would likely be obvious to even the casual
observer.  To rig every digit in the final score of every game played on
a given day should be pretty close to impossible.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From cousins at umit.maine.edu  Mon Aug 15 16:59:11 2011
From: cousins at umit.maine.edu (Steve Cousins)
Date: Mon, 15 Aug 2011 16:59:11 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <mailman.1.1313434802.4411.beowulf@beowulf.org>
References: <mailman.1.1313434802.4411.beowulf@beowulf.org>
Message-ID: <alpine.LFD.2.00.1108151653260.8210@razzo.umeoce.maine.edu>


Hi David,

Can you give us more information about what you are doing? I'm getting 
curious about what problem you are working with that requires these 
conditions.

Steve

> We have not discussed election results (votes per candidate), but those
> are, ironically, really unsuitable for this, even though statistically
> the final set of digits should have a lot of entropy.  Mostly election
> numbers are a problem because they may be revised for long periods after
> the election, and the numbers could almost always be forced to shift by
> a challenge by one of the candidates.  Every recount will come up with a
> slightly different result.  Examples: the Coleman vs. Franken senatorial
> contest in Minnesota, or Bush vs. Gore in Florida.
>
> So I'm leaning towards sports scores, as those are generated in full
> view of a multitude of witnesses (often numbering in the millions).  It
> would be extremely difficult to rig the absolute final score.  It might
> be possible to rig the winner, or even the point spread, but to rig the
> absolute score in a high scoring game like basketball, would be
> exceedingly difficult, and would likely be obvious to even the casual
> observer.  To rig every digit in the final score of every game played on
> a given day should be pretty close to impossible.


From lindahl at pbm.com  Wed Aug 17 16:59:58 2011
From: lindahl at pbm.com (Greg Lindahl)
Date: Wed, 17 Aug 2011 13:59:58 -0700
Subject: [Beowulf] Fwd: H8DMR-82 ECC error
In-Reply-To: <B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>
References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk>
	<B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>
Message-ID: <20110817205958.GB7650@bx9.net>

> Memtest was ok, I done 9 cycles without any problems.

You should be using the HPL implementation of the Linpack benchmark
for testing memory. It exercises all of the memory and all of the
cores, and is what most HPC vendors seem to use for node burnin.
There's even a bootable DVD with a kernel with enhanced EDAC that was
mentioned here a while back.

> Hardware Error
> CPU0 Machine Check Exception  4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorrected error, report: yes MisV: invalid
> CPU context corrupt: yes UECC Error
> Bus Unit Error: prefetch/ECC error in data read from NB: local node originated 
> (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic. 
> Participating Processors: local node originated (SRC)

And I take it that the location information given here (socket 0, bank
2) isn't useful?
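
For readers who want to check the decoded text against the raw status word, here is a minimal sketch that unpacks `b200200000000863`. It assumes the AMD K8-family MCi_STATUS layout (bits 63 to 57 for VAL/OVER/UC/EN/MISCV/ADDRV/PCC, bits 46 and 45 for CECC/UECC) and the bus-error code format `0000 1PPT RRRR IILL`. The function and table names are illustrative and are not part of any decoding tool; check the bit positions against the BIOS and Kernel Developer's Guide for the CPU family in question.

```python
# Sketch: decode a K8-style MCi_STATUS word. Bit positions are an
# assumption taken from the AMD K8 layout, not from the original post.

FLAGS = {
    63: "VAL",    # status register holds a valid error
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
    46: "CECC",   # correctable ECC error
    45: "UECC",   # uncorrectable ECC error
}
PP = ["SRC", "RES", "OBS", "GEN"]        # participating processor
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
II = ["MEM", "reserved", "IO", "GEN"]    # memory or I/O
LL = ["L0", "L1", "L2", "LG"]            # cache level (LG = generic)


def decode_status(status):
    """Return the set flags and, for bus errors, the decoded error code."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    if code & 0xF800 == 0x0800:          # pattern 0000 1PPT RRRR IILL
        request = (code >> 4) & 0xF
        result["bus"] = {
            "participation": PP[(code >> 9) & 3],
            "timeout": bool((code >> 8) & 1),
            "request": RRRR[request] if request < len(RRRR) else "reserved",
            "memory_or_io": II[(code >> 2) & 3],
            "cache_level": LL[code & 3],
        }
    return result


if __name__ == "__main__":
    print(decode_status(0xB200200000000863))
```

Under those assumptions the word decodes to an enabled, uncorrected ECC error with corrupt processor context, on a prefetch memory access originated by the local node, which agrees with the text in the quoted log. Note that ADDRV is clear, so the status word by itself carries no address to map back to a DIMM.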

-- greg



From david.t.kewley at gmail.com  Sat Aug 20 16:16:28 2011
From: david.t.kewley at gmail.com (David Kewley)
Date: Sat, 20 Aug 2011 13:16:28 -0700
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <CAM+eeKf6os2jphciAE_Hnd4xtEiSZcZ5RKZrfMc08HEwux-xMQ@mail.gmail.com>

A few bits from my corner of the experience space:

If you have a BMC, 'ipmitool sel list' will probably show the correctable
and uncorrectable errors, generally not naming the DIMM involved. But
'ipmitool sel list -v' shows details from various fields in the SEL records.
In the ASUS boards I've been playing with lately, the Sensor Number field
together with the Event Data field will (usually) tell you the DIMM slot,
once you know how to decode those fields for the specific motherboard (and
possibly firmware revisions?) that you have.

How do you get that motherboard-specific data?  By finding a DIMM that
reliably produces errors, and moving it from slot to slot, taking notes on
those two SEL fields above.  I've seen a similar thing work for Dell
machines too.

If you have Dell PowerEdge R or M boxes (or previous generation
equivalents), there are various nicer ways to get the name of the DIMM
involved, including using a version of ipmitool that has the 'delloem'
subcommand.

I second Tony's suggestion that RAM testers may not be as good as real
systems, for finding bad RAM.  My experience on one large system a few years
ago was that new DIMMs failed at a rate of around 1% per year, but
"refurbished" DIMMs from RMAs failed at 10% per year (or was it even higher?
I forget).  I was led to believe that these refurbished DIMMs were often
customer returns that had been run through a RAM tester and passed.  Turns
out sometimes the customers were right and the "refurbishment" process was
wrong.

One more thing about the ASUS boards I've been playing with lately: If you
get a panic on uncorrectable memory error, and power cycle the system (using
the power button, or by remote 'ipmitool ... power cycle'), the following
POST does not report the bad DIMM.  But if you *reset* the system (by
pushing the reset button with a paperclip, or by remote 'ipmitool ... power
reset'), the next POST will pause and tell you what CPU, Channel, and DIMM
was affected on that previous uncorrectable error, which is more info than
'ipmitool sel list' gives you.  It's then up to you to figure out how CPU,
Channel, and DIMM map to the silkscreened names on the motherboard -- I
couldn't find documentation, but it turned out to be the pattern we
suspected. :)

David


From john.hearns at mclaren.com  Tue Aug 23 11:46:59 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Tue, 23 Aug 2011 16:46:59 +0100
Subject: [Beowulf] Flash storage arrays
Message-ID: <207BB2F60743C34496BE41039233A809071F88D8@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

Does anyone have an opinion of these for CFD workloads:


http://www.theregister.co.uk/2011/08/23/pure_storage_fa_300/

The interesting thing is that they claim it is cheaper than disk - but that's
a hard claim to assess in an HPC context, as it SEEMS to hold only when their
inbuilt deduplication is taken into account.
I'm not sure how much dedupe buys you with typical HPC data - i.e. large
files rather than lots of nearly-identical emails or virtual disk images.
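
One way to put a number on that question is to measure a sample of your own data. The following is a rough sketch only: it assumes fixed-size 4 KiB blocks fingerprinted with SHA-256, which is a simplification chosen for illustration, since the article's product is not described in enough detail here to know how its deduplication actually works. The function name `dedupe_ratio` is hypothetical.

```python
import hashlib


def dedupe_ratio(data, block_size=4096):
    """Return total_blocks / unique_blocks for fixed-size blocks of `data`.

    A ratio of 1.0 means no block repeats; 4.0 means the data would
    shrink to a quarter under ideal block-level deduplication.
    """
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())  # fingerprint of the block
        total += 1
    return total / len(seen) if seen else 1.0


if __name__ == "__main__":
    sample = b"A" * 4096 * 4   # four identical blocks
    print(dedupe_ratio(sample))
```

Running this over a representative slice of CFD output (read in pieces for large files) would give a first estimate of whether the "cheaper than disk" claim could hold for that workload.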

John Hearns | CFD Hardware Specialist | McLaren Racing Limited
McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

T:  +44 (0) 1483 261000
D:  +44 (0) 1483 262352
F:  +44 (0) 1483 261010
E:  john.hearns at mclaren.com
W:  www.mclaren.com




From diep at xs4all.nl  Wed Aug 24 21:30:39 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 25 Aug 2011 03:30:39 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <8BBC31C3-7F1A-433C-863A-5F0EBB4714AC@xs4all.nl>

In a world where you don't trust others, using MD5 is out of the
question. It's not safe. It's possible to fake an MD5 sum:
modify the data to whatever you wish (if there is enough random
data), then append something, with just a small correction
to the data, to again get the md5sum that was posted on the website.
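
One way to sidestep the MD5 concern is to check a stronger digest. The sketch below assumes the publisher also posts a SHA-256 hex digest; the thread only mentions MD5 checksums, so that digest and the helper name `file_matches_sha256` are hypothetical.

```python
import hashlib
import hmac


def file_matches_sha256(path, expected_hex, chunk_size=1 << 20):
    """Return True if the SHA-256 of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large pregenerated files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

A digest only helps if it is obtained over a channel you trust, which is why the unsigned checksums were flagged earlier in the thread.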

Vincent

On Aug 12, 2011, at 5:21 PM, David Mathog wrote:

> Robert G. Brown wrote:
>
>>  Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>    http://www.random.org/
>>
>
> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted,  
> and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf



From diep at xs4all.nl  Wed Aug 24 21:58:52 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 25 Aug 2011 03:58:52 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108121416030.3145@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
Message-ID: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>


On Aug 12, 2011, at 8:35 PM, Robert G. Brown wrote:

> On Fri, 12 Aug 2011, David Mathog wrote:
>
>> Robert G. Brown wrote:
>>
>>>  Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>>    http://www.random.org/
>>>
>>
>> Nice site.  They have something that is very close, the pregenerated
>> random files, from which a small set of digits may be extracted,  
>> and the
>> files themselves have MD5 checksums (but are not signed).
>> They also support https.  It comes up a little short on criteria 1  
>> (we
>> really don't know what is going on behind the scenes) and 6 (it is a
>> single site.)
>
> Behind the scenes is documented pretty well on the site, and the  
> guy who
> runs it is a human being, you can communicate with him to learn even
> more.  I already know him a bit, as he and I have collaborated on
> applying dieharder to test random.org datasets -- even "the"  
> random.org
> dataset as of some time ago (I have a few hundred MB of random number
> from the site in my dieharder directory).  IIRC, the numbers are
> generated continuously and fairly slowly by grabbing and filtering and
> transforming atmospheric noise.  As a source of entropy, that is
> probably excellent if (as noted) slow, but many good sources of  
> entropy
> seem to be fairly slow.  He has good reason to think that his numbers
> are theoretically "true random numbers"

Well, there is another test I stumbled upon when I did some analysis of
casino games (what student who takes himself seriously doesn't attempt to
write some simulations to see whether you can win something in the casino
by designing some strategies?).

The simulation suggested it was rather easy to make a fortune at roulette
with the doubling system (first bet 1; if you win, bet 1 again, else double
and keep doubling until you win). Reports from guys (some of them missing
an eye, another one a hand) who actually study anything that might make a
profit in casinos (and who also really try it in the casinos) revealed that
they never saw anyone make a big profit with the doubling system.

So there was a discrepancy between the randomly generated data and the
true random numbers generated in the casino.

Statistical analysis revealed the problem, though not right away.

I noticed that most semi-random numbers produced by software generators
have the habit of truly covering a search space of n in O(n log n) draws,
always.

So if you draw numbers from most software RNGs and reduce them modulo n,
with n not too tiny, say some millions or even billions, then every
slot in your 'hashtable' will get hit at least once by the RNG, whereas data
in reality simply doesn't have that habit.

So true random numbers versus generated noise are easy to distinguish in
this manner. Now I didn't study the literature to see whether some other
chap invented this a long time ago. That would be most interesting
to know.

In semi pseudo code, let's take an array of size a billion as an example,
though usually a few million is more than OK:

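#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Draw at most 2 * n * log2(n) numbers from RNG and report whether
   every one of the n slots was hit at least once.
   Call with e.g. n = (uint64_t)1 << 30, i.e. 2 to the power 30. */
void TestNumbersForRandomness(uint64_t (*RNG)(void), uint64_t n) {
   /* calloc zeroes the table, so every slot starts out FALSE */
   unsigned char *hashtable = calloc(n, sizeof *hashtable);
   if (hashtable == NULL)
      return;

   double guessednlogn = 2.0 * log2((double)n) * (double)n;
   uint64_t ndraws = 0, filledn = 0;

   while ((double)ndraws < guessednlogn) {
      uint64_t randomnumber = RNG();
      uint64_t r = randomnumber % n;   /* randomnumber = r (mod n) */
      if (!hashtable[r]) {
         hashtable[r] = 1;             /* first hit on this slot */
         filledn++;
         if (filledn >= n)
            break;
      }
      ndraws++;
   }

   if (filledn >= n)
      printf("With high degree of certainty data generated by a RNG\n");
   else
      printf("Not so sure it's a RNG\n");

   free(hashtable);
}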


Regards,
Vincent


> -- both unpredictable and
> flat/decorrelated at all orders, and even though there aren't really
> enough of them for my purposes, I've used them as one of the (small)
> "gold standard" sources for testing dieharder even as I test them.   
> For
> all practical purposes threefish or aes are truly random as well and
> they are a lot faster and easier to use as gold standard generators,
> though.
>
> I don't quite understand why the single site restriction is  
> important --
> this site has been up for years and I don't expect it to go away soon;
> it is quite reliable.  I don't think there is anything secret about  
> how
> the numbers are generated, and I'll certify that the numbers it  
> produces
> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
> your part; 6 I don't really understand but the guy who runs the  
> site is
> clearly willing to construct a custom feed for cash customers, if  
> there
> is enough value in whatever it is you are trying to do to pay for
> access.  If it's just a lottery, well, lord, I can think of a dozen  
> ways
> to make numbers so random that they'd be unimpeachable for any sort of
> lottery, both unpredictable and uncorrelated, and they don't any of  
> them
> require any significant amount of entropy to get started.
>
> I will add one warning -- "randomness" is a rather stringent
> mathematical criterion, and is generally tested against the null
> hypothesis.  Amateurs who want to make random number generators out of
> supposedly "random" data streams or fancy algorithms almost invariably
> fail, sometimes spectacularly so.  There are a half dozen or more
> really, really good pseudorandom number generators out there and it is
> easy to hotwire them together into an xor-based high entropy stream  
> that
> basically never repeats (feeding it a bit of real entropy now and then
> as it operates).  I would strongly counsel you against trying to take
> e.g. weather data and make something "random" out of it.  Unless you
> really know what you are doing, you will probably make something that
> isn't at all random and may not even be unpredictable.  Even most
> sources of "quantum" randomness (which is at least possibly "truly
> random", although I doubt it) aren't flat, so that they carry the
> signature of their generation process unless/until you manage to
> transform them into something flat (difficult unless you KNOW the
> distribution they are producing).  Pseudorandom number generators have
> the serious advantage of being amenable to at least some theoretical
> analysis (so you can "guarantee" flatness out to some high
> dimensionality, say) as well as empirical testing with e.g. dieharder.
>
> HTH,
>
>      rgb
>
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf



From rgb at phy.duke.edu  Thu Aug 25 08:11:07 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 25 Aug 2011 08:11:07 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108250743280.3079@lilith>

On Thu, 25 Aug 2011, Vincent Diepeveen wrote:

> I noticed that most generated semi-random numbers with software generators,
> had the habit to truely adress a search space of n always in O (n log n).
>
> So if you draw from most software RNG's a number and do it modulo n,
> with n being not too tiny, say quite some millions or even billions, then 
> every
> slot in your 'hashtable' will get hit at least once by the RNG, whereas data
> in reality simply happens to not have that habit simply.
>
> So true random numbers versus generated noise is in this manner easy
> to distinguish by this. Now i didn't study literature whether some other chap
> some long time ago already had invented this. That would be most interesting
> to know.

Some other chap named George Marsaglia (and to some extent another chap
named Donald Knuth) have already invented this.  A number of tests of
the tails of random number generators are already in dieharder.  All
"good" modern rngs pass these tests.

The Martingale betting system you are looking at is even older (at least
Marsaglia and Knuth are still alive).  It dates back to the 18th
century, and is well known to be flawed for a variety of reasons, not
the least of which is that gamblers don't have the infinite wealth
necessary to make this >>even<< a zero-sum strategy and casinos have
betting limits that de facto make it impossible to pursue the requisite
number of steps and in roulette in particular have 0 and/or 00 slots and
aren't zero-sum to begin with.  You can read a decent analysis of
outcomes based on the presumed binomial distribution of a zero-sum game
here:

   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
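
The argument above can be checked with exact arithmetic rather than a simulation. This is a minimal sketch, assuming even-money bets, a base stake of 1, and a cycle that ends at the first win or after `max_bets` straight losses (standing in for a finite bankroll or a table limit). The function name is illustrative.

```python
from fractions import Fraction


def cycle_expectation(p_win, max_bets):
    """Exact expected profit of one doubling cycle, in base-stake units.

    Bets are 1, 2, 4, ...; the cycle stops at the first win or after
    max_bets consecutive losses.
    """
    q_lose = 1 - p_win
    expected = Fraction(0)
    lost_so_far = 0          # total staked and lost before the current bet
    p_reach = Fraction(1)    # probability that the cycle gets to this bet
    stake = 1
    for _ in range(max_bets):
        # Winning now returns the stake and recovers earlier losses: net +1.
        expected += p_reach * p_win * (stake - lost_so_far)
        lost_so_far += stake
        p_reach *= q_lose
        stake *= 2
    expected -= p_reach * lost_so_far   # every bet in the cycle lost
    return expected


if __name__ == "__main__":
    print(cycle_expectation(Fraction(1, 2), 10))            # fair coin
    print(float(cycle_expectation(Fraction(18, 37), 10)))   # single-zero wheel
```

The sum collapses to 1 - (2q)^k for k allowed bets and loss probability q, so a fair game gives exactly zero and a single-zero wheel (q = 19/37) gives a negative expectation that worsens as k grows, roughly -0.3 units per cycle at k = 10. A simulation that shows a steady profit is therefore most likely allowing unlimited doubling.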

Your test below is interesting, though.  The only real problems I can
see with actually using it in dieharder are:

   a) One would need a theoretical estimate of the distribution of
filling given n log n draws on an n-slotted table (for largish n).  That
is, for a perfect rng, what SHOULD the distribution of success/failure
be.

   b) One would then need the CDF for this distribution, to be able to
turn the results of N trials (of n log n pulls each) into a p-value
under the null hypothesis -- the probability of obtaining the particular
number of successes and failures presuming a perfectly random generator.

That way dieharder could apply it rigorously to its 70 or 80 embedded
rngs or to any user's outboard generator.  There probably is theoretical
statistical support for the PD and/or CDF -- you're analyzing the tails
of a poissonian process -- but finding it or doing it yourself (or
myself), aye, that's the rub.  One cannot just say "high degree of
certainty that it is an RNG" (by which one means that the rng in
question fails the test for randomness) in the test.  HOW high?  Perfect
rngs or perfectly random processes will sometimes fill your table, but
how often?  How can you differentiate an "accident" when one does from
an actual failure?  All of those questions require a more rigorous
theory and quantitative result embedded in a test that can be
systematically cranked up to more clearly resolve failures until they
are unambiguous, not marginal maybe yes maybe no.
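
For what it is worth, the table-filling question is the classical coupon collector problem, so a reference distribution does exist for independent uniform draws. The sketch below is offered under that idealizing assumption and is not code from dieharder; the function names are illustrative. It gives the exact probability that all n slots are hit after m draws by inclusion-exclusion, plus the standard large-n approximation exp(-n * exp(-m / n)).

```python
import math
from fractions import Fraction


def p_full_exact(n, m):
    """Exact P(all n slots hit after m independent uniform draws)."""
    # Inclusion-exclusion over the set of slots that are never hit.
    return sum(
        (-1) ** k * math.comb(n, k) * Fraction(n - k, n) ** m
        for k in range(n + 1)
    )


def p_full_approx(n, m):
    """Large-n approximation of the same probability."""
    return math.exp(-n * math.exp(-m / n))


if __name__ == "__main__":
    n = 50
    for m in (100, 200, 400):
        print(m, float(p_full_exact(n, m)), p_full_approx(n, m))
```

With the draw budget m = 2 n log2(n) used in the proposed test, n * exp(-m / n) equals n^(1 - 2/ln 2), roughly n^(-1.885). An ideal uniform source would therefore fill the table with probability very close to 1 for large n, and a usable test would have to count failures to fill (or use a tighter draw budget) and convert that count into a p-value.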

I suspect that the failures this test would reveal are already more than
covered in dieharder, in particular by the bit distribution tests and
the monkey tests, but I'm not terribly happy with the monkey tests and
would be perfectly thrilled to have a simpler to compute test that
revealed precisely this sort of flaw, systematically.  And it doesn't
hurt at all to have partially or fully redundant tests as long as the
test themselves are rigorously valid.  If you can find or compute the
CDF for your test below, I'd be happy to wrap it up and add it to
dieharder, in other words.  One can always SIMULATE a CDF, of course,
but that requires a known good generator and sort of begs the question
if you don't think that e.g. AES or threefish or KISS are good
generators that would actually pass your test.

Even hardware/quantum sources of random bits are suspect -- they often
are generated by a process that leaves in the traces of an underlying
distribution.  I'm not convinced that >>any<< process in the real world
is >>truly<< random.  Physics is ambiguous on the issue -- the quantum
description of a closed system is just as deterministic as the classical
one, and Master equation unpredictability on open subsets of a large
closed system reflects entropy/ignorance, not actual randomness (hence
Einstein's famous "doesn't play dice" remark).  But lots of things are
sufficiently random that one cannot detect any failure of randomness,
modern crypto class generators being a prime example.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From diep at xs4all.nl  Thu Aug 25 21:55:04 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 03:55:04 +0200
Subject: [Beowulf] OT: Calculating Extraterrestrial Life - was public
	random numbers?
In-Reply-To: <alpine.LFD.2.02.1108250743280.3079@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
Message-ID: <BF6839DC-68D7-4169-A13C-F706909DEC7B@xs4all.nl>


On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software  
>> generators,
>> had the habit to truely adress a search space of n always in O (n  
>> log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even  
>> billions, then every
>> slot in your 'hashtable' will get hit at least once by the RNG,  
>> whereas data
>> in reality simply happens to not have that habit simply.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish by this. Now i didn't study literature whether some  
>> other chap
>> some long time ago already had invented this. That would be most  
>> interesting
>> to know.
>
> Some other chap named George Marsaglia (and to some extent another  
> chap
> named Donald Knuth) have already invented this.  A number of tests of
> the tails of random number generators are already in dieharder.  All
> "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at  
> least
> Marsaglia and Knuth are still alive).  It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have
> betting limits that de facto make it impossible to pursue the  
> requisite
> number of steps and in roulette in particular have 0 and/or 00  
> slots and
> aren't zero-sum to begin with.  You can read a decent analysis of
> outcomes based on the presumed binomial distribution of a zero-sum  
> game
> here:
>
>   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>
> Your test below is interesting, though.  The only real problems I can
> see with actually using it in dieharder are:
>
>   a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).   
> That
> is, for a perfect rng, what SHOULD the distribution of success/failure
> be.
>
>   b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the  
> particular
> number of successes and failures presuming a perfectly random  
> generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator.  There probably is  
> theoretical
> statistical support for the PD and/or CDF -- you're analyzing the  
> tails
> of a poissonian process -- but finding it or doing it yourself (or
> myself), aye, that's the rub.  One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test.  HOW high?   
> Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often?  How can you differentiate an "accident" when one does from
> an actual failure?  All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more  
> than
> covered in dieharder, in particular by the bit distribution tests and

Thanks for your kind words - you'll realize that, given all the theories you
quote below, of which I simply have little knowledge and definitely not the
time to investigate (yet), you're talking way above my level of knowledge
there.

Instead of going deep into mathematical theories, I would find it more
appropriate to ponder the feasibility of calculating the existence of
extraterrestrial life.

Now I realize a lot of effort goes into recognizing messages from
outer space.

Yet we can also speculate on some things.

First of all I'd like to state a viewpoint on extraterrestrial life.
One viewpoint I've seen promoted is that some scientist(s) claim we
should hide ourselves from extraterrestrial life.

I disagree with that, at least to some extent. If there is
extraterrestrial life that is more advanced than our society, obviously
they could also have built weapons capable of total self-destruction,
and they would already have killed themselves if they were an
aggressive form of life.

If they were as successful at reproducing themselves as humankind is
right now, they would have burned up all the resources on their own
planet and caused massive extinctions. It is difficult to defend the
statement that all mass extinctions were caused by meteorites only: for
such a statement one would need proof that every single mass extinction
was caused by a specific meteorite. The extinction could just as well
have been that a certain successful species dominated the planet a tad
too much and didn't get clever enough to control or regulate itself to
the extent that the planet didn't entirely die. After some millions of
years life then restores itself on the planet.

So if there were such intelligent life elsewhere, more advanced than
our society is, they sure would want to communicate in a manner such
that the information could be read across different galaxies.

However, one would want to hide this information from the most
primitive life forms that succeed in dominating a planet, until such a
society reaches a specific level.

One would only want such an extraterrestrial form of communication to
be deciphered by another extraterrestrial form of intelligent life that
has reached a sustainable, peaceful level.

I would argue such a lifeform would not form a threat to anyone, as
they have already proven not to be a threat to their own planet. So if
the knowledge of this society is high enough to control all that, one
could also argue that a specific level of math belongs to that high
level of development: a level strong enough to decipher the form of
communication that is used between the different very intelligent
lifeforms in existence throughout the galaxies.

The fact that there is no systematic form of contact with
extraterrestrial life already lets us deduce that humankind still has
to develop itself further, away from being a species that burns up all
its resources and in particular emits too much CO2. (The latest report,
which I'll still have to check out, is that the increased CO2 level
increases the amount of CO2 absorbed by the oceans, making them more
acidic, so that plankton, the start of the food chain, cannot develop
its skeleton enough, which in the long run will for sure cause mass
extinction.)

Now we might not be advanced enough yet to decipher extraterrestrial
communication, so I wonder whether we might somehow be able to
recognize that information is being communicated using a form of
encryption that we simply cannot decipher yet, by comparing it with how
our RNGs work. Some of them run over a prime field, for example; others
have a distribution that is too perfect.

If we get radiation measurements back from space and test whether they
belong to a specific class or type of randomness versus non-randomness,
how does that compare with the randomness classification of a
comparable source of radiation that we have ourselves?

Obviously the algorithm I gave is just one specific form of algorithm
to measure a perfect distribution; as you already indicated, many other
tests have been invented already.

To what extent have those been applied to what could be encrypted
communication from extraterrestrial life to other extraterrestrial life
(like us, if we manage to survive as a species and develop further to a
peaceful level that can sustain itself for a longer period of time)?

So, summarized, what I wonder about is how random number theory can
contribute to detecting extraterrestrial life (of course with a
specific statistical significance attached to it).

This of course in combination with experiments that allow us to first
classify how a specific form of possible communication system would
normally behave according to the randomness classification system,
versus how the measured possible form of communication is classified.

Such a classification system would need to be very sophisticated to
have any chance of detecting extraterrestrial life, I'd guess, as we
can't just naively assume that all they could come up with is
encrypting things over a prime field using smallish primes, which in
our world is already only allowed to be used up to secret level.

Regards,
Vincent

> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically.  And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> test themselves are rigorously valid.  If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words.  One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution.  I'm not convinced that >>any<< process in the real
> world is >>truly<< random.  Physics is ambiguous on the issue -- the
> quantum description of a closed system is just as deterministic as the
> classical one, and Master equation unpredictability on open subsets of
> a large closed system reflects entropy/ignorance, not actual
> randomness (hence Einstein's famous "doesn't play dice" remark).  But
> lots of this are sufficiently random that one cannot detect any
> failure of randomness, modern crypto class generators being a prime
> example.
>
>    rgb
>
>>
>> In semi pseudo code, let's take an array of size a billion as an
>> example, though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>> declare array hashtable[size n];
>>
>> guessednlogn = 2 * (log n / log 2) * n;
>>
>> for( i = 0 ; i < n ; i++ )
>>   hashtable[i] = FALSE;
>>
>> ndraws = filledn = 0;
>> while( ndraws  < guessednlogn ) {
>>    randomnumber = RNG();
>>    r = randomnumber % n; //     randomnumber =  r  (mod n)
>>    if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break;
>>
>>   }
>>   ndraws++;
>> }
>>
>> if( filledn >= n )
>>    print "With high degree of certainty data generated by a RNG\n");
>>  else
>>    print "Not so sure it's a RNG\n";
>>
>> }
>>
>>
>>
>>
>>
>> Regards,
>> Vincent
>>
>>
>>
>>
>>> -- both unpredictable and
>>> flat/decorrelated at all orders, and even though there aren't really
>>> enough of them for my purposes, I've used them as one of the (small)
>>> "gold standard" sources for testing dieharder even as I test them.
>>> For all practical purposes threefish or aes are truly random as well
>>> and they are a lot faster and easier to use as gold standard
>>> generators, though.
>>>
>>> I don't quite understand why the single site restriction is
>>> important -- this site has been up for years and I don't expect it to
>>> go away soon; it is quite reliable.  I don't think there is anything
>>> secret about how the numbers are generated, and I'll certify that the
>>> numbers it produces don't make dieharder unhappy.  So 1 is fixable
>>> with a bit of effort on your part; 6 I don't really understand but
>>> the guy who runs the site is clearly willing to construct a custom
>>> feed for cash customers, if there is enough value in whatever it is
>>> you are trying to do to pay for access.  If it's just a lottery,
>>> well, lord, I can think of a dozen ways to make numbers so random
>>> that they'd be unimpeachable for any sort of lottery, both
>>> unpredictable and uncorrelated, and they don't any of them require
>>> any significant amount of entropy to get started.
>>>
>>> I will add one warning -- "randomness" is a rather stringent
>>> mathematical criterion, and is generally tested against the null
>>> hypothesis.  Amateurs who want to make random number generators out
>>> of supposedly "random" data streams or fancy algorithms almost
>>> invariably fail, sometimes spectacularly so.  There are a half dozen
>>> or more really, really good pseudorandom number generators out there
>>> and it is easy to hotwire them together into an xor-based high
>>> entropy stream that basically never repeats (feeding it a bit of real
>>> entropy now and then as it operates).  I would strongly counsel you
>>> against trying to take e.g. weather data and make something "random"
>>> out of it.  Unless you really know what you are doing, you will
>>> probably make something that isn't at all random and may not even be
>>> unpredictable.  Even most sources of "quantum" randomness (which is
>>> at least possibly "truly random", although I doubt it) aren't flat,
>>> so that they carry the signature of their generation process
>>> unless/until you manage to transform them into something flat
>>> (difficult unless you KNOW the distribution they are producing).
>>> Pseudorandom number generators have the serious advantage of being
>>> amenable to at least some theoretical analysis (so you can
>>> "guarantee" flatness out to some high dimensionality, say) as well as
>>> empirical testing with e.g. dieharder.
>>>
>>> HTH,
>>>
>>>     rgb
>>>> Thanks,
>>>> David Mathog
>>>> mathog at caltech.edu
>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>>> Computing
>>> To change your subscription (digest mode or unsubscribe) visit  
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Aug 25 20:27:18 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 02:27:18 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108250743280.3079@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
Message-ID: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>


On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software
>> generators, had the habit to truely adress a search space of n always
>> in O (n log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even billions,
>> then every slot in your 'hashtable' will get hit at least once by the
>> RNG, whereas data in reality simply happens to not have that habit
>> simply.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish by this. Now i didn't study literature whether some
>> other chap some long time ago already had invented this. That would be
>> most interesting to know.
>
> Some other chap named George Marsaglia (and to some extent another
> chap named Donald Knuth) have already invented this.  A number of
> tests of the tails of random number generators are already in
> dieharder.  All "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at
> least Marsaglia and Knuth are still alive).  It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have

Seen from a mathematical viewpoint, it makes perfect cash.
Statistically, the odds are that you have already built up a
considerable profit by the time the worst case (hitting the practical
limit of 10 doublings) hits you.

The simulations are of course using the practical limit.

Note that European casinos have a single zero.
In the USA an even greedier mafia controls all the casinos;
there are 2 zeros there, 0 and 00.

The simulations were for European casinos.

> betting limits that de facto make it impossible to pursue the
> requisite number of steps and in roulette in particular have 0 and/or
> 00 slots and aren't zero-sum to begin with.  You can read a decent
> analysis of outcomes based on the presumed binomial distribution of a
> zero-sum game here:
>
>   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>

You're not allowed to use a system in a casino, so we are speaking
about theory. Probably they'll let you try the first evening. The
second day you'll be on the blacklist.

> Your test below is interesting, though.  The only real problems I can
> see with actually using it in dieharder are:
>

Yeah, more interesting than the roulette system, which has been
discussed a billion times and has been analyzed completely flat.

>   a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).
> That is, for a perfect rng, what SHOULD the distribution of
> success/failure be.

As we have figured out by now in Artificial Intelligence, the
statistical assumptions made in the past simply do not hold.

For Artificial Intelligence we need a new sort of theory.

As for the distribution problem, generators having a spread that's too
accurate, the way to deliver a proof would be, for example, to build a
simple device.

Build an old-fashioned box from which you can draw balls. Remember what
you could see on TV some 20 years ago or so (not sure it was like that
in the USA).

A big basket with balls. The basket in fact looks like this:

http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg

But now a much bigger machine like this, with different means of
randomizing the balls inside, and actually also randomly modifying the
obstacles inside that shake the balls.

After a ball has been drawn you automatically have it recorded and the
ball immediately goes back into the machine. For a full minute you have
the balls in the machine shaken again and you draw a ball again. It is
important to do this randomizing of the balls inside the machine for
quite some time. I would propose a minute.

Of course you have to do this with quite some balls.  Say a thousand.

Then you draw balls until all numbers have been drawn at least once.

This cool experiment can easily be built. Of course the expected
running time of a single experiment will be a few weeks.
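
A rough way to size this experiment is the classical coupon-collector
estimate: with n equally likely balls, the expected number of draws
until every ball has been seen at least once is n times the n-th
harmonic number. The sketch below assumes one draw per minute, as
proposed above, and a perfectly uniform machine; the function names are
made up for illustration.

```python
import math


def expected_draws_to_see_all(n: int) -> float:
    """Coupon-collector expectation: n * H_n draws until all n balls were seen."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return n * harmonic


def expected_days(n: int, minutes_per_draw: float = 1.0) -> float:
    """Expected duration in days, assuming a fixed time per draw."""
    return expected_draws_to_see_all(n) * minutes_per_draw / (60 * 24)


if __name__ == "__main__":
    for balls in (1000, 10000):
        draws = expected_draws_to_see_all(balls)
        print(f"{balls} balls: {draws:.0f} draws, {expected_days(balls):.1f} days")
```

Under these assumptions 1000 balls need about 7,500 draws, a bit over
five days of continuous drawing, and 10k balls need about 98,000 draws,
roughly 68 days.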

You can produce a number of those drawing machines though and have a
look.

Theories that seemingly work for small n, n being the number of balls,
are much harder to maintain at bigger n's, as we also see in prime
number research.

How the machine gets designed is of course totally crucial. I would
propose a design that really shakes the balls through each other a lot
and very thoroughly.

Just like we nowadays know how flawed a large number of the popular
card shuffling machines are.

It would be very interesting to see the outcomes of such a lottery with
really a lot of balls.

In fact I would prefer to have a number of those machines produced, so
that it's possible to really have a lot of outcomes and then analyze
them very well.

>
>   b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the
> particular number of successes and failures presuming a perfectly
> random generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator.  There probably is
> theoretical statistical support for the PD and/or CDF -- you're
> analyzing the tails of a poissonian process -- but finding it or doing
> it yourself (or myself), aye, that's the rub.  One cannot just say
> "high degree of certainty that it is an RNG" (by which one means that
> the rng in question fails the test for randomness) in the test.  HOW
> high?  Perfect rngs or perfectly random processes will sometimes fill
> your table, but how often?

If we assume that the reality of life represents randomness (how far
that theory is plausible is another rather good question), then under
that assumption I'm very sure that the RNGs I have investigated so far
have a distribution which is too perfect, more perfect than I have seen
in any reality.

In fact most RNGs fill all slots faster than O ( n log n ), yet it's
O ( n log n ) that they follow.

This is with RNGs that have come through all tests as being good and
very acceptable RNGs to use.

Realize I'm no RNG expert, so I don't know all the names of all those
tests.

For me it's just push-button technology. I just designed a test
and found it very odd that all RNGs have such perfect distributions
that they don't even miss a single slot.

I'd argue the only test that would be interesting to me, to see how it
might be in reality, is the lottery machine test - yet with really a lot
of balls. I'd prefer 10k balls over 1000 in fact - yet for practical
reasons I would agree with any number above 1000.

Paper fiddling is really not interesting to me there to prove anything,
as what I've seen of randomness in reality is totally different from how
RNGs model it.

Regards,
Vincent


>   How can you differentiate an "accident" when one does from
> an actual failure?  All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more
> than covered in dieharder, in particular by the bit distribution tests
> and
> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically.  And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> test themselves are rigorously valid.  If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words.  One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution.  I'm not convinced that >>any<< process in the real
> world is >>truly<< random.  Physics is ambiguous on the issue -- the
> quantum description of a closed system is just as deterministic as the
> classical one, and Master equation unpredictability on open subsets of
> a large closed system reflects entropy/ignorance, not actual
> randomness (hence Einstein's famous "doesn't play dice" remark).  But
> lots of this are sufficiently random that one cannot detect any
> failure of randomness, modern crypto class generators being a prime
> example.
>
>    rgb
>
>>
>> In semi pseudo code, let's take an array of size a billion as an
>> example, though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>> declare array hashtable[size n];
>>
>> guessednlogn = 2 * (log n / log 2) * n;
>>
>> for( i = 0 ; i < n ; i++ )
>>   hashtable[i] = FALSE;
>>
>> ndraws = filledn = 0;
>> while( ndraws  < guessednlogn ) {
>>    randomnumber = RNG();
>>    r = randomnumber % n; //     randomnumber =  r  (mod n)
>>    if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break;
>>
>>   }
>>   ndraws++;
>> }
>>
>> if( filledn >= n )
>>    print "With high degree of certainty data generated by a RNG\n");
>>  else
>>    print "Not so sure it's a RNG\n";
>>
>> }
>>
>>
>>
>>
>>
>> Regards,
>> Vincent
>>
>>
>>
>>



From rgb at phy.duke.edu  Fri Aug 26 02:07:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 02:07:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108260127470.3126@lilith>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If we assume that the reality of life represents randomness (how far
> that theory is plausible is another rather good question), then under
> that assumption I'm very sure that the RNGs I have investigated so far
> have a distribution which is too perfect, more perfect than I have
> seen in any reality.

That's because you live in a different reality than everybody else,
Vincent.

> In fact most RNGs fill all slots faster than O ( n log n ), yet it's
> O ( n log n ) that they follow.

In fact, they don't.

> This is with RNGs that have come through all tests as being good and
> very acceptable RNGs to use.

No, it's not.

> Realize I'm no RNG expert, so I don't know all the names of all those
> tests.
>
> For me it's just push-button technology. I just designed a test
> and found it very odd that all RNGs have such perfect distributions
> that they don't even miss a single slot.

It's odd because your test is broken.

>
> I'd argue the only test that would be interesting to me, to see how it
> might be in reality, is the lottery machine test - yet with really a
> lot of balls. I'd prefer 10k balls over 1000 in fact - yet for
> practical reasons I would agree with any number above 1000.
>
> Paper fiddling is really not interesting to me there to prove anything,
> as what I've seen of randomness in reality is totally different from
> how RNGs model it.

Let's try a bit of "paper fiddling".  The expected number of filled slots
is (this is actual code, not pseudocode, for n slots):

  nlogn = log10(n)*n;
  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));

The reasoning is enormously simple.  The probability of a given slot
being empty after one pull is (1 - 1/n).  After nlogn pulls, it is
p_e = (1 - 1/n)^nlogn.  The probability of a slot being filled is thus
1 - p_e, so given n slots, n - n*(1-1/n)^nlogn of them "should" be
filled and n*(1-1/n)^nlogn of them "should" be empty, within random
noise.
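
The same expectation can be evaluated for any number of pulls m. The
sketch below is only an illustration (the function names are
hypothetical, and log1p/exp is used instead of pow purely to limit
rounding error for large n). It evaluates the formula for the
n*log10(n) pulls used here and for the 2*n*log2(n) pulls of the quoted
pseudocode.

```python
import math


def expected_empty(n: int, m: float) -> float:
    """Expected number of empty slots after m uniform pulls on n slots."""
    # n * (1 - 1/n)**m, computed via log1p/exp for numerical stability.
    return n * math.exp(m * math.log1p(-1.0 / n))


def expected_filled(n: int, m: float) -> float:
    """Expected number of filled slots after m uniform pulls on n slots."""
    return n - expected_empty(n, m)


if __name__ == "__main__":
    n = 10_000_000
    m_log10 = n * math.log10(n)      # pull count used in the formula above
    m_log2 = 2 * n * math.log2(n)    # "guessednlogn" from the quoted pseudocode
    print("filled at n*log10(n) pulls:", expected_filled(n, m_log10))
    print("empty at 2*n*log2(n) pulls:", expected_empty(n, m_log2))
```

For n = 10000000 and m = 70000000 this gives about 9990881 filled
slots, matching the "expected" figure in the dieharder output below.
With 2*n*log2(n) pulls the expected number of empty slots is far below
one, so a uniform generator would be expected to fill the whole table
at that pull count.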

Well, I've got a random number generator tester harness, so I hacked
your test into it. One major bug in your code, BTW, is using a modulus
to generate your random numbers -- dunno what that's about, but if your
rng returned numbers between (say) 0 and 7 and you use it to generate
numbers in the range 0 to 4 by means of r%5 then you'll get (for the
inputs 0 through 7 in order) 0 1 2 3 4 0 1 2.  Note well that you get
twice as many 0's, 1's and 2's as 3's and 4's assuming a random draw on
0 to 7.  So you aren't even testing a uniformly distributed sequence of
integers.
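
The modulus effect can be checked by simply counting residues. A
minimal sketch, with a hypothetical helper name, that reduces every
value of a small range once:

```python
from collections import Counter


def residue_counts(range_size: int, modulus: int) -> Counter:
    """Count how often each residue occurs when every value in
    0 .. range_size-1 is reduced with % modulus exactly once."""
    return Counter(value % modulus for value in range(range_size))


if __name__ == "__main__":
    counts = residue_counts(8, 5)
    for residue in sorted(counts):
        print(residue, counts[residue])
```

For the 0-to-7 example, residues 0, 1 and 2 each occur twice and 3 and
4 once. The imbalance disappears when the modulus divides the size of
the generator's output range evenly (for instance 8 values reduced
modulo 4), and it shrinks as the output range grows relative to the
modulus.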

Fixing this relatively minor bug, removing your breakout and actually
counting up filledn for the full nlogn samples, and applying the test to
mt19937, we get:

rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990811, expected = 9990881

We run it again:
rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990802, expected = 9990881

We run it for R250 -- a well-known not-good generator:
rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990794, expected = 9990881

We run it on the literally infamous randu:
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9999482, expected = 9990881

Note, Vincent, that the last two examples of correctly computed results
from known-terrible generators are much farther from the expected mean
than mt19937, a well-known damn good one.  This suggests that your test
(perhaps unsurprisingly) has some sensitivity, not because some slots
are or aren't empty, but because the NUMBER of slots that are or aren't
empty isn't quite correct.  Note also that in the "paper fiddling"
analysis above, the use of nlogn is quite unimportant -- we could make
this an independent variable and evaluate the table filling for any
value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.

If I have the energy, I'll see if the distribution of filledn around
expected is e.g. Gaussian -- it seems pretty reasonable that it would be
-- with some expected or empirically computable variance.  If it is,
then this can be fairly easily turned into an actual test that returns a
p-value that humans can use to make rational judgements, or rational
humans can use to make judgements or something like that.  I doubt that
the test will have MUCH sensitivity -- modern generators are way too
good to have their flaws picked out quite this simply.  Marsaglia's
"monkey tests" do something very similar, although a lot more
sophisticated mathematically (and arguably more sensitive), and do
suffice to nail randu (anything nails randu) and semi-weak generators
like R250.
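
One way the p-value idea could be sketched is shown below. It is only
an illustration under two assumptions that are not established in this
thread: the classical occupancy-problem variance for uniform pulls, and
a Gaussian approximation for the count. The function names are
hypothetical, and the filled counts are the ones printed above.

```python
import math


def empty_slot_moments(n: int, m: float) -> tuple[float, float]:
    """Mean and variance of the number of empty slots after m uniform pulls."""
    q1 = math.exp(m * math.log1p(-1.0 / n))   # P(a given slot is empty)
    q2 = math.exp(m * math.log1p(-2.0 / n))   # P(two given slots are both empty)
    mean = n * q1
    variance = n * q1 + n * (n - 1) * q2 - (n * q1) ** 2
    return mean, variance


def fill_test_p_value(n: int, m: float, filled: int) -> tuple[float, float]:
    """z-score and two-sided p-value of an observed filled-slot count,
    under the assumed Gaussian approximation."""
    mean_empty, variance = empty_slot_moments(n, m)
    z = (filled - (n - mean_empty)) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    n, m = 10_000_000, 70_000_000
    runs = [("mt19937", 9990811), ("mt19937", 9990802),
            ("R250", 9990794), ("randu", 9999482)]
    for label, filled in runs:
        z, p = fill_test_p_value(n, m, filled)
        print(f"{label:8s} z = {z:8.2f}  p = {p:.3g}")
```

Under these assumptions the mt19937 and R250 counts come out within
about one standard deviation of the expectation, while the randu count
lies far outside it.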

Now, let's see what we've learned from this fiddling.  One is that
without it, you just waste a lot of people's time making egregious and
false claims that belittle the tremendously sophisticated and difficult
work a whole lot of "fiddlers" have put into inventing, writing, and
testing modern RNGs.  The truth is that >>all<< RNGs in dieharder "pass"
your test (if the test is "producing at least one zero") once your test
isn't broken.  We've learned that in fact, the best of the modern RNGs
are damn good, and that you could work for five years trying to invent a
test that is good enough to fail any of them and still not succeed.
Finally, we've learned that you should not, not, not take your
Martingale to a casino and try the doubling strategy out to make money,
or if you do put a firm upper bound -- something like 63 Euro -- on what
you're willing to lose with your base stake of 1 Euro.  That way you
have maybe a 40% chance of doubling your 63 Euro before you go broke.
Really, you should read the Wikipedia article I linked, in spite of the
fact that it presents more "paper fiddling".

   Sincerely,

       rgb

(See P.S. comments below...)

>>> n = 2^30; // 2 to the power 30
>>> 
>>> Function TestNumbersForRandomness(RNG,n) {
>>> declare array hashtable[size n];
>>> 
>>> guessednlogn = 2 * (log n / log 2) * n;

Why guess nlogn?  nlogn is n*log10(n).  Why nlogn anyway?  Call it m and
make it a parameter.

>>> for( i = 0 ; i < n ; i++ )
>>>  hashtable[i] = FALSE;
>>> 
>>> ndraws = filledn = 0;
>>> while( ndraws  < guessednlogn ) {
>>>   randomnumber = RNG();
>>>   r = randomnumber % n; //     randomnumber =  r  (mod n)

no, r = n*RNG_UNIFORM(); where RNG_UNIFORM() is e.g. RNG/UINT_MAX.  Yes
there are roundoff errors, but they are uniform and consistent and as
you can see, don't affect this problem.  What you have isn't even close
to uniform -- it is badly nonrandom.

>>>   if( hashtable[r] == FALSE ) {
>>>      hashtable[r] = TRUE;
>>>      filledn++;

>>>      if( filledn >= n )
>>>        break;

Don't break.  Just count up filledn.  It will never be more than n now
anyway, for any n, or any reasonable m. There probably is some number of
pulls that will raise "expected" to n, but it is pretty big compared to
n, way bigger than nlogn.

>>>
>>>  }
>>>  ndraws++;
>>> }
>>> 
>>> if( filledn >= n )
>>>   print "With high degree of certainty data generated by a RNG\n";
>>> else
>>>   print "Not so sure it's a RNG\n";
>>> 
>>> }

I'm guessing the correct statistic here is something like |expected -
filledn|/expected, but as I said, I haven't really worked at it.  I
haven't decided whether or not it is worth adding this to dieharder --
without a formal derivation of the expected statistic it would be yet
another empirical test, which means you're really comparing one RNG to
another presumed better one, which I don't like.  And do I have time to
do the "fiddling" needed to do a proper derivation?  Aye, that's the
rub...;-)
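
Putting the comments above together, a hedged sketch of the repaired test: m is a parameter, there is no early break, and the slot index comes from a generator that is assumed to already return uniform indices on 0..n-1. The names count_filled and expected_filled are hypothetical, and Python's built-in random module is used purely as a stand-in generator, not as anything dieharder does.

```python
import math
import random

def count_filled(next_index, n, m):
    """Draw m slot indices from next_index() and count the distinct slots hit."""
    seen = bytearray(n)          # one flag per slot, all initially 0
    filled = 0
    for _ in range(m):
        r = next_index()
        if not seen[r]:
            seen[r] = 1
            filled += 1
    return filled                # no early break: all m draws are always used

def expected_filled(n, m):
    """Expected occupied slots for ideal uniform draws: n - n*(1 - 1/n)^m."""
    return n - n * math.exp(m * math.log1p(-1.0 / n))

if __name__ == "__main__":
    n, m = 100_000, 500_000
    rng = random.Random(12345)   # assumption: stand-in generator, fixed seed
    filled = count_filled(lambda: rng.randrange(n), n, m)
    expected = expected_filled(n, m)
    # The candidate statistic suggested above: |expected - filledn| / expected
    print(filled, expected, abs(expected - filled) / expected)
```

The raw relative deviation printed at the end is only the candidate statistic guessed at above; turning it into a p-value still needs the variance of filledn.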

    rgb

>>> 
>>> 
>>> 
>>> 
>>> 
>>> Regards,
>>> Vincent
>>> 
>>> 
>>> 
>>> 
>>>> -- both unpredictable and
>>>> flat/decorrelated at all orders, and even though there aren't really
>>>> enough of them for my purposes, I've used them as one of the (small)
>>>> "gold standard" sources for testing dieharder even as I test them.  For
>>>> all practical purposes threefish or aes are truly random as well and
>>>> they are a lot faster and easier to use as gold standard generators,
>>>> though.
>>>> I don't quite understand why the single site restriction is important --
>>>> this site has been up for years and I don't expect it to go away soon;
>>>> it is quite reliable.  I don't think there is anything secret about how
>>>> the numbers are generated, and I'll certify that the numbers it produces
>>>> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
>>>> your part; 6 I don't really understand but the guy who runs the site is
>>>> clearly willing to construct a custom feed for cash customers, if there
>>>> is enough value in whatever it is you are trying to do to pay for
>>>> access.  If it's just a lottery, well, lord, I can think of a dozen ways
>>>> to make numbers so random that they'd be unimpeachable for any sort of
>>>> lottery, both unpredictable and uncorrelated, and they don't any of them
>>>> require any significant amount of entropy to get started.
>>>> I will add one warning -- "randomness" is a rather stringent
>>>> mathematical criterion, and is generally tested against the null
>>>> hypothesis.  Amateurs who want to make random number generators out of
>>>> supposedly "random" data streams or fancy algorithms almost invariably
>>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>>> really, really good pseudorandom number generators out there and it is
>>>> easy to hotwire them together into an xor-based high entropy stream that
>>>> basically never repeats (feeding it a bit of real entropy now and then
>>>> as it operates).  I would strongly counsel you against trying to take
>>>> e.g. weather data and make something "random" out of it.  Unless you
>>>> really know what you are doing, you will probably make something that
>>>> isn't at all random and may not even be unpredictable.  Even most
>>>> sources of "quantum" randomness (which is at least possibly "truly
>>>> random", although I doubt it) aren't flat, so that they carry the
>>>> signature of their generation process unless/until you manage to
>>>> transform them into something flat (difficult unless you KNOW the
>>>> distribution they are producing).  Pseudorandom number generators have
>>>> the serious advantage of being amenable to at least some theoretical
>>>> analysis (so you can "guarantee" flatness out to some high
>>>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>>>> HTH,
>>>>
>>>>    rgb
>>>>> Thanks,
>>>>> David Mathog
>>>>> mathog at caltech.edu
>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Fri Aug 26 07:56:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 13:56:15 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260127470.3126@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
Message-ID: <B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>


On Aug 26, 2011, at 8:07 AM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> If we assume that reality of life represents randomness, which is  
>> another
>> rather good question in how far that theory is plausible, then  
>> using that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have  
>> seen
>> in any reality.
>
> That's because you live in a different reality than everybody else,
> Vincent.

Or the reality we live in might not be as random as we all guess...

But it's good that you took a look at the dieharder test now, which
you didn't do before.

>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet  
>> it's O ( n log n )
>> that they follow.
>
> In fact, they don't.
>
>> This is RNG's that have come through all tests as being a good and
>> very acceptabe RNG to be used.
>
> No, it's not.
>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>
> It's odd because your test is broken.
>
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really  
>> a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for  
>> practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove  
>> anything,
>> as what i've seen in reality in randomness is total different from  
>> how
>> RNG's model that.
>
> Let's try a bit of "paper fiddling".  The expected number of filled  
> slots
> is (this is actual code, not pseudocode, for n slots):
>
>  nlogn = log10(n)*n;
>  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 -  
> p_e, and
> given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, n*(1-1/n)^nlogn of them "should" be empty.
>
> Well, I've got a random number generator tester harness, so I hacked
> your test into it. One major bug in your code, BTW, is using a modulus
> to generate your random numbers -- dunno what that's about, but if  
> your

EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Apologies for the caps. I hope it is clear how important this is. You're
claiming all programmers use random numbers in a faulty manner?

This is important enough to discuss further.

You nearly always need random numbers from within a given domain, say
0..n-1, so projecting an RNG onto that domain is pretty crucial. How
would you do that in a correct manner?

In the slot test in fact a simple AND is enough.

> rng returned numbers between (say) 0 and 7 and you use it to generate
> numbers in the range 0 to 4 by means of r%5 then you'll get (for the
> sequence of numbers) 0 1 2 3 4 0 1 2.  Note well that you get twice as
> many 0's, 1's and 2's as 3's and 4's assuming a random draw on 0 to 7.
> So you aren't even testing a uniformly distributed sequence of  
> integers.
>
> Fixing this relatively minor bug, removing your breakout and actually
> counting up filledn for the full nlogn samples, and applying the  
> test to
> mt19937, we get:
>
> rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990811, expected = 9990881
>
> We run it again:
> rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990802, expected = 9990881
>
> We run it for R250 -- a well-known not-good generator:
> rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990794, expected = 9990881
>
> We run it on the literally infamous randu:
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9999482, expected = 9990881
>
> Note, Vincent, that the last two examples of correctly computed  
> results
> from known-terrible generators are much farther from the expected mean
> than mt19937, a well-known damn good one.  This suggests that your  
> test
> (perhaps unsurprisingly) has some sensitivity, not because some slots
> are or aren't empty, but because the NUMBER of slots that are or  
> aren't
> empty isn't quite correct.  Note also that in the "paper fiddling"
> analysis above, the use of nlogn is quite unimportant -- we could make
> this an independent variable and evaluate the table filling for any
> value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.
>
> If I have the energy, I'll see if the distribution of filledn around
> expected is e.g. Gaussian -- it seems pretty reasonable that it  
> would be
> -- with some expected or empirically computable variance.  If it is,
> then this can be fairly easily turned into an actual test that  
> returns a
> p-value that humans can use to make rational judgements, or rational
> humans can use to make judgements or something like that.  I doubt  
> that
> the test will have MUCH sensitivity -- modern generators are way too
> good to have their flaws picked out quite this simply, although
> Marsaglia's "monkey tests" do something very similar although a lot  
> more
> sophisticated mathematically (and arguably more sensitive) and do
> suffice to nail randu (anything nails randu) and semi-weak generators like
> R250.
>
> Now, let's see what we've learned from this fiddling.  One is that
> without it, you just waste a lot of people's time making egregious and
> false claims that belittle the tremendously sophisticated and  
> difficult
> work a whole lot of "fiddlers" have put into inventing, writing, and
> testing modern RNGs.  The truth is that >>all<< RNGs in dieharder  
> "pass"
> your test (if the test is "producing at least one zero") once your  
> test
> isn't broken.  We've learned that in fact, the best of the modern RNGs
> are damn good, and that you could work for five years trying to  
> invent a
> test that is good enough to fail any of them and still not succeed.
> Finally, we've learned that you should not, not, not take your
> Martingale to a casino and try the doubling strategy out to make  
> money,

It's not interesting to discuss, but yes, this strategy makes money in
casinos; you just get thrown out of the casino and end up on the
blacklist if you do.

For good chess players all this is not so tough. The casinos' blacklist
of people too strong at blackjack is endless... this has been the
practice for longer than we have been alive...

So casino reality is much simpler. They kick you out if you're good.

That's why they try to popularize poker now: you don't play against
the casino there.


> or if you do put a firm upper bound -- something like 63 Euro -- on  
> what
> you're willing to lose with your base stake of 1 Euro.  That way you
> have maybe a 40% chance of doubling your 63 Euro before you go broke.
> Really, you should read the Wikipedia article I linked, in spite of  
> the
> fact that it presents more "paper fiddling".
>





From rgb at phy.duke.edu  Fri Aug 26 08:29:06 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:29:06 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260127470.3126@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
Message-ID: <alpine.LFD.2.02.1108260740380.12275@lilith>

On Fri, 26 Aug 2011, Robert G. Brown wrote:

> Let's try a bit of "paper fiddling".  The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>  nlogn = log10(n)*n;
>  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
> given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, n*(1-1/n)^nlogn of them "should" be empty.

Silly me.  All of the anonymous slots are at least asymptotically
independent (not necessarily obvious, but true from symmetry, I think,
subject to the weak constraint that the total population of all of the
slots has to add up to the number of trials so there are probably n-1
degrees of freedom in the Pearson test).  We have p and q.  The
distribution is binomial and of course I know the binomial distribution
and its sigma.  I can easily build any one of several tests on top of
this (simple binomial or even multinomial, since I effectively have the
hit frequency for n slots and it should BE the binomial distribution),
and in fact have two or three already that are very similar to this on a
smaller scale.
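
A hedged sketch of how the binomial sigma could turn the slot count into a z-score. The binomial form sqrt(n*p*q) with p = (1 - 1/n)^m is the approximation named above. The exact variance used in empty_slot_moments is the standard occupancy-problem result, which allows for the weak negative correlation between slots; it is not derived anywhere in this thread. All function names are illustrative, and the sketch assumes n >= 2 and m >= 1.

```python
import math

def _pow_one_minus(x, m):
    """(1 - x)^m for 0 <= x <= 1 and m >= 1, computed stably for small x."""
    return 0.0 if x >= 1.0 else math.exp(m * math.log1p(-x))

def empty_slot_moments(n, m):
    """Mean and variance of the number of empty slots after m uniform pulls."""
    p1 = _pow_one_minus(1.0 / n, m)   # P(a given slot is still empty)
    p2 = _pow_one_minus(2.0 / n, m)   # P(two given slots are both still empty)
    mean = n * p1
    var = mean + n * (n - 1) * p2 - mean * mean
    return mean, var

def binomial_sigma(n, m):
    """Independent-slot approximation: sqrt(n*p*q)."""
    p = _pow_one_minus(1.0 / n, m)
    return math.sqrt(n * p * (1.0 - p))

def z_score(filled, n, m):
    """Deviation of an observed filled count from expectation, in sigmas."""
    mean_empty, var = empty_slot_moments(n, m)
    return (filled - (n - mean_empty)) / math.sqrt(var)

if __name__ == "__main__":
    n, m = 10_000_000, 70_000_000
    reported = {"mt19937": 9990811, "mt19937 (rerun)": 9990802,
                "R250": 9990794, "randu": 9999482}
    for name, filled in reported.items():
        print(name, round(z_score(filled, n, m), 2))
    print("binomial sigma:", binomial_sigma(n, m))
```

Fed the counts reported earlier in the thread, the mt19937 and R250 runs should land within about one sigma of expected, while randu should be off by dozens of sigma. A p-value would follow from the z-score only if filledn really is close to Gaussian, which is assumed here rather than shown.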

It's what comes of hacking out -- sorry, "fiddling" out -- quick
solutions and tests late at night when you're tired and ought to be
sleeping.  A bit of coffee makes a world of difference...:-)

I'll have to think a bit about it and make sure that this isn't already
done, better, in e.g. STS, but it might yet see the light of day as an
actual dieharder test.

BTW, I'm not replying to your space alien ET post (to the Beowulf list
in reply to an already OT discussion of martingales that arose out of a
discussion of good RNGs and seeding strategies; sorry y'all but hey, at
least it is entertaining?) simply because my jaw is sore from hitting
the ground so many times while reading it.  Those are some top-quality
hallucinogens, yes they are...

We will now return to your regularly scheduled discussion of boring
things like bandwidth, memory reliability, parallel algorithms and the
like, you know, on-topic stuff.  But if any of y'all ever need to test
rngs or flame schemes to "win" non-zero-sum games by means of
"strategy", you know who to call...;-) Somewhere upstairs I have this
nifty book on game theory and in a pinch I can even trot out an actual
game matrix and analyze outcomes algefiddlingbraically!

    rgb

P.S. -- Vincent, all of these simple problems were solved by
mathematicians and statisticians so very, very, long ago, beginning with
the work of Pascal and Fermat (there are names to conjure with, eh?)
solving the problem posed by the Chevalier de Mere regarding an even bet
on double sixes happening at least once in 24 throws: actual probability
of double sixes per throw is (of course) 1/36, probability of no double
six in 24 throws is (35/36)^24, odds of at least one are therefore 1 -
(35/36)^24 = 0.4914038761 -- all paper fiddling, mind you -- a result
that is eerily reminiscent of the solution to your problem, but with
fewer slots.  So at even odds it is -- barely -- a sucker's bet.  But a
margin of 0.86% is enough to empty even the deepest pockets, over time.
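
The arithmetic above is easy to check numerically. A minimal sketch; the helper name p_at_least_one is illustrative only.

```python
def p_at_least_one(p_single, throws):
    """Probability that an event of probability p_single occurs at least once."""
    return 1.0 - (1.0 - p_single) ** throws

if __name__ == "__main__":
    p24 = p_at_least_one(1.0 / 36.0, 24)
    print(p24)          # chance of at least one double six in 24 throws
    print(0.5 - p24)    # shortfall relative to an even-money bet
    print(p_at_least_one(1.0 / 36.0, 25))   # one extra throw tips it past 1/2
```

The same 1 - (1 - p)^m structure is what drives the slot-filling expectation, with p = 1/n for a single slot.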

Now all you have to do is advance your actual knowledge of statistics
beyond that realized by an idly rich French nobleman in 1654 (who still
was wise enough to recognize that it wasn't an even bet and consulted
the best of the best of the minds of his day to prove it).

You have a mere 357 years to go...:-)

P.P.S -- If "all rngs" were really as bad as you assert, does it not
stand to reason that "all Monte Carlo computations" that use them would
all get egregiously incorrect results?  And yet they don't.  In fact, in
problems (like the Ising model in 2D) where known solutions exist, they
agree basically perfectly with the theoretical solution, and of course
it is easy to compare a wide range of integrals and Markov process
outcomes with theory.  So if you used your simple common sense you would
construct a mental argument like:

"Either

I, in my brilliance, have discovered an egregious flaw in all random
number generators used by all of those STUPID computer scientists,
mathematicians, and physicists for decades to do their long and complex
computations that no doubt all got equally egregiously wrong answers;

Or

Those computer scientists, mathematicians, and physicists are actually
pretty smart and aggressively check their work (and each other's work)
with a strong incentive to discover problems.  It is rather probable
that any such egregious error would have been long ago discovered;
therefore there is almost certainly a serious error in my own
reasoning."

Seriously, dude.  Ask yourself "Am I really smarter and better informed
than Pascal, Fermat, Laplace, Bayes, not to mention all of those
contemporary humans who have been devoting entire well-educated careers
to random numbers as if all of modern e-commerce depended on them (it
does) or is it just barely possible that I've made a mistake?"  Come on,
you can do it.  I know it is difficult for you, but try..

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From rgb at phy.duke.edu  Fri Aug 26 08:57:55 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:57:55 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108260829390.12275@lilith>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Bullshit.  "Every programmer" isn't dumb as a post.  Or wasn't my
argument clear enough?  Do you need me to actually post the code for how
the GSL -- written by at least some of these programmers -- does this?
Here, I'll try again.  This time I'll use smaller numbers and make an
actual table of the outcomes:  Imagine only two lousy random bits,
enough to make 00, 01, 10, 11 (or 0,1,2,3).  Here is the probability
table:

r =   0    1    2    3
------------------------
p = 0.25 0.25 0.25 0.25

Let us generate N samples from this distribution.  Our expected
frequency of the occurrence of all of these numbers is:

r =     0      1      2      3
---------------------------------
Np = 0.25*N 0.25*N 0.25*N 0.25*N

Is this all clear?  If I generate 100 random numbers, the expected
number of 3's is 0.25*100 = 25.  Now apply mod 3; the outcomes are now:

r =     0      1      2      3
r%3 =   0      1      2      0
---------------------------------
Np = 0.25*N 0.25*N 0.25*N 0.25*N

You now sum the number of outcomes for each element in the mod 3 table,
since we have two values of r that make one value of r%3 and frequency
clearly aggregates as the outcomes are mutually exclusive.

r%3 =   0      1      2
--------------------------
Np = 0.50*N 0.25*N 0.25*N

It is therefore twice as likely that two random bits, modulus 3, will
produce a zero as a one or a two.
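
The table can be reproduced by brute force: enumerate every value a k-bit generator can return, reduce each one mod n, and count. A minimal sketch; modulo_counts is a hypothetical helper, and it assumes every k-bit value is equally likely.

```python
from collections import Counter

def modulo_counts(bits, n):
    """How often each residue 0..n-1 appears when every bits-wide value is taken mod n."""
    counts = Counter(r % n for r in range(1 << bits))
    return [counts[k] for k in range(n)]

if __name__ == "__main__":
    print(modulo_counts(2, 3))   # two random bits, mod 3 -> [2, 1, 1]
    print(modulo_counts(3, 5))   # values 0..7, mod 5     -> [2, 2, 2, 1, 1]
    print(modulo_counts(3, 4))   # n a power of two       -> [2, 2, 2, 2]
```

The counts are equal only when n divides the number of possible raw values. For a wide generator and small n the imbalance is tiny, but it never vanishes unless n is a power of two.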

>
> Apologies for the caps. I hope it is clear how important this is. You're
> claiming all programmers use random numbers in a faulty manner?

They don't.  Only you do.  Everybody else takes a uniform deviate and
scales it by the number of desired integer outcomes, checking to make
sure that they don't go out of bounds and thereby e.g. get an incorrect
endpoint frequency.  The gsl code is open source and it takes two
minutes to download it and check (I just timed it).  Go on, look.  The
file is rng/rng.c in the gsl distro directory, the function name is
gsl_rng_uniform_int.  No modulus.
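
The scale-and-check idea described above can be sketched as follows. This is an illustration in Python, not the GSL source: scale_bin, uniform_int, next_raw and raw_max are hypothetical names, and the raw generator is assumed to return integers uniformly on 0..raw_max.

```python
import random

def scale_bin(raw, raw_max, n):
    """Map one raw value to a bin index; a result >= n means 'reject and redraw'."""
    return raw // (raw_max // n)

def uniform_int(next_raw, raw_max, n):
    """Return an integer in 0..n-1 from a source of integers in 0..raw_max.

    The raw range is cut into n equal-width bins; values that fall in the
    leftover partial bin at the top are thrown away and redrawn, so every
    accepted outcome is equally likely.
    """
    if n <= 0 or n > raw_max:
        raise ValueError("need 0 < n <= raw_max")
    while True:
        k = scale_bin(next_raw(), raw_max, n)
        if k < n:
            return k

if __name__ == "__main__":
    rng = random.Random(2011)    # assumption: stand-in 32-bit source, fixed seed
    draws = [uniform_int(lambda: rng.getrandbits(32), 2**32 - 1, 6)
             for _ in range(10)]
    print(draws)
```

Rejecting the partial bin is what keeps the endpoint frequencies honest; the price is an occasional extra draw.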

The exception is (obviously) when the range is a power of 2.  In that
case ONLY, r%n where r is a binary uint and n is a power of 2 will
(obviously) equally balance the table above.  Personally I'd use >> and
shift the bits because it is faster than mod, but suit yourself, after
you've learned what you are doing.

>
> This is important enough to discuss further.
>
> You nearly always need random numbers from within a given domain, say
> 0..n-1, so projecting an RNG onto that domain is pretty crucial. How
> would you do that in a correct manner?
>
> In the slot test in fact a simple AND is enough.

No, as I've just proven algebraically.  The correct manner for general n
is the gsl code, but in rough terms it is n*r/r_max (with care used to
avoid roundoff errors at the ends as noted).  If you've been using
modulus, all your results are crap.

Look, the reason God invented the GSL and made it open source is so
numb-nuts and smart people alike wouldn't have to constantly reinvent
the wheel, badly.  Use it.  Don't question it -- you obviously aren't
competent to.  Just use it.  If you want a random integer from 0 to n-1,
use gsl_rng_uniform_int.  If you want this for e.g. mt19937 don't write
the latter, set up the gsl to use it to generate your ints.  Learn to
use it carefully, use it correctly, but use it.

> It's not interesting to discuss, but yes, this strategy makes money in
> casinos; you just get thrown out of the casino and end up on the
> blacklist if you do.

You are clearly too stupid to be allowed out of the house without a
caretaker.  I'm not going to walk you through the proof that this isn't
so as it is openly published and I've already referenced a step-by-step
analysis that you can't be bothered, apparently, to actually read.  I'll
just reiterate the previous offer -- I, too, am happy to buy a roulette
wheel and you can come over and bet Martingale against me all day.  Just
one 0, no limits and no quitting, infinite credit on both sides, we play
until it is obvious to you that you are losing, have lost, will always
lose, and the longer you play the more that you will lose.  Loser buys
the winner a case of truly excellent beer.
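
The expectation can be checked exactly for a capped doubling run.  This
is a sketch under stated assumptions: an even-money bet on a single-zero
wheel (win probability 18/37), a starting stake of 1, and a cap of
max_bets consecutive bets; the function name is made up for
illustration.

```python
from fractions import Fraction


def martingale_expectation(max_bets, p_win=Fraction(18, 37)):
    """Exact expected profit of one doubling run.

    Stakes are 1, 2, 4, ... and the run stops at the first win
    or after max_bets straight losses.
    """
    q = 1 - p_win
    p_bust = q ** max_bets        # probability of losing every bet in the run
    loss = 2 ** max_bets - 1      # total amount staked when that happens
    return (1 - p_bust) * 1 - p_bust * loss


for k in (1, 5, 10):
    print(k, float(martingale_expectation(k)))
```

Algebraically this is 1 - (38/37)^k for a cap of k bets, which is below
zero for every k >= 1 and gets worse as the cap grows.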

Look, why don't you fix your random number code and try again, since
your simulations are obviously trash.  It isn't difficult to show this
with simulations, once you actually code them correctly, but I have to
go and don't have time to do it for you.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From diep at xs4all.nl  Fri Aug 26 12:53:14 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 18:53:14 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260829390.12275@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
	<alpine.LFD.2.02.1108260829390.12275@lilith>
Message-ID: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>


On Aug 26, 2011, at 2:57 PM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR  
>> PROGRAM.
>
> Bullshit.  "Every programmer" isn't dumb as a post.  Or wasn't my
> argument clear enough?  Do you need me to actually post the code for
> how the GSL -- written by at least some of these programmers -- does
> this?
> Here, I'll try again.  This time I'll use smaller numbers and make an
> actual table of the outcomes:  Imagine only two lousy random bits,
> enough to make 00, 01, 10, 11 (or 0,1,2,3).  Here is the probability
> table:
>
> r =   0    1    2    3
> ------------------------
> p = 0.25 0.25 0.25 0.25
>
> Let us generate N samples from this distribution.  Our expected
> frequency of the occurrence of all of these numbers is:
>
> r =     0      1      2      3
> ---------------------------------
> Np = 0.25*N 0.25*N 0.25*N 0.25*N
>
> Is this all clear?  If I generate 100 random numbers, the expected
> number of 3's is 0.25*100 = 25.  Now apply mod 3; the outcomes are now:
>
> r =     0      1      2      3
> r%3 =   0      1      2      0
> ---------------------------------
> Np = 0.25*N 0.25*N 0.25*N 0.25*N
>
> You now sum the number of outcomes for each element in the mod 3 table,
> since we have two values of r that make one value of r%3 and frequency
> clearly aggregates as the outcomes are independent.
>
> r%3 =   0      1      2
> --------------------------
> Np = 0.50*N 0.25*N 0.25*N
>
> It is therefore twice as likely that two random bits, modulus 3, will
> produce a zero.
>

If a generator produces values in the domain 0..3 and your modulus is
just one less than the size of that domain, obviously it'll map a tad
more often to 0.

Basically the deviation one would be able to measure in such a case is
this: if we have a generator that runs over a field of, say, size m and
we want to map that onto n entries, then we have the following formula:

m = x * n + y;

Now your theory, if I summarize it, is basically that in such a case the
entries 0..y-1 will get a tad more hits than y..n-1.

However if x is large enough that shouldn't be a big problem.

If, in the test I'm doing, we now map onto, say, a few million to a
billion entries, then x is a number of 40+ bits for most RNGs.

So that means that the deviation from the effect you show above is of
the order of magnitude of 1 / 2^40 in such a case, which is rather small.

Especially because the 'test', if you want to call it that, operates at
a granularity of O ( log n ), we can then fully ignore an expected
deviation with a granularity of O ( 1 / 2^40 ).
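
To put numbers on the m = x * n + y decomposition, here is a small
sketch.  The function name modulo_bias is hypothetical, the 32 bit
generator width in the usage lines is an assumption for illustration,
and an ideal generator with m equally likely outputs is assumed.

```python
def modulo_bias(m, n):
    """Split m equally likely raw values over n residues, m = x * n + y.

    Returns (x, y, relative_excess): residues 0..y-1 have x + 1 preimages,
    residues y..n-1 have x, so the favoured ones are 1/x more likely.
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    x, y = divmod(m, n)
    relative_excess = 1 / x if y else 0.0
    return x, y, relative_excess


print(modulo_bias(4, 3))            # two random bits reduced mod 3
print(modulo_bias(2**32, 10**9))    # assumed 32 bit generator, a billion slots
print(modulo_bias(2**32, 2**30))    # power-of-two table size
```

Note that x is the quotient m // n, not the width of the generator, so
it shrinks as the table size n grows.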


>>
>> Apologies for the caps. I hope it shows how important this is. You're
>> claiming all programmers use random numbers in a faulty manner?
>
> They don't.  Only you do.  Everybody else takes a uniform deviate and
> scales it by the number of desired integer outcomes, checking to make
> sure that they don't go out of bounds and thereby e.g. get an incorrect
> endpoint frequency.  The gsl code is open source and it takes two
> minutes to download it and check (I just timed it).  Go on, look.  The
> file is rng/rng.c in the gsl distro directory, the function name is
> gsl_rng_uniform_int.  No modulus.
>
> The exception is (obviously) when the range is a power of 2.  In that
> case ONLY, r%n where r is a binary uint and n is a power of 2 will
> (obviously) equally balance the table above.  Personally I'd use >> and
> shift the bits because it is faster than mod, but suit yourself, after
> you've learned what you are doing.
>
>>
>> This is important enough to further discuss about it.
>>
>> As nearly always you need random numbers from within a given domain
>> say 0.. n-1
>> So projecting a RNG onto that domain is pretty crucial. How would you
>> want to do that in a correct manner?
>>
>> In the slot test in fact a simple AND is enough.
>
> No, as I've just proven algebraically.  The correct manner for general n
> is the gsl code, but in rough terms it is n*r/r_max (with care used to
> avoid roundoff errors at the ends as noted).  If you've been using
> modulus, all your results are crap.
>
> Look, the reason God invented the GSL and made it open source is so
> numb-nuts and smart people alike wouldn't have to constantly reinvent
> the wheel, badly.  Use it.  Don't question it -- you obviously aren't
> competent to.  Just use it.  If you want a random integer from 0 to n-1,
> use gsl_rng_uniform_int.  If you want this for e.g. mt19937 don't write
> the latter, set up the gsl to use it to generate your ints.  Learn to
> use it carefully, use it correctly, but use it.
>
>> It's not interesting to discuss - but yes this strategy makes money in
>> casinos, you just get thrown out of the casino and end up on the
>> blacklist if you do.
>
> You are clearly too stupid to be allowed out of the house without a
> caretaker.  I'm not going to walk you through the proof that this isn't
> so as it is openly published and I've already referenced a step by step
> analysis that you can't be bothered, apparently, to actually read.  I'll
> just reiterate the previous offer -- I, too, am happy to buy a roulette
> wheel and you can come over and bet Martingale against me all day.  Just
> one 0, no limits and no quitting, infinite credit on both sides, we play
> until it is obvious to you that you are losing, have lost, will always
> lose, and the longer you play the more that you will lose.  Loser buys
> the winner a case of truly excellent beer.
>
> Look, why don't you fix your random number code and try again, since
> your simulations are obviously trash.  It isn't difficult to show this
> with simulations, once you actually code them correctly, but I have to
> go and don't have time to do it for you.
>
>    rgb
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>



From diep at xs4all.nl  Fri Aug 26 14:17:46 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 20:17:46 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com>
Message-ID: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>


On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:

> I hate to troll, but...
>
> On Aug 25, 2011, at 8:27 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:
>
>>
>> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:
>>
>>> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>>>
>>>> I noticed that most generated semi-random numbers with software
>>>> generators,
>>>> had the habit to truly address a search space of n always in O (n
>>>> log n).
>>>>
>>>> So if you draw from most software RNG's a number and do it  
>>>> modulo n,
>>>> with n being not too tiny, say quite some millions or even
>>>> billions, then every
>>>> slot in your 'hashtable' will get hit at least once by the RNG,
>>>> whereas data
>>>> in reality simply happens to not have that habit simply.
>>>>
>>>> So true random numbers versus generated noise is in this manner  
>>>> easy
>>>> to distinguish by this. Now i didn't study literature whether some
>>>> other chap
>>>> some long time ago already had invented this. That would be most
>>>> interesting
>>>> to know.
>>>
>>> Some other chap named George Marsaglia (and to some extent another
>>> chap
>>> named Donald Knuth) have already invented this.  A number of  
>>> tests of
>>> the tails of random number generators are already in dieharder.  All
>>> "good" modern rngs pass these tests.
>>>
>>> The Martingale betting system you are looking at is even older (at
>>> least
>>> Marsaglia and Knuth are still alive).  It dates back to the 18th
>>> century, and is well known to be flawed for a variety of reasons,  
>>> not
>>> the least of which is that gamblers don't have the infinite wealth
>>> necessary to make this >>even<< a zero-sum strategy and casinos have
>>
>> From a mathematical viewpoint it makes perfect cash.
>> As the statistical odds are that you have already built up considerable
>> profit when a worst case (that you hit the practical limit of doubling
>> 10 times) hits you.
>
> A betting system will not improve the negative mathematical
> expectation of a casino game.

The doubling system doesn't have a negative expectation.

In practice you are allowed to double 10 times if you start with 1.

Of all systems in roulette this is the only system that will produce a
profit, just theoretically speaking; about practice we all agree: they
kick you out.

> If your mathematical expectation is -1 for each trial, it's -10 for
> ten trials.  You will not win in the long-run using Martingale.
>

Except that this system doesn't have a negative expectation. It has a
positive expectation.

There is no other system in roulette that has a positive expectation,
other than the doubling system.

Please use the European casino model. I don't live in the USA.


>>
>> The simulations are of course using the practical limit.
>>
>> Note that the European casinos have a single zero.
>> In the USA there is an even greedier mafia controlling all the casinos;
>> there are 2 zeros there: 0 and 00.
>>
>> The simulations were for European casinos.
>>
>>> betting limits that de facto make it impossible to pursue the
>>> requisite
>>> number of steps and in roulette in particular have 0 and/or 00
>>> slots and
>>> aren't zero-sum to begin with.  You can read a decent analysis of
>>> outcomes based on the presumed binomial distribution of a zero-sum
>>> game
>>> here:
>>>
>>>  http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>>>
>>
>> You're not allowed to use a system in a casino, so we speak about
>> theory. Probably first evening they let you try. Second day you'll
>> get on the blacklist.
>
> Nonsense.  Have you ever been to a casino?


> You are welcome to Martingale all day long at any of them.

> Hell, I'll buy a roulette wheel and you can come over to my place if
> you play this strategy or any of its variants.

> The casino wants you to Martingale -- it's favorable to them.  Why
> would they stop a loser?

The doubling system, if you'd apply it in an objective manner in any
casino and would be allowed to, makes a profit.

Same for some slot machines over there. After some others have played on
one and it has swallowed money, the majority of slot machines are not
negative sum games anymore. If you play on them then, it's a positive sum
game.

If they were always negative sum games then no lady would keep playing
slot machines.

>
> The casino is not concerned with betting strategies.  It is concerned
> with folks gaining an edge.  A betting system alone will not give the
> player an edge.
>

No, very wrong: a casino is interested in maximizing its profit.
Kicking out folks that do well is part of that game.

Oh by the way - I worked for a casino. Did you?


>>
>>> Your test below is interesting, though.  The only real problems I  
>>> can
>>> see with actually using it in dieharder are:
>>>
>>
>> Yeah more interesting than the billion times discussed roulette
>> system which
>> has been analyzed completely flat.
>>
>>>  a) One would need a theoretical estimate of the distribution of
>>> filling given n log n draws on an n-slotted table (for largish n).
>>> That
>>> is, for a perfect rng, what SHOULD the distribution of success/ 
>>> failure
>>> be.
>>
>> As we figured out by now in Artificial Intelligence the statistical
>> assumptions made in the past they simply do not hold.
>>
>> For Artificial Intelligence we need a new sort of theoretical theory.
>>
>> As for the distribution problem, generators having a spread that's
>> too accurate, the way to deliver a proof would be for example to build
>> a simple device.
>>
>> Build an old fashioned box where you can draw balls. Remember what you
>> could see on TV some 20 years ago or so (not sure it was like that in
>> the USA).
>>
>> A big basket with balls. The basket in fact looks like this:
>>
>> http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg
>>
>> But now a much bigger machine like this with inside different means
>> of randomizing the balls,
>> actually also randomly modifying the inside  obstacles of shaking of
>> the balls.
>>
>> After a ball has been drawn you automatically have it annotated and
>> the ball immediately goes back
>> into the machine. For a full minute you have the balls in the machine
>> shaken again and you draw
>> again a ball. It is important to do this randomizing of the balls
>> inside the machine for quite some time.
>> I would propose a minute.
>>
>> Of course you have to do this with quite some balls.  Say a thousand.
>>
>> Then you draw balls until all numbers have been drawn at least once.
>>
>> This cool experiment can be easily build. Of course the expected
>> running time of a single experiment
>> will be a few weeks.
>>
>> You can produce a number of those drawing machines though and have a
>> look.
>>
>> Theories that seemingly work for small n, n being the number of  
>> balls,
>> are much harder to maintain at bigger n's, as we also see in prime
>> number research.
>>
>> The way how the machine gets designed of course is total crucial. I
>> would propose a design that
>> really shakes the balls really a lot through each other and really
>> very thoroughly.
>>
>> Just like we nowadays know how flawed a big number of card shaking
>> machines are that are popular to use.
>>
>> Such a lottery with really a lot of balls would be very interesting to
>> see the outcomes from.
>>
>> In fact i would prefer having produced number of those machines, so
>> that it's possible to really have a lot of outcomes
>> and then analyze them very well.
>>
>>>
>>>  b) One would then need the CDF for this distribution, to be able to
>>> turn the results of N trials (of n log n pulls each) into a p-value
>>> under the null hypothesis -- the probability of obtaining the
>>> particular
>>> number of successes and failures presuming a perfectly random
>>> generator.
>>>
>>> That way dieharder could apply it rigorously to its 70 or 80  
>>> embedded
>>> rngs or to any user's outboard generator.  There probably is
>>> theoretical
>>> statistical support for the PD and/or CDF -- you're analyzing the
>>> tails
>>> of a poissonian process -- but finding it or doing it yourself (or
>>> myself), aye, that's the rub.  One cannot just say "high degree of
>>> certainty that it is an RNG" (by which one means that the rng in
>>> question fails the test for randomness) in the test.  HOW high?
>>> Perfect
>>> rngs or perfectly random processes will sometimes fill your  
>>> table, but
>>> how often?
>>
>> If we assume that reality of life represents randomness, which is
>> another
>> rather good question in how far that theory is plausible, then using
>> that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have  
>> seen
>> in any reality.
>>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
>> O ( n log n )
>> that they follow.
>>
>> These are RNG's that have come through all tests as being good and
>> very acceptable RNG's to be used.
>>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really  
>> a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for  
>> practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove  
>> anything,
>> as what i've seen in reality in randomness is total different from  
>> how
>> RNG's model that.
>>
>> Regards,
>> Vincent
>>
>>
>>>  How can you differentiate an "accident" when one does from
>>> an actual failure?  All of those questions require a more rigorous
>>> theory and quantitative result embedded in a test that can be
>>> systematically cranked up to more clearly resolve failures until  
>>> they
>>> are unambiguous, not marginal maybe yes maybe no.
>>>
>>> I suspect that the failures this test would reveal are already more
>>> than
>>> covered in dieharder, in particular by the bit distribution tests  
>>> and
>>> the monkey tests, but I'm not terribly happy with the monkey  
>>> tests and
>>> would be perfectly thrilled to have a simpler to compute test that
>>> revealed precisely this sort of flaw, systematically.  And it  
>>> doesn't
>>> hurt at all to have partially or fully redundant tests as long as  
>>> the
>>> test themselves are rigorously valid.  If you can find or compute  
>>> the
>>> CDF for your test below, I'd be happy to wrap it up and add it to
>>> dieharder, in other words.  One can always SIMULATE a CDF, of  
>>> course,
>>> but that requires a known good generator and sort of begs the  
>>> question
>>> if you don't think that e.g. AES or threefish or KISS are good
>>> generators that would actually pass your test.
>>>
>>> Even hardware/quantum sources of random bits are suspect -- they  
>>> often
>>> are generated by a process that leaves in the traces of an  
>>> underlying
>>> distribution.  I'm not convinced that >>any<< process in the real
>>> world
>>> is >>truly<< random.  Physics is ambiguous on the issue -- the  
>>> quantum
>>> description of a closed system is just as deterministic as the
>>> classical
>>> one, and Master equation unpredictability on open subsets of a large
>>> closed system reflects entropy/ignorance, not actual randomness  
>>> (hence
>>> Einstein's famous "doesn't play dice" remark).  But lots of this are
>>> sufficiently random that one cannot detect any failure of  
>>> randomness,
>>> modern crypto class generators being a prime example.
>>>
>>>   rgb
>>>
>>>>
>>>> In semi pseudo code, let's take an array of size a billion as an
>>>> example,
>>>> though usually a few million is more than ok:
>>>>
>>>> n = 2^30; // 2 to the power 30
>>>>
>>>> Function TestNumbersForRandomness(RNG,n) {
>>>> declare array hashtable[size n];
>>>>
>>>> guessednlogn = 2 * (log n / log 2) * n;
>>>>
>>>> for( i = 0 ; i < n ; i++ )
>>>>  hashtable[i] = FALSE;
>>>>
>>>> ndraws = filledn = 0;
>>>> while( ndraws  < guessednlogn ) {
>>>>   randomnumber = RNG();
>>>>   r = randomnumber % n; //     randomnumber =  r  (mod n)
>>>>   if( hashtable[r] == FALSE ) {
>>>>      hashtable[r] = TRUE;
>>>>      filledn++;
>>>>      if( filledn >= n )
>>>>        break;
>>>>
>>>>  }
>>>>  ndraws++;
>>>> }
>>>>
>>>> if( filledn >= n )
>>>>   print "With high degree of certainty data generated by a RNG\n";
>>>> else
>>>>   print "Not so sure it's a RNG\n";
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>
>>>>
>>>>
>>>>> -- both unpredictable and
>>>>> flat/decorrelated at all orders, and even though there aren't  
>>>>> really
>>>>> enough of them for my purposes, I've used them as one of the  
>>>>> (small)
>>>>> "gold standard" sources for testing dieharder even as I test
>>>>> them.  For
>>>>> all practical purposes threefish or aes are truly random as  
>>>>> well and
>>>>> they are a lot faster and easier to use as gold standard  
>>>>> generators,
>>>>> though.
>>>>> I don't quite understand why the single site restriction is
>>>>> important --
>>>>> this site has been up for years and I don't expect it to go away
>>>>> soon;
>>>>> it is quite reliable.  I don't think there is anything secret
>>>>> about how
>>>>> the numbers are generated, and I'll certify that the numbers it
>>>>> produces
>>>>> don't make dieharder unhappy.  So 1 is fixable with a bit of
>>>>> effort on
>>>>> your part; 6 I don't really understand but the guy who runs the
>>>>> site is
>>>>> clearly willing to construct a custom feed for cash customers, if
>>>>> there
>>>>> is enough value in whatever it is you are trying to do to pay for
>>>>> access.  If it's just a lottery, well, lord, I can think of a
>>>>> dozen ways
>>>>> to make numbers so random that they'd be unimpeachable for any
>>>>> sort of
>>>>> lottery, both unpredictable and uncorrelated, and they don't any
>>>>> of them
>>>>> require any significant amount of entropy to get started.
>>>>> I will add one warning -- "randomness" is a rather stringent
>>>>> mathematical criterion, and is generally tested against the null
>>>>> hypothesis.  Amateurs who want to make random number generators
>>>>> out of
>>>>> supposedly "random" data streams or fancy algorithms almost
>>>>> invariably
>>>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>>>> really, really good pseudorandom number generators out there and
>>>>> it is
>>>>> easy to hotwire them together into an xor-based high entropy
>>>>> stream that
>>>>> basically never repeats (feeding it a bit of real entropy now and
>>>>> then
>>>>> as it operates).  I would strongly counsel you against trying to
>>>>> take
>>>>> e.g. weather data and make something "random" out of it.   
>>>>> Unless you
>>>>> really know what you are doing, you will probably make something
>>>>> that
>>>>> isn't at all random and may not even be unpredictable.  Even most
>>>>> sources of "quantum" randomness (which is at least possibly "truly
>>>>> random", although I doubt it) aren't flat, so that they carry the
>>>>> signature of their generation process unless/until you manage to
>>>>> transform them into something flat (difficult unless you KNOW the
>>>>> distribution they are producing).  Pseudorandom number generators
>>>>> have
>>>>> the serious advantage of being amenable to at least some  
>>>>> theoretical
>>>>> analysis (so you can "guarantee" flatness out to some high
>>>>> dimensionality, say) as well as empirical testing with e.g.
>>>>> dieharder.
>>>>> HTH,
>>>>>
>>>>>    rgb
>>>>>> Thanks,
>>>>>> David Mathog
>>>>>> mathog at caltech.edu
>>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>>>> Robert G. Brown                     http://www.phy.duke.edu/~rgb/
>>>>> Duke University Dept. of Physics, Box 90305
>>>>> Durham, N.C. 27708-0305
>>>>> Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu
>>>
>>> Robert G. Brown                       http://www.phy.duke.edu/~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>>>
>>>
>>
>



From rgb at phy.duke.edu  Fri Aug 26 17:46:30 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 17:46:30 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
	<alpine.LFD.2.02.1108260829390.12275@lilith>
	<698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108261335130.12325@lilithnew>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If you have a domain of 0..3 where a generator generates and your modulo n is
> just n-1, obviously that means it'll map a tad more to 0.
>
> Basically the deviation one would be able to measure in such case is that if
> we have a generator that runs over a field of say size m and we want to map
> that onto n entries then we have the next formula :
>
> m = x * n + y;
>
> Now your theory, if I summarize it, is basically that in such a case the
> entries 0..y-1 will get a tad more hits than y..n-1.

What's a "tad" when you're measuring the quality of an RNG?  I'm just
curious.  Could you be more specific?  Just what are the limits,
specifically, if your random number is a 30 bit uint that makes numbers
in the range of 0-1 billion (nearly all generators make uints, with a few
exceptions that usually make less than 32 bits -- 64 bit generators are
still a rarity although of course two 32 rands makes one 64 bit rand)
and you mod with m, especially a nice large m that doesn't integer
divide r like 1.5 million?  That means that each integer in the range 0
to 1.5 million gets 666 repetitions as the entire range of r gets
sampled, except the first million that get 667.  That means that the
odds of pulling a number from 1 to a million are 1e6*667/1.e9 = .667.
The odds of pulling a number from the remaining half million are
0.5e6*666/1.e9 = .333 = 1 - .667.  An old lady with terrible eyes could
detect such an
imbalance in probability from across the street -- you wouldn't even
need a "random number generator tester".

Weren't you advocating using this for nice large m like a million?  I
think that you were.  No, wait!  You were advocating something like one
BILLION, right?  Wrong direction to make it better, dude, this makes it
worse.

Note that this scales pretty well.  For m in the range of thousands, the
imbalance will be something like 0.666667 and 0.333330 -- still pretty
easy to detect with any halfway decent RNG tester.  Basically, you don't
get (immeasurably) close to a uniform distribution in the weighting of
any integer until you get down to (unsurprisingly) m of order unity
compared to a uint, at which point it basically becomes as
accurate as m*(r/r_max) was in the first place.

Note also that you've created an imbalance in the weighting of the
integers you are sampling that is far, far greater and more serious than
any other failure of randomness that your RNG might have.  So much so
that you couldn't possibly see any other -- it would be a signal swamped
in the noise of your error, even for m merely in the thousands -- one
part per million errors in randomness are easy to detect in simulations
that draw 10^9 or so random numbers (which is still a SMALL TEST
simulation -- real simulations draw 10^16 or 10^18 and your error would
put answers on another PLANET compared to where they should be).

Most coders probably can actually work this out with a pencil, and so I
repeat no, nobody competent uses a modulus to generate integers in a
fixed range in circumstances where the quality of the result matters,
e.g. numerical simulation or cryptography as opposed to gaming, unless
the modulus is a power of two.

> However if x is large enough that shouldn't be a big problem.
>
> If we map now in the test i'm doing onto say a few million to a billion
> entries, the size of that x is a number of 40+ bits for most RNG's.

x=32, a uint for most RNGs.  Or to put it another way, RNGs generate a
bit stream, which they might do with an algorithm that generates
30,31,32, or more bits at a time, but the prevalence of 32 bit
architectures and the fact that it is trivial to concatenate 32s to get
64+ bits when desired has slowed the development of true 64 bit RNGs.
Eventually there will be some, of course, and it will STILL be a mistake
to use a modulus to create random integers in some general range.  A bad
algorithm is a bad algorithm, and this makes sense only if speed is more
important than randomness (in which case one has to wonder, why use a 64
bit RNG in the first place, why use a good RNG in the first place).

> So that means that the deviation of the effect you show above is of the
> order of magnitude of 1 / 2^40 in such a case, which is rather small.

Except that it isn't, as I showed in a fair bit of detail.  It might be
if x were as large as you claim, which it isn't (in general or even
"commonly") and if one confined m to be order unity.  For m of order
2^20 (a million) the error for 2^40 is order 2^-20 (a millionth) which
shows up even in single precision floating point.  Why bother testing
such a stream for randomness?  It fails.  You've made it fail.  It fails
spectacularly if the generator is perfect, if the goddess Ifni herself
produces the string of digits.  It cannot succeed.

> Especially because the 'test' if you want to call it like that, is
> operating in the granularity O ( log n ), we can fully ignore then the
> expected deviation granularity O ( 2 ^ 40 ).

Well, except that basically 100% of the rngs in the GSL pass your "test"
when it is written correctly.  They also produce precisely the
correct/expected result (within easily understandable and expected
random scatter) on top of that if they are "good" rngs.  So the "test"
isn't much of an actual test, and your assertion that "all rngs fail it"
is false and based on a methodology that introduces many orders of
magnitude of error greater than the generators are known to have as
upper bounds.

Given this fact, which I have personally verified, do you imagine that
there might be other errors in your actual (not your pseudo) code?  You
gotta wonder.  If you've tested a Mersenne Twister with your "test" and
it fails to pass, either an MT is crap and all of the theoretical papers
and experienced testers who have tested with sophisticated and
peer-reviewed tools are stupid poo-poo heads, or, well, could it be that
your test or implementation of the MT is crap and the MT itself in
general is what everyone else seems to think that it is based on
extensive "paper fiddling" and enormous piles of empirical testing
evidence written by actual statisticians and rng experts?  Which is to
say, a damn good pseudo-RNG decorrelated in some 600+ dimensions that
passes nearly all known tests with flying colors.

Hmmm, let's put on our Bayesian thinking caps, consider the priors, and
try very hard to guess which one is much much more likely on Jaynes'
"decibel" scale of probabilities.  Would you say that it is 20 decibels
more likely that the MT is good and the test is broken?  50?  200?  I
like 2000 or thereabouts myself, or as we in the business might say, "it
is a fact" that your test is broken since 10^200 is a really big number,
comparatively speaking.
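
For anyone who wants the arithmetic behind the decibel remark: on Jaynes' scale the evidence in decibels is 10 times the base-10 logarithm of the odds, so 2000 dB corresponds to odds of 10^200 to 1. A tiny sketch of the conversion (the function name is hypothetical):

```python
def decibels_to_odds(db):
    """Convert evidence in decibels to an odds ratio: odds = 10**(db / 10)."""
    return 10 ** (db / 10)


for db in (20, 50, 200, 2000):
    print(db, "dB ->", decibels_to_odds(db), "to 1")
```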

Now, it would be nice if you apologized to "all RNGs" and "all
programmers" and the various other groups you indicted on your little
fallacious rant, but I'll consider myself enormously fortunate at this
point if you simply acknowledge that maybe, just maybe, your original
pronouncement -- that all rngs produce an egregiously, trivially
verifiable excessive degree of first order sequential uniformity -- is
categorically and undeniably false.

Of course, if you think I'm lying just to make you look bad, I can post
a modified version of dieharder with your test embedded so absolutely
anybody can see for themselves that all of the embedded generators pass
your test and that not one single thing you asserted in grandiosely
producing it was correct.  The code is quite short and anybody can
understand it.

Or you can take my moderately expert word for it -- the results I posted
are honest results produced using real RNGs from a real numerical
library in the real test written by block copying your pseudocode,
converting/realizing it in C, and fixing your obvious error in the
generation of random ints in the range 0-m by using a tested algorithm
written by people who actually know what they are doing that is IN the
aforementioned real numerical library.
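
To make the fix concrete, here is one standard way to get unbiased integers in the range 0 to m-1 from a generator that yields a fixed number of bits: rejection sampling. This is a sketch under stated assumptions and is not the GSL source. The name uniform_int and the raw argument are made up for illustration, and Python's random.getrandbits stands in for whatever raw generator is under test.

```python
import random


def uniform_int(m, bits=32, raw=None):
    """Return an integer uniform on [0, m) from a raw `bits`-bit generator.

    Hypothetical helper: `raw` is any zero-argument callable returning an
    integer uniform on [0, 2**bits); it defaults to random.getrandbits.
    """
    if raw is None:
        raw = lambda: random.getrandbits(bits)
    n = 1 << bits
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= 2**bits")
    scale = n // m        # every output value gets exactly `scale` raw values
    limit = scale * m     # raw values >= limit would cause bias, so drop them
    while True:
        x = raw()
        if x < limit:
            return x // scale


print([uniform_int(10**6) for _ in range(5)])
```

The cost is an occasional rejected draw (fewer than one in two in the worst case, and far fewer when m is small compared to 2^bits). In exchange every value in the range is backed by exactly the same number of raw outputs.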

Seriously, it is done.  Finished.  You're wrong.  Say "I'm sorry, Mr.
Mersenne Twister, if my test passes randu then how could it possibly
fail you?"  And don't forget to apologize to AES, RSA, DES, and all of
the other encryption schema too.  They all feel real bad that you called
them stupid poo-poo heads unable to pass the simplest first order
frequency test one can imagine, since they all had to pass MUCH more
rigorous and often government mandated testing to ever get adopted as
the basis for encryption.

I don't expect an apology to me for being indicted along with ALL the
OTHER programmers in the world for being stupid enough to use mod to
make a supposedly uniformly distributed range of m rands.  Not even
Numerical Recipes was that boneheaded.  But it's OK, we all know that we
didn't really ever do that, and if you did (and continue to do,
apparently, learning nothing from my patient and thorough exposition of
how it produces errors that are vastly greater than the ones that you
think you are detecting) that's a problem to who?  That's right, mister.
To you.  You'll just keep getting wroooooong answers, and then
announcing them as fact and making yourself look silly.  Or even
sillier, if that is possible.

     rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From atp at piskorski.com  Sat Aug 27 10:26:23 2011
From: atp at piskorski.com (Andrew Piskorski)
Date: Sat, 27 Aug 2011 10:26:23 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
References: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
Message-ID: <20110827142623.GA29931@piskorski.com>

On Fri, Aug 26, 2011 at 08:17:46PM +0200, Vincent Diepeveen wrote:
> On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:

>> A betting system will not improve the negative mathematical  
>> expectation of a casino game.

Right.

> Except that this system doesn't have a negative expectation. it has a  
> positive expectation.
> 
> There is no other system in roulette that has a positive expectation,  
> other than the doubling system.

Vincent, are you shitting us?  Or am I misremembering the tortured
history of this thread, and by "doubling system" you do NOT mean the
trivial martingale betting system that's been used (disastrously) and
analyzed for over 200 years?

Actually it doesn't matter; as Shawn Hood pointed out above, your
assertion is still wrong even if you actually meant some other
non-martingale betting system.  You insisting that *martingale*
betting gives you a positive expectation at roulette just makes it
much funnier!

There are ways to gain positive expectation in roulette (other than
the obvious fraud and collusion).  They involve finding a poorly
installed roulette table and using a wearable computer and physics to
predict where the ball will land.  Look up Thorp and Shannon's
research on the subject; they actually used it in casinos c. 1961.

None of those ways are due to some special method of betting.  The
point of betting systems is to optimize your small edge, but you have
to HAVE that edge in the first place.  Money management is important
because it tells you how to properly size your risk, but it can't give
you alpha.

Now yes, if you have a very volatile "roulette" game and a 0% edge (no
advantage to either you or the house), with some luck you could get
rich by playing it for a limited period of time and quitting while
you're ahead.  But you still have a 0% expectation game; look up the
mathematical definition of "expectation".
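
The expectation of a martingale run is easy to compute exactly. The sketch below is illustrative only: the function name is made up, and it assumes a base bet of 1 unit on an even-money bet, doubling after each loss, stopping at the first win or after max_rounds losses in a row (bankroll or table limit). The 18/37 figure assumes a single-zero wheel.

```python
from fractions import Fraction


def martingale_ev(p_win, max_rounds):
    """Exact expected profit of one capped martingale run (base bet = 1).

    Hypothetical helper. A run ends with +1 on the first win, or with a
    loss of 1 + 2 + ... + 2**(max_rounds-1) if every round is lost.
    """
    q = 1 - p_win
    p_bust = q ** max_rounds
    total_loss = 2 ** max_rounds - 1
    return (1 - p_bust) * 1 - p_bust * total_loss


# Assumed single-zero wheel: 18 winning pockets out of 37 on red/black.
print(martingale_ev(Fraction(18, 37), 10))
# Zero-edge "roulette": a fair coin.
print(martingale_ev(Fraction(1, 2), 10))
```

The result simplifies to 1 - (2q)^n, where q is the probability of losing a round and n is the cap. That is exactly zero for a fair coin and negative for any house edge, and it becomes more negative as the cap grows.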

Also, I don't remember for sure, but I believe martingale betting is
(always) more aggressive than Kelly.  If so, then it is inherently
stupid.  Kelly defines the MAXIMUM size bet that it is rational to
make, assuming your goal is maximum compounded wealth AND you have a
quantifiable edge (however small) in the game.  It can make sense to
bet less than Kelly, and if you believe you have no edge the rational
bet is zero.  It is never rational to bet more than Kelly.
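
For reference, the Kelly fraction for a simple win/lose bet is f* = (b*p - q) / b, where p is the probability of winning, q = 1 - p, and b is the net odds paid on a win. A minimal sketch, with a hypothetical function name and an assumed single-zero wheel for the roulette case:

```python
from fractions import Fraction


def kelly_fraction(p_win, net_odds):
    """Kelly stake as a fraction of bankroll: f* = (b*p - q) / b.

    Hypothetical helper. A negative result means the edge is with the
    other side, so the rational stake is clamped to zero.
    """
    q = 1 - p_win
    f = (net_odds * p_win - q) / net_odds
    return max(f, 0)


# Even-money roulette bet on an assumed single-zero wheel: no edge, no bet.
print(kelly_fraction(Fraction(18, 37), 1))
# Hypothetical game with a 2% edge at even money: stake 2% of bankroll.
print(kelly_fraction(Fraction(51, 100), 1))
```

With a house edge the unclamped formula gives -1/37 for the roulette case, so any positive stake, martingale or otherwise, is already more than Kelly.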

In practice, even when you are sure you have a real edge, you want to
bet less than Kelly, often much less.  There are several reasons for
that; one is that calculating Kelly depends on your estimate of how
big your edge is, and it is easy to overestimate your edge such that
in truth you are massively overbetting (taking way too much risk) at
2x Kelly or even more.
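
The cost of overbetting shows up in the expected log-growth of the bankroll per bet, g(f) = p*ln(1 + b*f) + q*ln(1 - f). A small sketch, assuming a hypothetical even-money game with a 51% win probability (the names and numbers are illustrative, not from this thread):

```python
import math


def log_growth(f, p_win, net_odds=1.0):
    """Expected log-growth per bet when staking fraction f of the bankroll."""
    q = 1.0 - p_win
    return p_win * math.log(1.0 + net_odds * f) + q * math.log(1.0 - f)


p = 0.51              # hypothetical 2% edge on an even-money bet
kelly = 2 * p - 1     # Kelly fraction for net odds of 1 is p - q
for multiple in (0.5, 1.0, 2.0, 3.0):
    print(multiple, log_growth(multiple * kelly, p))
```

In this example growth peaks at the Kelly stake, falls to roughly zero at twice Kelly, and is negative beyond that. Half Kelly gives up only a modest share of the growth, which is why an overestimated edge is so much more dangerous than an underestimated one.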

But optimizing the way you bet doesn't turn an inherently losing game
into a winner.  If the edge is with the house - as it certainly is
with a fair roulette table - the rational bet is not to make one.

This news article is probably more interesting:

  http://www.theonion.com/articles/casino-has-great-night,1506/
  Casino Has Great Night; May 28, 2003

-- 
Andrew Piskorski <atp at piskorski.com>


From james.p.lux at jpl.nasa.gov  Sat Aug 27 11:27:37 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sat, 27 Aug 2011 08:27:37 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <20110827142623.GA29931@piskorski.com>
Message-ID: <CA7E589E.A5F0%james.p.lux@jpl.nasa.gov>

>There are ways to gain positive expectation in roulette (other than
>the obvious fraud and collusion).  They involve finding a poorly
>installed roulette table and using a wearable computer and physics to
>predict where the ball will land.  Look up Thorp and Shannon's
>research on the subject; they actually used it in casinos c. 1961.

I think Shannon and Thorp just analyzed it, without actually using it.

See "The Eudaemonic Pie" about some physics guys at UC Santa Cruz who
built wearable hardware. Early 70s, I should think, based on my
recollections of the kind of ICs they were using.  (I also note, based on
the book, that while they were good at the physics, they weren't very good
at electronics design and construction)


They never made the system work very well (concept sound, execution not so
hot)..but it did encourage the gaming industry to get new laws prohibiting
the use of assistive devices.  Just you and the casino, mano a mano (or,
more accurately cerebro a leyes de la probabilidad)




From mathog at caltech.edu  Wed Aug 31 13:29:18 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 10:29:18 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>

Anybody know of a nice cheap, high melting point, easy to work with
sheet material, for making a custom air shroud?  

We have one box with stuff in it that looks similar to HDPE, the
material the white flexible cutting boards are made of, but it is a bit
thinner and more rigid than that.  Unfortunately there are no markings
on it, so HDPE is just a guess.  Whatever it is, it cut easily with
scissors (I had to trim it slightly at one point.)

Background.  We have an older Supermicro SC-823 server with dual
processors.  The air shroud it came with only covers the first
processor.  That didn't matter much when it had two low power processors
in it, but after upgrading it to dual Opteron 280s, the uncovered second
one runs considerably hotter than the covered front one.  (Swapping the
processors around didn't help - the heat stayed where it was, so a
ventilation issue, not a processor issue.)  Supermicro does make a newer
shroud which extends to the back of the case, but the manual (google for
"SC-823 air shroud user's guide") indicates that it is designed for
Intel CPUs.  So it may or may not fit around the Opterons.

The redesigned air shroud will probably work, but I'm about 90%
confident that taping a sheet of plastic onto the back of the existing
shroud would work as well - if I can find a plastic that won't flap
around or melt.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From deadline at eadline.org  Wed Aug 31 14:20:03 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 31 Aug 2011 14:20:03 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
Message-ID: <44280.192.168.93.213.1314814803.squirrel@mail.eadline.org>

David,

I have experimented with some simple ducting for my Limulus
system. I found a vinyl flashing from Union Corrugating Company
(purchased at Lowe's home center) that has some nice features: it is
bendable, holds its shape, is easy to cut, has a low carbon content
(harder to burn than most plastics), and is fairly stiff.

My needs are "low temp" air ducting. I have not tested it
with constant warm/hot air.

--
Doug






From james.p.lux at jpl.nasa.gov  Wed Aug 31 14:43:39 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 11:43:39 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>

Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..)  It's no more flammable than plastic, and it doesn't melt and get soft. Papier-mache works too.

On the other hand, if you want to mold a smooth curve, then plastic is the way to go. Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. )

Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar.  Polystyrene and acrylic tend to be more brittle.  ABS is very nice to work with.  PVC is also easy to work with. Nylon is another possibility.

Do you want to be able to glue it?

What I would do is call up profesionalplastics.com  formerly Cadillac Plastics (many outlets nationwide) and see what they have.  It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go.  Plastic Depot in Burbank has a huge selection.

Drive over there, and ask the counter folks what would work for you.  $10-20 will get you more plastic than you know what to do with.

Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications.


Jim Lux
+1(818)354-2075 



From mathog at caltech.edu  Wed Aug 31 15:15:22 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 12:15:22 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>

> Cardboard? Card stock? Masking tape? White glue? (that's what I
> usually use for cooling ducts.. easy to cut, glue, tape..)  It's no
> more flammable than plastic, and it doesn't melt and get soft. 

That never crossed my mind.

You sure about the flammability?  I believe it for the ignition due to
temperature (Fahrenheit 451 and all that).  However, I have a gut
feeling (but no data) that sparks are fairly likely to ignite cardboard,
and less likely to ignite a solid plastic sheet (polyethylene or
polypropylene, for instance).  Not that I'm expecting sparks, but that
is a real possibility when a power supply fails.  Maybe even a brief
flame.  Of course paper won't hold up well compared to plastic if it
gets wet.  Moisture resistance is not important here though - if the
insides of the computer are dripping, air shroud failure is the least of
my worries.  

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From james.p.lux at jpl.nasa.gov  Wed Aug 31 15:18:36 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 12:18:36 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C6D6@ALTPHYEMBEVSP20.RES.AD.JPL>

Paper doesn't catch fire at 451F.. it does start to turn brown. (Sorry Ray..)
(I cook bacon on a rack over paper in a 450 degree oven.. and I doubt the temperature control is that tight)

Flammability is an issue.. paper is rougher than most plastics, so a spark can lodge or a small fiber could catch.  You could fireproof the paper pretty easily with a variety of treatments.


Jim Lux
+1(818)354-2075 



From bill at cse.ucdavis.edu  Wed Aug 31 17:04:44 2011
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 31 Aug 2011 14:04:44 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
Message-ID: <4E5EA1EC.7080804@cse.ucdavis.edu>

On 08/31/2011 12:15 PM, David Mathog wrote:
> That never crossed my mind.
> 
> You sure about the flammability?  I believe it for the ignition due to
> temperature (Fahrenheit 451 and all that).  However, I have a gut
> feeling (but no data) that sparks are fairly likely to ignite cardboard,
> and less likely to ignite a solid plastic sheet (polyethylene or
> polypropylene, for instance).  Not that I'm expecting sparks, but that
> is a real possibility when a power supply fails.  Maybe even a brief
> flame.  Of course paper won't hold up well compared to plastic if it
> gets wet.  Moisture resistance is not important here though - if the
> insides of the computer are dripping, air shroud failure is the least of
> my worries.  

I'm aware of a machine room fire that was attributed to cardboard dust
and the storage of flammable material (paper and cardboard).

I wouldn't recommend cardboard or anything else that might generate
flammable dust in a hot (50-90C), low-humidity airflow environment.

Supermicro does seem to play pretty fast and loose with a shroud and
cooling in general.  We had nodes bouncing off the thermal max (and
throttling) despite air intake temperatures 30F below the specifications
while having very low power load in the node (read that as no expansion
cards, one low rpm disk, and the lowest clocked CPU).

We did however get them to ship us free shrouds once we complained.

Is it really worth wasting even an hour to not get the real shroud?  Not
sure if this is the one, but they aren't particularly expensive ($13):

http://www.provantage.com/supermicro-mcp-310-18003-0n~7SUP91KW.htm


From rgb at phy.duke.edu  Wed Aug 31 17:05:34 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 31 Aug 2011 17:05:34 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
	<ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108311700460.6118@lilith>

On Wed, 31 Aug 2011, Lux, Jim (337C) wrote:

Also thin aluminum.  You can get aluminum sheeting that you can cut with
scissors and that is easy to bend into shapes if you have a bending jig
(or can make one with two pieces of board stock and a vise).  Cheap,
fireproof, meltproof at any temperatures you're likely to reach, no
toxic fumes in a fire, can be glued or screwed.  The one drawback is
that it is a PITA to weld or solder if that's important to you, but for
an air shroud you can probably make compression joints (interlocking U
rims, squeezed down) that are adequate.

Most hardware stores (roof flashing), some auto parts or hobby stores.
Copper too, but more expensive.  Don't know about thin "enough" sheet
steel, but probably -- copper or steel would both weld or solder easily.

    rgb

> Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..)  It's no more flammable than plastic, and it doesn't melt and get soft. Papier Mache, works too.
>
> On the other hand, if you want to mold a smooth curve, then plastic is the way to go. Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. )
>
> Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar.  Polystyrene and acrylic tend to be more brittle.  ABS is very nice to work with.  PVC is also easy to work with. Nylon is another possibility.
>
> Do you want to be able to glue it?
>
> What I would do is call up profesionalplastics.com  formerly Cadillac Plastics (many outlets nationwide) and see what they have.  It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go.  Plastic Depot in Burbank has a huge selection.
>
> Drive over there, and ask the counter folks what would work for you.  $10-20 will get you more plastic than you know what to do with.
>
> Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications.
>
>
> Jim Lux
> +1(818)354-2075
>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog
>> Sent: Wednesday, August 31, 2011 10:29 AM
>> To: beowulf at beowulf.org
>> Subject: [Beowulf] materials for air shroud?
>>
>> Anybody know of a nice cheap, high melting point, easy to work with
>> sheet material, for making a custom air shroud?
>>
>> We have one box with stuff in it that looks similar to HDPE, the
>> material the white flexible cutting boards are made of, but it is a bit
>> thinner and more rigid that that.  Unfortunately there are no markings
>> on it, so HDPE is just a guess.  Whatever it is, it cut easily with
>> scissors (I had to trim it slightly at one point.)
>>
>> Background.  We have an older Supermicro SC-823 server with dual
>> processors.  The air shroud it came with only covers the first
>> processor.  That didn't matter much when it had two low power processors
>> in it, but after upgrading it to dual Opteron 280s, the uncovered second
>> one runs considerably hotter than the covered front one.  (Swapping the
>> processors around didn't help - the heat stayed where it was, so a
>> ventilation issue, not a processor issue.)  Supermicro does make a newer
>> shroud which extends to the back of the case, but the manual (google for
>> "SC-823 air shroud user's guide") indicates that it is designed for
>> Intel CPUs.  So it may or may not fit around the Opterons.
>>
>> The redesigned air shroud will probably work, but I'm about 90%
>> confident that taping a sheet of plastic onto the back of the existing
>> shroud would work as well - if I can find a plastic that won't flap
>> around or melt.
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From mathog at caltech.edu  Wed Aug 31 17:24:48 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 14:24:48 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>

 Robert G. Brown wrote

> Also thin aluminum. 

No way, at least not anywhere near the motherboard.  There isn't going
to be a way to fasten it very tightly into position, just tape probably,
possibly a zip tie at the back end.  So it would be best if the shroud
cannot short things out or scratch components off the motherboard if it
falls out of position.

I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for
this, and it is similar to the shroud material we have in another server.

Regards,  

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From Glen.Beane at jax.org  Wed Aug 31 17:42:23 2011
From: Glen.Beane at jax.org (Glen Beane)
Date: Wed, 31 Aug 2011 21:42:23 +0000
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <4E5EA1EC.7080804@cse.ucdavis.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>,
	<4E5EA1EC.7080804@cse.ucdavis.edu>
Message-ID: <EB8AA368-5EAD-47CF-85A7-5F8B680063FC@jax.org>


On Aug 31, 2011, at 5:05 PM, "Bill Broadley" <bill at cse.ucdavis.edu> wrote:

> On 08/31/2011 12:15 PM, David Mathog wrote:
>> That never crossed my mind.
>> 
>> You sure about the flammability?  I believe it for the ignition due to
>> temperature (Fahrenheit 451 and all that).  However, I have a gut
>> feeling (but no data) that sparks are fairly likely to ignite cardboard,
>> and less likely to ignite a solid plastic sheet (polyethylene or
>> polypropylene, for instance).  Not that I'm expecting sparks, but that
>> is a real possibility when a power supply fails.  Maybe even a brief
>> flame.  Of course paper won't hold up well compared to plastic if it
>> gets wet.  Moisture resistance is not important here though - if the
>> insides of the computer are dripping, air shroud failure is the least of
>> my worries.  
> 
> I'm aware of a machine room fire that was attributed to cardboard dust
> and the storage of flammable material (paper and cardboard).
> 


I've seen servers shipped with paperboard shrouds directing air over the processors...

I won't mention the vendor by name


From rgb at phy.duke.edu  Wed Aug 31 17:44:45 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 31 Aug 2011 17:44:45 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
References: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108311739250.6118@lilith>

On Wed, 31 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote
>
>> Also thin aluminum.
>
> No way, at least not anywhere near the motherboard.  There isn't going
> to be a way to fasten it very tightly into position, just tape probably,
> possibly a zip tie at the back end.  So it would be best if the shroud
> cannot short things out or scratch components off the motherboard if it
> falls out of position.

Don't forget the virtue of coat hangers.  Even rubber coated ones.

If you made the shroud out of aluminum, you could basically paint the
bottom with liquid electrical tape (or better, dip it four or five
times, drying it in between).  It would basically rubber-coat it.  No
shorting, no scratching, still moderately fireproof.  But as you wish.

> I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for
> this, and it is similar to the shroud material we have in another server.

The biggest problem with stuff like this (IIRC a discussion from long
ago) is you have to worry about what and how toxic it is in a fire, at
least if you want fire-persons to be able to enter the room in a fire.
Many plastics burn into really toxic materials.  You also have to worry
about how it will cope with high heat.  The good thing about aluminum is
that by the time it melts you won't care.  I think some of the liquid
tape compounds are fire retardant/melt resistant, and the aluminum
itself is such a good conductor of heat that it will act as a heat sink
for the rubber coating (in a good way).

    rgb

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From james.p.lux at jpl.nasa.gov  Wed Aug 31 17:56:08 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 14:56:08 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <alpine.LFD.2.02.1108311739250.6118@lilith>
References: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108311739250.6118@lilith>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C754@ALTPHYEMBEVSP20.RES.AD.JPL>

Plastic tape covering the aluminum.. 20 mil "pipe wrap" is useful stuff.  3M VHB double stick foam tape to hold it in place.

But, enough of this feeble lash-up idea:  I think the real solution is to have a second cluster doing a complete finite element model of the instantaneous temperature distribution within the processor in question, driving a set of actuators to form a dynamically optimized shroud.  Or, perhaps the shroud could be made from millimachines implementing very simple control logic, but with an appropriate emergent behavior based on, say, their temperature sensing capability.  The millimachines should, of course, be self replicating.  Perhaps a suitably genetically engineered extremophile could be created?  

A second cluster does the model, a third cluster determines the optimum genetic sequence, a fourth cluster is responsible for iteratively doing the bioengineering to create the organisms, etc.  (or for a less biologically inspired system, the third and fourth clusters are doing some form of adaptive evolving micro manufacturing)

I'd provide more details, but really, that's just engineering, and is obvious to a skilled practitioner. 

 (for those not at Caltech (who is my employer, as well as David's), you can contact their patent counsel for rights to the invention disclosed above, which I'm sure they'll be happy to license to you on reasonable and non-discriminatory terms.<grin>)

> -----Original Message-----
> From: Robert G. Brown [mailto:rgb at phy.duke.edu]
> If you made the shroud out of aluminum, you could basically paint the
> bottom with liquid electrical tape (or better, dip it four or five
> times, drying it in between).  It would basically rubber-coat it.  No
> shorting, no scratching, still moderately fireproof.  But as you wish.


From reuti at staff.uni-marburg.de  Mon Aug  1 13:38:30 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 1 Aug 2011 19:38:30 +0200
Subject: [Beowulf] Fwd: H8DMR-82 ECC error
References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk>
Message-ID: <B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>

Hi all,

on behalf of Jörg I forward this to the list, as his account seems to be blocked from posting to this list.

-- Reuti


> #############
> Dear all,
> 
> as I cannot post directly to the list although I am subscribed to it, I have 
> asked a friend of mine to post this for me.
> I am currently having severe problems with one of the clusters I am 
> maintaining. Around 50% of its nodes crash when we run cp2k on them. 
> Although they are IB nodes, the test jobs crash the nodes even without the 
> IB card installed, so I can rule out an IB-related problem. Memtest was OK: 
> I ran 9 cycles without any problems. Unfortunately I cannot swap the memory, 
> as I don't have any spare modules, and hence I have to rely on Memtest 
> here. The nodes which are causing the problems show other symptoms as well: 
> 3 of them failed to boot again after a normal shutdown procedure (the fans 
> come on and die after a short period, and I don't even get to the POST 
> stage), so they are offline as well. Two of the remaining nodes were 
> exceedingly hot after a reboot. When I took them out the fans were spinning, 
> and now they appear to be OK. These are AMD Opteron 2220 dual-core processors 
> with 2 CPUs per node. The motherboard is an H8DMR-82 with BIOS version 
> 080014 (release date 07/13/2007). It appears that almost always the same nodes 
> crash with this error message:
> 
> Hardware Error
> CPU0 Machine Check Exception  4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorrected error, report: yes MisV: invalid
> CPU context corrupt: yes UECC Error
> Bus Unit Error: prefetch/ECC error in data read from NB: local node originated 
> (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic. 
> Participating Processors: local node originated (SRC)
> 
> Judging from this I would guess there is a memory-related problem.
> Given that a number of people on this list have probably seen similar 
> hardware before: do I simply have a bad batch of hardware which is known 
> to cause problems, or do I have a different issue here? What I am after 
> is some idea of where to look next. It is not the compiled program, as 
> taking out the disc and placing it in a different node (same motherboard, same 
> Opteron but slightly different flags) does not cause any problems at all.
> Given the large number of nodes causing problems, I would like to make 
> sure, before proposing to write off these nodes, that it is not a subtle 
> issue which something like a BIOS upgrade could cure.
> 
> Many thanks for your help and all the best from London
> 
> Jörg
> 
> ##############
> 
> 
> 
> -- 
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ 
> 
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
> 
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> 
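
For readers who want to check the decoding in the quoted machine check message by hand, here is a minimal Python sketch that unpacks the status word b200200000000863. The bit positions (VAL, UC, EN, PCC, CECC, UECC) and the bus error code layout (0000 1PPT RRRR IILL) are written down from memory of AMD's family 0Fh documentation and should be treated as assumptions to verify against the BKDG; the names decode_status and STATUS_FLAGS are made up for this illustration and are not part of mcelog.

```python
# Sketch only: decode an AMD K8 (family 0Fh) MCi_STATUS value.
# Bit positions and code tables are assumptions taken from memory of
# the AMD documentation; verify them before relying on the output.

STATUS_FLAGS = {
    63: "VAL (status valid)",
    62: "OVER (overflow)",
    61: "UC (uncorrected error)",
    60: "EN (reporting enabled)",
    59: "MISCV (misc register valid)",
    58: "ADDRV (address register valid)",
    57: "PCC (processor context corrupt)",
    46: "CECC (correctable ECC)",
    45: "UECC (uncorrectable ECC)",
}

PARTICIPATION = ["SRC (local node originated)", "RES (local node responded)",
                 "OBS (local node observed)", "GEN (generic)"]
TRANSACTION = ["generic", "read", "write", "data read", "data write",
               "instruction fetch", "prefetch", "evict", "snoop"]
MEM_OR_IO = ["mem access", "reserved", "I/O", "generic"]
CACHE_LEVEL = ["L0", "L1", "L2", "LG (generic)"]


def decode_status(status):
    """Return the set flags and, for bus errors, the decoded error code."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code, "bus": None}
    # Bus error codes have the form 0000 1PPT RRRR IILL.
    if code & 0xF800 == 0x0800:
        rrrr = (code >> 4) & 0xF
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "transaction": TRANSACTION[rrrr] if rrrr < len(TRANSACTION) else "reserved",
            "mem_or_io": MEM_OR_IO[(code >> 2) & 0x3],
            "cache_level": CACHE_LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_status(0xB200200000000863)
    for flag in decoded["flags"]:
        print(flag)
    print(decoded["bus"])
```

Under these assumptions the word decodes to an uncorrected, context-corrupting UECC error on a prefetch memory access with no timeout, which agrees with the text the kernel printed.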



From samuel at unimelb.edu.au  Wed Aug  3 00:28:10 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Wed, 03 Aug 2011 14:28:10 +1000
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
 -pre-alpha release
In-Reply-To: <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <BANLkTi=ADqyNS6uPEkVESOGRS0wXCQfoPg@mail.gmail.com><26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2><BANLkTinLb2Di_5dFEfemqsZG5UyG4KytBQ@mail.gmail.com><4DA5E85D.4010801@ats.ucla.edu><BANLkTinMFKDr7t6oARV5vYxkgj1iq1gYKQ@mail.gmail.com><CAHwLALNaCj2toQrJPK_YCnhZmmUDKRGEZNmT9Uef=QUDO5CbKA@mail.gmail.com><Pine.LNX.4.64.1107112248160.8112@coffee.psychology.mcmaster.ca>	<CAHwLALMuQ3M6EGzRLeOwnfkE7gJXMm5Wd_qDiXuaFKBF5dCYqQ@mail.gmail.com>
	<207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4E38CE5A.5080506@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/07/11 18:47, Hearns, John wrote:

>> I think a lot of this will apply to non-SGE batch schedulers -- in
>> fact Torque will support hwloc in a future release.
>
> That sounds good to me!
> 
> (Hint - if anyone from Altair is listening in it would be useful...)

There's already been Carl Smith from pbspro.com on the hwloc
mailing list finding configure problems with AIX (which
have been fixed)...

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk44zloACgkQO2KABBYQAh8KUACfd5r45HcKBQdxRdRm3rb42fO1
VbgAoINM9lQ2rCIsa6G9Yv0b2qWii2aC
=F/Jm
-----END PGP SIGNATURE-----


From samuel at unimelb.edu.au  Mon Aug  8 21:45:38 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Tue, 09 Aug 2011 11:45:38 +1000
Subject: [Beowulf] IBM terminates Blue Waters contract
Message-ID: <4E409142.8060900@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

NCSA is now looking for a new hardware supplier..

http://www.ncsa.illinois.edu/BlueWaters/system.html

# Effective August 6, 2011, IBM terminated its contract
# with the University of Illinois to provide the supercomputer
# for the National Center for Supercomputing Applications'
# Blue Waters project.

More info at El Reg:

http://www.theregister.co.uk/2011/08/08/ibm_kills_blue_waters_super/

# To date, IBM had shipped three racks of the Blue Waters
# supers to NCSA, and these will be returned. IBM has to
# give back $30m to NCSA.

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5AkUIACgkQO2KABBYQAh8icQCeL9PM2FW6ZAMLKz9Wg55oePGY
/FcAoJQGuHMOTNZ0bNddHIAy40ZCe5oB
=fID2
-----END PGP SIGNATURE-----


From mdidomenico4 at gmail.com  Tue Aug  9 08:46:13 2011
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Tue, 9 Aug 2011 08:46:13 -0400
Subject: [Beowulf] Memory Testing?
Message-ID: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>

The last discussion on the list about faulty memory revolved around using
software such as memtest or HPL to trigger SBEs (single-bit errors).

I'm curious whether anyone has experience with uncorrectable ECC errors:
specifically, not detecting that one occurred, but working out which DIMM
in the chassis it points to.

The mcelog in Linux doesn't seem to report the DIMM slot correctly on
my Supermicro boards.

The only way I know to narrow it down is to pull all the DIMMs
and then test them one at a time in the system.

I'm curious if there is a better way, or if anyone has any opinions on
the below (or another similar) piece of hardware that might do the
same

http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm
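
One software-side option for the question above, offered only as a sketch: on Linux systems where an EDAC driver is loaded for the memory controller, the kernel exposes per-chip-select-row error counters under /sys/devices/system/edac/mc/. The Python below reads the ce_count and ue_count files for each csrow. Whether an EDAC driver supports a given Supermicro board, and how a csrow maps to a physical DIMM slot, are assumptions that have to be checked per board; the function names read_edac_counts and suspect_rows are invented for this example.

```python
# Sketch only: collect EDAC error counters per memory controller / csrow.
# Assumes the csrow-style sysfs layout; the csrow-to-slot mapping is
# board specific and must be verified separately.
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mcN/csrowM": {"ce_count": int, "ue_count": int}}."""
    counts = {}
    if not os.path.isdir(root):
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(os.listdir(root)):
        mc_dir = os.path.join(root, mc)
        if not (mc.startswith("mc") and os.path.isdir(mc_dir)):
            continue
        for row in sorted(os.listdir(mc_dir)):
            row_dir = os.path.join(mc_dir, row)
            if not (row.startswith("csrow") and os.path.isdir(row_dir)):
                continue
            entry = {}
            for name in ("ce_count", "ue_count"):
                try:
                    with open(os.path.join(row_dir, name)) as fh:
                        entry[name] = int(fh.read().strip())
                except (OSError, ValueError):
                    entry[name] = None  # counter missing or unreadable
            counts[mc + "/" + row] = entry
    return counts


def suspect_rows(counts):
    """List the rows that have logged at least one uncorrectable error."""
    return [key for key, entry in sorted(counts.items())
            if entry.get("ue_count")]


if __name__ == "__main__":
    for key, entry in sorted(read_edac_counts().items()):
        print(key, entry)
```

Since an uncorrectable error often takes the node down, the counters may be reset before they can be read; polling them periodically for rising correctable counts on the same row is a plausible way to get a hint before pulling DIMMs one at a time.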


From a.travis at abdn.ac.uk  Tue Aug  9 08:54:14 2011
From: a.travis at abdn.ac.uk (Tony Travis)
Date: Tue, 09 Aug 2011 13:54:14 +0100
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <4E412DF6.1080204@abdn.ac.uk>

On 09/08/11 13:46, Michael Di Domenico wrote:
> [...]
> I'm curious if there is a better way, or if anyone has any opinions on
> the below (or another similar) piece of hardware that might do the
> same
>
> http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm

Hi, Michael.

We had a RAM tester back in the day, but memory that it passed still 
gave errors in the real systems we were using. I screen memory in the 
system it is installed in using Memtest86+ then run Charles Cazabon's 
user-mode "Memtester" on the running system to assess its reliability:

   http://pyropus.ca/software/memtester/

HTH,

   Tony.
-- 
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk


From deadline at eadline.org  Thu Aug 11 08:04:58 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 11 Aug 2011 08:04:58 -0400 (EDT)
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E412DF6.1080204@abdn.ac.uk>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>
Message-ID: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>


Most of you are probably not aware of this story
about trade secrets and Bash scripts on HPC clusters
(I was not until a few months ago)

  http://www.clustermonkey.net//content/view/308/33/


-- 
Doug



From james.p.lux at jpl.nasa.gov  Thu Aug 11 10:05:00 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 07:05:00 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>

Interesting.. You wrote:
There is a general understanding that unless explicitly marked in the contents of the script (the text file that is the Bash program), a Bash script is freely available for use and modification by anyone. In some cases there is a copyright notice or a license that allows (or disallows) sharing or modification. These are always explicitly stated at the beginning of the script and obvious to anyone who reads or modifies the script. 

This is, of course, not correct under current law: marking is not required for copyright protection.  Pretty much everything is born copyrighted.  Putting markings on it helps you claim willful infringement (i.e. the recipient can't claim "I didn't know"), which helps on the damages situation.  And, under the Berne convention, marking is required to assert your rights in some countries (All Rights Reserved is also required in some places).  Likewise, under current law, registration of copyright isn't required.  Registration allows you to collect statutory damages for infringement, though.

For trade secrets, it's a bit trickier.  The recipient has to know that it's trade secret, but that can be done by marking on the delivery media, by a separate document, or even by verbal communication (here, this is proprietary, don't disclose it).  And you have to take some means to protect it: claiming something is trade secret when it is printed on bus stop benches won't fly.  In any case, just because scripts aren't obfuscated doesn't mean they're not subject to trade secret protection, if the owner of the secret takes some precautions to prevent wide disclosure (e.g. warning the recipient of its proprietary nature).  This is the aspect that will surely be the core of litigation:  would a "reasonable person" have known that the material was subject to trade secret protection?  As we all know, reasonable people differ, and the attorneys on both sides will trot out examples of marking and disclosure practices: good, bad, and indifferent.  As Doug noted, "special measures" need to be taken, but there's no bright line standard for those measures, and, in practice, they can be pretty lax (and would be expected to be proportionate to the value of the secret.. the secret formula for Coke is probably more protected than the schedule for sweeping the floor in the manufacturing plant... both provide competitive advantage to Coke, but one is probably more important)

Something that a lot of tech people  in industry (particularly those coming from academia and working with open source) probably don't really fully understand is that pretty much everything you do for your employer is probably proprietary in some sense, and there is probably a written policy to that effect, which you, as an employee, are expected to be aware of. Or your supervisor told you, or the nice personnel person told you when you hired in 20 years ago, etc.  Mundane operational details of the business might be claimed to provide competitive advantage, especially if they're not "industry standard"  (humorously, if the employer has some really lame practice that's horrible, that might make it protectable.. then you could argue in court about whether it had any value). This is why there are "document review" departments and periodic training:  It helps reduce the problem of "inadvertent disclosure" and "I didn't know".  


This is the really tricky thing about trade secret: inadvertent disclosure can ruin the protection.  There have been cases of deliberately (and nefariously) "losing" trade secret info to spoil the protection.  And then, there is a somewhat notorious case of documents from Intel(?) that were in an envelope at a hotel desk or convention(?) with a person's name on it. Turns out there was a competitor (AMD?) with an employee of the same name, who accidentally got the documents handed to them (Hi, I'm John Smith, I think you have something for me.), opened the envelope, realized the problem, handed them right back, but in later action, it was alleged that this was sufficient to break the protection.  I don't recall all the details, and it probably settled out of court.  It's really complex.. "the bell, having been rung, cannot be unrung" (the phrase shows up in tons of legal writings), but in reality, if the inadvertent disclosure wasn't too big, etc.


Important things:
1) The language it's written in or obfuscation or not makes no difference.
2) the size of the work makes no difference.  "Candy/Is dandy/But liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair use, and anyway, the copyright may have expired)
3) the intellectual effort in the work makes no difference (unlike patents, there's no requirement of novelty) (unless you're trying to claim trade secret protection on something that's already public knowledge.. the thing might be public, but the fact that you selected that particular one might be trade secret.)


Jim

I am not a lawyer, but I spent all too many (hundreds) of hours in depositions and meetings and court where one of the main issues was the "was there adequate notice of the trade secret status of the information" as well as "did they steal it", not to mention the always popular "can you describe the secret with specificity and particularity".  If the bad guy steals the trade secret and then keeps it secret, it's fairly hard to show that they actually have it.  There are also folks who have developed techniques to evade the restrictions of an NDA ("Sure, I signed it, but that exceeded the scope of my corporate authority, so it's invalid. "  "Technically, I wasn't an employee that afternoon, even though I was in the morning, and I was the next week, but hey, for that afternoon, I wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry about giving you that business card with the company name on it, but it was what I happened to have in my wallet")


________________________________________
From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of Douglas Eadline [deadline at eadline.org]
Sent: Thursday, August 11, 2011 05:04
To: beowulf at beowulf.org
Subject: [Beowulf] All Your BASH Are Belong To Us

Most of you are probably not aware of this story
about trade secrets and Bash scripts on HPC clusters
(I was not until a few months ago)

  http://www.clustermonkey.net//content/view/308/33/


--
Doug



From cbergstrom at pathscale.com  Thu Aug 11 10:35:01 2011
From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=)
Date: Thu, 11 Aug 2011 21:35:01 +0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E43E895.6070803@pathscale.com>

  On 08/11/11 07:04 PM, Douglas Eadline wrote:
>
> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters
> (I was not until a few months ago)
>
>    http://www.clustermonkey.net//content/view/308/33/
IANAL and this shouldn't be taken as legal advice -

Bret Stouder, if you haven't done so already, contact SFLC immediately.  
They provide legal services to open source projects and may be able to 
help.  (I can help put you in touch with them or other very good open 
source legal counsel.)


./C


/* Armchair lawyers are generally not helpful and in many cases it's 
counterproductive for them to express their own personal views.  I hope 
this discussion dies immediately without further comment */


From deadline at eadline.org  Thu Aug 11 12:58:47 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 11 Aug 2011 12:58:47 -0400 (EDT)
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>


I had a chance to read some of the depositions, really interesting
and even embarrassing stuff. My guess is Atipa got angry when
Bret and the other employees left to form a new company. They
may have searched for ways to stop them and decided
to go after them for what Atipa considered "trade secrets."
A more or less traditional method to prevent ex-employees from
stealing your secret sauce (as you explain below).

The only problem was much of the "secrets" were developed
and shared in an open environment. This may have been a
surprise to those in charge and makes their claims
a bit harder to swallow. (i.e. a fundamental misunderstanding
of how trade secrets can be protected in an open source ecosystem).
And, what I try to point out in the article, is that this
open source ecosystem is what allowed hardware vendors to
sell clusters in the first place.

There is of course more to this case than I describe in the article.
I'll post more as it progresses.

--
Doug

> Interesting.. You wrote:
> There is a general understanding that unless explicitly marked in the
> contents of the script (the text file that is the Bash program), a Bash
> script is freely available for use and modification by anyone. In some
> cases there is a copyright notice or a license that allows (or disallows)
> sharing or modification. These are always explicitly stated at the
> beginning of the script and obvious to anyone who reads or modifies the
> script.
>
> This is, of course, not correct under current law, marking is not required
> for copyright protection.  pretty much everything is born copyrighted.
> Putting markings on it helps you claim for willful infringement (i.e. the
> recipient can't claim "I didn't know") which helps on the damages
> situation.  And, under the Berne convention, marking is required to assert
> your rights in some countries (All Rights Reserved is also required in
> some places)  Likewise, under current law, registration of copyright isn't
> required.  Registration allows you to collect statutory damages for
> infringement, though.
>
> For trade secrets, it's a bit trickier.  The recipient has to know that
> it's trade secret, but that can be done by marking on the delivery media,
> by a separate document, or even by verbal communication (here, this is
> proprietary, don't disclose it).  And you have to take some means to
> protect it: claiming something that is trade secret that is printed on bus
> stop  benches won't fly.  In any case, just because scripts aren't
> obfuscated doesn't mean they're not subject to trade secret protection,
> provided the owner of the secret takes some precautions to prevent wide
> disclosure (e.g. warning the recipient of its proprietary nature).  This
> is the aspect that will surely be the core of litigation:  would a
> "reasonable person" have known that the material was subject to trade
> secret protection.  As we all know, reasonable people differ, and the
> attorneys on both sides will trot out examples of marking and disclosure
> practices: good, bad, and indifferent.  As Doug noted, "special measures"
> need to be taken, but there's no bright line standard for those measures,
> and, in practice, they can be pretty lax (and would be expected to be
> proportionate to the value of the secret.. the secret formula for Coke is
> probably more protected than the schedule for sweeping the floor in the
> manufacturing plant... both provide competitive advantage to Coke, but one
> is probably more important)
>
> Something that a lot of tech people  in industry (particularly those
> coming from academia and working with open source) probably don't really
> fully understand is that pretty much everything you do for your employer
> is probably proprietary in some sense, and there is probably a written
> policy to that effect, which you, as an employee, are expected to be aware
> of. Or your supervisor told you, or the nice personnel person told you
> when you hired in 20 years ago, etc.  Mundane operational details of the
> business might be claimed to provide competitive advantage, especially if
> they're not "industry standard"  (humorously, if the employer has some
> really lame practice that's horrible, that might make it protectable..
> then you could argue in court about whether it had any value). This is why
> there are "document review" departments and periodic training:  It helps
> reduce the problem of "inadvertent disclosure" and "I didn't know".
>
>
> This is the really tricky thing about trade secret: inadvertent disclosure
> can ruin the protection.  There have been cases of deliberately (and
> nefariously) "losing" trade secret info to spoil the protection.  And
> then, there is a somewhat notorious case of documents from Intel(?) that
> were in an envelope at a hotel desk or convention(?) with a person's name
> on it. Turns out there was a competitor (AMD?) with an employee of the
> same name, who accidentally got the documents handed to them (Hi, I'm John
> Smith, I think you have something for me.), opened the envelope, realized
> the problem, handed them right back, but in later action, it was alleged
> that this was sufficient to break the protection.  I don't recall all the
> details, and it probably settled out of court.  It's really complex.. "the
> bell, having been rung, cannot be unrung" (the phrase shows up in tons of
> legal writings), but in reality, if the inadvertent disclosure wasn't too
> big, etc.
>
>
> Important things:
> 1) The language it's written in or obfuscation or not makes no difference.
> 2) the size of the work makes no difference.  "Candy/Is dandy/But
> liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair
> use, and anyway, the copyright may have expired)
> 3) the intellectual effort in the work makes no difference (unlike
> patents, there's no requirement of novelty) (unless you're trying to claim
> trade secret protection on something that's already public knowledge.. the
> thing might be public, but the fact that you selected that particular one
> might be trade secret.)
>
>
> Jim
>
> I am not a lawyer, but I spent all too many (hundreds) of hours in
> depositions and meetings and court where one of the main issues was the
> "was there adequate notice of the trade secret status of the information"
> as well as "did they steal it", not to mention the always popular "can you
> describe the secret with specificity and particularity".  If the bad guy
> steals the trade secret and then keeps it secret, it's fairly hard to show
> that they actually have it.  There are also folks who have developed
> techniques to evade the restrictions of an NDA ("Sure, I signed it, but
> that exceeded the scope of my corporate authority, so it's invalid. "
> "Technically, I wasn't an employee that afternoon, even though I was in
> the morning, and I was the next week, but hey, for that afternoon, I
> wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry
> about giving you that business card with the company name on it, but it
> was what I happened to have in my wallet")
>
>
>
> ________________________________________
> From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf
> Of Douglas Eadline [deadline at eadline.org]
> Sent: Thursday, August 11, 2011 05:04
> To: beowulf at beowulf.org
> Subject: [Beowulf] All Your BASH Are Belong To Us
>
> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters
> (I was not until a few months ago)
>
>   http://www.clustermonkey.net//content/view/308/33/
>
>
> --
> Doug
>


--
Doug



From james.p.lux at jpl.nasa.gov  Thu Aug 11 13:40:35 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 10:40:35 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>

> -----Original Message-----
> From: Douglas Eadline [mailto:deadline at eadline.org]
> Sent: Thursday, August 11, 2011 9:59 AM
> To: Lux, Jim (337C)
> Cc: beowulf at beowulf.org
> Subject: RE: [Beowulf] All Your BASH Are Belong To Us
> 
> 
> I had a chance to read some of the depositions, really interesting
> and even embarrassing stuff. My guess is Atipa got angry when
> Bret and the other employees left to form a new company. They
> may have searched for ways to stop them and decided
> to go after them for what Atipa considered "trade secrets."
> A more or less traditional method to prevent ex-employees from
> stealing your secret sauce (as you explain below).
> 
> The only problem was much of the "secrets" were developed
> and shared in an open environment. This may have been a
> surprise to those in charge and makes their claims
> a bit harder to swallow. (i.e. a fundamental misunderstanding
> of how trade secrets can be protected in an open source ecosystem).
> And, what I try to point out in the article, is that this
> open source ecosystem is what allowed hardware vendors to
> sell clusters in the first place.
> 
> There is of course more to this case than I describe in the article.
> I'll post more as it progresses.
> 
> --


Yes.. and a standard way to attempt to do a "non-compete" (which are typically illegal in California) is for the former employer to threaten the new employer (or customers of the spin-off) with the "theft of trade secrets" allegation.  Even if the allegation is unfounded, you have to spend time and money dealing with it (if you're the ex-employee) or it creates sufficient fear, uncertainty, and doubt (on the part of the customers of the ex-employee spin off).

I'm also not so naïve as to think that employees don't actually take trade secrets with them and use them, so it's not entirely improbable.

But, in a perfect world, there would be substantial sanctions for doing this kind of thing as a competitive maneuver.


Legal niceties aside, Doug brings up an interesting point about "trade secrets" or intellectual property in general...

You work at a job and become experienced and knowledgeable in a particular line of business.  How much of that is "general knowledge" (not protectable) and how much is "peculiar to the employer" (protectable)?  This is a pretty fuzzy thing.

A for instance.. say you leaned over to the next cube and asked someone for help formulating a particularly complex command line to grep a file.  The exact, character for character version of that command line probably belongs to the employer, but what about the knowledge you now have of how to do those kinds of searches?  What if your coworker had actually done the command line (in its exact form) at some other place and brought it with them?

Then, there's the practical details of getting approval from a (conservative) power-that-is.  Sure, you might have gotten it from open source, but will your corporate reviewer agree? Or, will they use the default "it's all proprietary unless proven otherwise, and we don't have time to look at your proof, and you don't have time  to be gathering the proof".

It really depends on a corporate/organizational commitment to open source to institute processes to keep all this stuff straight.  (and we won't even get into "open source" vs "able to redistribute")


From landman at scalableinformatics.com  Thu Aug 11 13:53:55 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 13:53:55 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E441733.80701@scalableinformatics.com>

On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:

> It really depends on a corporate/organizational commitment to open
> source to institute processes to keep all this stuff straight.  (and
> we won't even get into "open source" vs "able to redistribute")

There are profoundly incorrect views running around out there, as to 
what "open source" means.  I had someone tell me that GPLv2 prevented 
distribution of binaries (it doesn't).  I've watched people slap 
additional legal bits in conflict with GPL onto GPL source.

I don't want to say "it's a mess" but I do want to say that "there is a 
profound need for a very simple statement of what is and isn't allowed 
by each license."  Including what is involved in altering licensing.

While these are more or less amusing and some won't really result in 
court cases and precedents, there is at least one effort that has some 
nice potential to test the GPL.  See the ZFS on Linux project, cf. 
http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

I can't imagine this will end well for any company shipping this, in 
source, build script, or binary form.  CDDL aside, Oracle's got some IP 
claims they could file, as well as other things.  I can't believe that 
shipping NetBSD binaries with Oracle IP inside would end well either.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


From james.p.lux at jpl.nasa.gov  Thu Aug 11 14:19:00 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 11:19:00 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E441733.80701@scalableinformatics.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E441733.80701@scalableinformatics.com>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>

 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman
> Sent: Thursday, August 11, 2011 10:54 AM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] All Your BASH Are Belong To Us
> 
> On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:
> 
> > It really depends on a corporate/organizational commitment to open
> > source to institute processes to keep all this stuff straight.  (and
> > we won't even get into "open source" vs "able to redistribute")
> 
> There are profoundly incorrect views running around out there, as to
> what "open source" means.  I had someone tell me that GPLv2 prevented
> distribution of binaries (it doesn't).  I've watched people slap
> additional legal bits in conflict with GPL onto GPL source.
> 
> I don't want to say "it's a mess" but I do want to say that "there is a
> profound need for a very simple statement of what is and isn't allowed
> by each license."  Including what is involved in altering licensing.
> 

Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses.  They had a "How do we encourage Open Source use at NASA" symposium a few months back, hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.  It's been a royal pain for me trying to release stuff to the general public in a useful form.  It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run.  But no, that .iso will be a derived work comprised of a multitude of components with all sorts of different license agreements.  What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same one we used, and have at it.
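
For what it's worth, the "check their MD5" step at least can be scripted.  A minimal sketch in Python, assuming the expected checksums are kept in a simple name-to-hash mapping; the function names and the manifest layout are made up for illustration and are not part of any actual release process:

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest, directory="."):
    """Return the names in `manifest` that are missing or whose MD5 differs.

    `manifest` is a hypothetical mapping of file name -> expected hex MD5.
    """
    bad = []
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != expected.lower():
            bad.append(name)
    return bad


# Usage (placeholder name and hash, to be replaced by the published ones):
#   problems = verify({"some-component.tar.gz": "<expected md5 hex>"}, "downloads")
#   if problems: print("re-fetch:", problems)
```

This only confirms that the recipient fetched the same bytes we built against; it does nothing about the licensing tangle itself.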

The complication is that in general, work funded by NASA and performed by government employees is a "government work not subject to copyright" although work funded by NASA and performed by an educational institution (e.g. JPL, which is part of Caltech) is subject to Bayh-Dole, and is presumed to be owned by the educational institution, with a fully paid, non-exclusive license granted to the government for government purposes.  (There is, of course, litigation about what those "government purposes" might happen to be.)

The incompatibility arises because NASA is legally obligated to distribute their products with no downstream restrictions on use, which is not the same as, for instance, GPL, which imposes restrictions on downstream use.   NASA (and the government in general) doesn't care if someone takes their product and uses it to make a subsequent closed source product which is totally proprietary. (and in fact, NASTRAN would be a fine example of this)


From gus at ldeo.columbia.edu  Thu Aug 11 15:28:28 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 11 Aug 2011 15:28:28 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E442D5C.1030902@ldeo.columbia.edu>

Lux, Jim (337C) wrote:
>> -----Original Message-----
>> From: Douglas Eadline [mailto:deadline at eadline.org]
>> Sent: Thursday, August 11, 2011 9:59 AM
>> To: Lux, Jim (337C)
>> Cc: beowulf at beowulf.org
>> Subject: RE: [Beowulf] All Your BASH Are Belong To Us
>>
>>
>> I had a chance to read some of the depositions, really interesting
>> and even embarrassing stuff. My guess is Atipa got angry when
>> Bret and the other employees left to form a new company. They
>> may have searched for ways to stop them and decided
>> to go after them for what Atipa considered "trade secrets."
>> A more or less traditional method to prevent ex-employees from
>> stealing your secret sauce (as you explain below).
>>
>> The only problem was much of the "secrets" were developed
>> and shared in an open environment. This may have been a
>> surprise to those in charge and makes their claims
>> a bit harder to swallow. (i.e. a fundamental misunderstanding
>> of how trade secrets can be protected in an open source ecosystem).
>> And, what I try to point out in the article, is that this
>> open source ecosystem is what allowed hardware vendors to
>> sell clusters in the first place.
>>
>> There is of course more to this case than I describe in the article.
>> I'll post more as it progresses.
>>
>> --
> 
> 
> Yes.. and a standard way to attempt to do a "non-compete" 
> (which are typically illegal in California) is for the former employer 
> to threaten the new employer (or customers of the spin-off) with the 
> "theft of trade secrets" allegation.  Even if the allegation is unfounded, 
> you have to spend time and money dealing with it 
> (if you're the ex-employee) or it creates sufficient fear,
> uncertainty, and doubt (on the part of the customers
> of the ex-employee spin off).
> 

Very true, and in the arena of intimidating former employees and
their current employers/competitors, there is nothing special
about the privatization of shell scripts or of nifty
regular expressions to grep files.

Recent examples include fields perhaps more lucrative than HPC,
such as English muffins
(Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):

http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

and high frequency trading (isn't it HPC also?) (Goldman Sachs vs. 
Sergey Aleynikov):

http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html

What is interesting is that across the board
the thing that free entrepreneurs seem
to hate the most is their competitors' free entrepreneurship.

> I'm also not so naïve as to think that employees don't actually take 
> trade secrets with them and use them, so it's not entirely improbable.
> 
> But, in a perfect world, there would be substantial sanctions for 
> doing this kind of thing as a competitive maneuver.
> 
> 
> Legal niceties aside, Doug brings up an interesting point about 
> "trade secrets" or intellectual property in general...
> 
> You work at a job and become experienced and knowledgeable in a 
> particular line of business.  How much of that is "general knowledge" 
> (not protectable) and how much is "peculiar to the employer" (protectable)?  This is a pretty fuzzy thing.
> 
> A for instance.. say you leaned over to the next cube and asked 
> someone for help formulating a particularly complex command line to 
> grep a file.  The exact, character for character version of that 
> command line probably belongs to the employer, but what about the 
> knowledge you now have of how to do those kinds of searches?  
> What if your coworker had actually done the command line 
> (in its exact form) at some other place and brought it with them?
> 
> Then, there's the practical details of getting approval from a 
> (conservative) power-that-is.  Sure, you might have gotten it from 
> open source, but will your corporate reviewer agree? Or, will they 
> use the default "it's all proprietary unless proven otherwise, and 
> we don't have time to look at your proof, and you don't have time  
> to be gathering the proof".
> 
> It really depends on a corporate/organizational commitment to 
> open source to institute processes to keep all this stuff straight.  
> (and we won't even get into "open source" vs "able to redistribute")


From landman at scalableinformatics.com  Thu Aug 11 15:57:43 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 15:57:43 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E442D5C.1030902@ldeo.columbia.edu>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E442D5C.1030902@ldeo.columbia.edu>
Message-ID: <4E443437.9070705@scalableinformatics.com>

On 08/11/2011 03:28 PM, Gus Correa wrote:

> Recent examples include fields perhaps more lucrative than HPC,
> such as English muffins
> (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):
>
> http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

That muffin just got real ...

>
> and high frequency trading (isn't it HPC also?) (Goldman Sachs vs.
> Sergey Aleynikov):
>
> http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html
>
> What is interesting is that across the board
> the thing that free entrepreneurs seem
> to hate the most is their competitors' free entrepreneurship.

I am running into an internal parser error in attempting to understand 
this last sentence.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


From mathog at caltech.edu  Thu Aug 11 19:56:02 2011
From: mathog at caltech.edu (David Mathog)
Date: Thu, 11 Aug 2011 16:56:02 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>

Since this is very OT, I'll try to keep it short.

Here is the problem - imagine a group of people who neither know nor
trust each other, yet must agree on the fairness of a single random
number.  Basically they are going to have a lottery.  They aren't
organized enough to generate such a number themselves - it must be found
from some process already active on the web, and be so obviously "fair"
that they won't argue about that.  Everybody must be able to obtain it
freely from a web connection.  

Can any of you think of a source on the web for a set of small files
with these properties:

1.  from a trusted source (here this mostly means the data is generated
    for some other innocuous purpose)
2.  represents a largely random process (temperature readings,
    stock market values, etc.) with a set generated at known intervals,
    preferably daily (at least M-F)
3.  are never, ever, revised
4.  are distributed reliably (for instance, signed files)
5.  are publicly and freely available
6.  can be obtained reliably (is available from many sites)

So far I have looked at stock market values and weather data - without
much luck.

You would think the S&P 500 is the S&P 500 and one could look it up on
any site and get the same data.  Not so! Check the Yahoo and Google
financial sites for the first few weeks of Jan. 2011 and you will find
digits that differ between the two sites in every single column.  Not
every day mind you, but often enough that it isn't reliable.  Heck, the
volume numbers differ by large factors between the two sites.  So just
choose one site and go with that?  Not so fast - if the single source
goes down the data is unavailable, and there is no guarantee that the
site (which is not party to this particular use of their data) might not
revise the page or choose to block it entirely.

Or weather data, right?  Lots of random bits there and we trust NOAA. 
But good luck with criteria 3-6.  In particular, they don't give data
out for free.  In theory no US Government site should, since they are
supposed to charge to recover distribution costs.

Criteria 4-6 are typical of software distributed on mirror sites, but so
far I have not found any physical measurements which are distributed in
a similar manner.
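
To be concrete about the last step: once everyone can fetch the same bytes, turning them into a lottery number is the easy part.  A minimal sketch in Python, assuming (my choice here, not a requirement) that SHA-256 is used to mix the agreed file and that the result is reduced modulo the number of tickets; the function name is hypothetical:

```python
import hashlib


def lottery_number(data: bytes, n_tickets: int) -> int:
    """Map agreed-upon public bytes to a ticket number in [0, n_tickets)."""
    if n_tickets <= 0:
        raise ValueError("n_tickets must be positive")
    # Hash the whole file so that every bit of the input affects the outcome.
    digest = hashlib.sha256(data).digest()
    # A 256-bit value reduced modulo a small ticket count has negligible bias.
    return int.from_bytes(digest, "big") % n_tickets


# Usage: every participant downloads the same agreed file and runs
#   with open("agreed_file", "rb") as fh:
#       print(lottery_number(fh.read(), 1000))
```

Anyone who holds the same file gets the same number, so the arguing would be about the source of the file, which is the part I have not solved.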

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mdidomenico4 at gmail.com  Thu Aug 11 20:28:11 2011
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Thu, 11 Aug 2011 20:28:11 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <CABOsP2PEffq=40LbJohMcFHbeQxHQNtLuqfKSd_7MGYeX1svqw@mail.gmail.com>

How many random numbers per day are you expecting?
If everyone checks at exactly 1pm, should they all see the same
"random" number or should they each get their own "random" number?
What kind of entropy are you expecting on "random"?

On Thu, Aug 11, 2011 at 7:56 PM, David Mathog <mathog at caltech.edu> wrote:
> Since this is very OT, I'll try to keep it short.
>
> Here is the problem - imagine a group of people who neither know nor
> trust each other, yet must agree on the fairness of a single random
> number.  Basically they are going to have a lottery.  They aren't
> organized enough to generate such a number themselves - it must be found
> from some process already active on the web, and be so obviously "fair"
> that they won't argue about that.  Everybody must be able to obtain it
> freely from a web connection.
>
> Can any of you think of a source on the web for a set of small files
> with these properties:
>
> 1.  from a trusted source (here this mostly means the data is generated
>    for some other innocuous purpose)
> 2.  represents a largely random process (temperature readings,
>    stock market values, etc.) with a set generated at known intervals,
>    preferably daily (at least M-F)
> 3.  are never, ever, revised
> 4.  are distributed reliably (for instance, signed files)
> 5.  are publicly and freely available
> 6.  can be obtained reliably (is available from many sites)
>
> So far I have looked at stock market values and weather data - without
> much luck.
>
> You would think the S&P 500 is the S&P 500 and one could look it up on
> any site and get the same data.  Not so! Check the Yahoo and Google
> financial sites for the first few weeks of Jan. 2011 and you will find
> digits that differ between the two sites in every single column.  Not
> every day mind you, but often enough that it isn't reliable.  Heck, the
> volume numbers differ by large factors between the two sites.  So just
> choose one site and go with that?  Not so fast - if the single source
> goes down the data is unavailable, and there is no guarantee that the
> site (which is not party to this particular use of their data) might not
> revise the page or choose to block it entirely.
>
> Or weather data, right?  Lots of random bits there and we trust NOAA.
> But good luck with criteria 3-6.  In particular, they don't give data
> out for free.  In theory no US Government site should, since they are
> supposed to charge to recover distribution costs.
>
> Criteria 4-6 are typical of software distributed on mirror sites, but so
> far I have not found any physical measurements which are distributed in
> a similar manner.
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.st.john at gmail.com  Thu Aug 11 20:44:15 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 11 Aug 2011 20:44:15 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>

David,
I was thinking the National Weather Service, instead of NOAA; it's a vital
public service that such information is recorded and disseminated for
airfields and the like, e.g.:
http://www.weather.gov/climate/getclimate.php?wfo=bou
So I would write a script to scrape least significant digits from that, for
agreed times, dates, and locations. Whoever writes the script and wherever
it is run, anyone can check its results manually.
However, that item has a disclaimer that the data is subject to review :) So
it may matter how far back in time you need to be able to go, and how long
into the future you need the data to be available at the same place. But
nobody promises their website will stay unchanged indefinitely, they can't.
But at any given time, a group can agree on (say) the lowest significant
digits of the temperatures at time T in cities X, Y, and Z as reported at
time T2 by the NWS.
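
As a rough illustration of the digit-scraping idea (not a script from this
thread), the extraction step could look like the sketch below. The city
names and readings are made-up placeholders and the function names are
hypothetical; fetching and parsing the NWS page is left out because its
layout is not specified here. Readings are kept as the published strings
so that everyone extracts the same character.

```python
def last_digit(reading):
    """Return the least significant digit of a reading given as published text."""
    digits = [c for c in reading if c.isdigit()]
    if not digits:
        raise ValueError("no digits in reading: %r" % reading)
    return digits[-1]


def combine(readings):
    """Concatenate the last digits, ordering cities by name so that
    every participant builds the string in the same order."""
    return "".join(last_digit(readings[city]) for city in sorted(readings))


# Hypothetical readings agreed on for time T, as reported at time T2.
example = {"Denver": "68", "Boulder": "71", "Austin": "-3.4"}
print(combine(example))
```

Sorting by city name is only one possible convention; the group would need
to agree on the ordering in advance, along with T, T2, and the cities.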
Peter

On Thu, Aug 11, 2011 at 7:56 PM, David Mathog <mathog at caltech.edu> wrote:

> Since this is very OT, I'll try to keep it short.
>
> Here is the problem - imagine a group of people who neither know nor
> trust each other, yet must agree on the fairness of a single random
> number.  Basically they are going to have a lottery.  They aren't
> organized enough to generate such a number themselves - it must be found
> from some process already active on the web, and be so obviously "fair"
> that they won't argue about that.  Everybody must be able to obtain it
> freely from a web connection.
>
> Can any of you think of a source on the web for a set of small files
> with these properties:
>
> 1.  from a trusted source (here this mostly means the data is generated
>    for some other innocuous purpose)
> 2.  represents a largely random process (temperature readings,
>    stock market values, etc.) with a set generated at known intervals,
>    preferably daily (at least M-F)
> 3.  are never, ever, revised
> 4.  are distributed reliably (for instance, signed files)
> 5.  are publicly and freely available
> 6.  can be obtained reliably (is available from many sites)
>
> So far I have looked at stock market values and weather data - without
> much luck.
>
> You would think the S&P 500 is the S&P 500 and one could look it up on
> any site and get the same data.  Not so! Check the Yahoo and Google
> financial sites for the first few weeks of Jan. 2011 and you will find
> digits that differ between the two sites in every single column.  Not
> every day mind you, but often enough that it isn't reliable.  Heck, the
> volume numbers differ by large factors between the two sites.  So just
> choose one site and go with that?  Not so fast - if the single source
> goes down the data is unavailable, and there is no guarantee that the
> site (which is not party to this particular use of their data) might not
> revise the page or choose to block it entirely.
>
> Or weather data, right?  Lots of random bits there and we trust NOAA.
> But good luck with criteria 3-6.  In particular, they don't give data
> out for free.  In theory no US Government site should, since they are
> supposed to charge to recover distribution costs.
>
> Criteria 4-6 are typical of software distributed on mirror sites, but so
> far I have not found any physical measurements which are distributed in
> a similar manner.
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

From james.p.lux at jpl.nasa.gov  Thu Aug 11 20:55:30 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 17:55:30 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
	<CAF4H3kcd7ZRcA4fzuNWC3bCLqNc7+uaBLHZ1MywACv+0SVVtXQ@mail.gmail.com>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108494CD217@ALTPHYEMBEVSP20.RES.AD.JPL>

Low order digits from weather stations are not likely to be random.
They almost certainly come from a quantized converter, and may actually have gone through a double conversion (Celsius to Fahrenheit).

NWS and NOAA are actually part of the same organization, aren't they?  (The NWS web page at weather.gov is titled "NOAA's National Weather Service".)

Jim Lux
+1(818)354-2075

From samuel at unimelb.edu.au  Thu Aug 11 21:54:11 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 11:54:11 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E4487C3.60605@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/08/11 22:04, Douglas Eadline wrote:

> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters

On the copyright side of things (not the trade secret stuff),
my understanding (IANAL, etc) is that anything you create you[0]
hold copyright on[1], and for someone else to copy it they must
have some agreement (license) to be able to do so.

Thus a shell script with no license attached or embedded
is copyrighted and you should get explicit permission to
use it..

cheers,
Chris

[0] - where "you" is the entity that is the copyright holder,
      not necessarily the creator.

[1] - yes, I know there are some entities that aren't allowed
      to hold copyright.. :-)
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5Eh8MACgkQO2KABBYQAh8sOgCePl6n4UTNZGMAePc8Kb+kmK4a
DHwAoJeVgYKUMDpJe78/2mQqbL2ryJ4M
=UAan
-----END PGP SIGNATURE-----

From cbergstrom at pathscale.com  Thu Aug 11 23:22:34 2011
From: cbergstrom at pathscale.com ("C. Bergström")
Date: Fri, 12 Aug 2011 10:22:34 +0700
Subject: [Beowulf] Open source @NASA - WAS: OT
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E449C7A.1030102@pathscale.com>

  On 08/12/11 01:19 AM, Lux, Jim (337C) wrote:
>
> Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses.  They had a "How do we encourage Open Source use at NASA" symposium a few months back, hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.  It's been a royal pain for me trying to release stuff to the general public in a useful form.  It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run.  But no, that .iso will be a derived work composed of a multitude of components with all sorts of different license agreements.  What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same ones we used, and have at it.
Hi Jim,

For the exact problem you've described, an ebuild could be a very good
solution.  (I personally abandoned Gentoo a long time ago.)

By solution I mean a bash script that explicitly checks the hashes,
resolves the deps, and pulls the source to build everything from the
eleventy-seven URLs and FTP sites.
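
The hash-checking step on its own is small. The sketch below is not an
ebuild and uses Python rather than bash; it only illustrates "check the
MD5 of each downloaded file against the value we used". The manifest
format (a mapping from local path to expected hex digest) and the function
names are assumptions for illustration, and downloading and dependency
resolution are not shown.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest):
    """manifest maps local file path -> expected hex MD5.
    Returns the list of paths whose digest does not match."""
    return [path for path, expected in manifest.items()
            if md5_of(path) != expected.lower()]
```

MD5 is used here only because it is the checksum named in the quoted
message; hashlib.sha256 can be swapped in the same way if a stronger hash
is wanted.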

The people working with gentoo-science would likely appreciate it a 
lot.  (The learning curve is fairly low if you know bash already)
--------
With regard to open source license proliferation and
incompatibilities: I think most people in the community are working
towards streamlining, but changes after the fact can be
difficult or impossible.  I'm sympathetic to your situation, and I'd say work
towards getting your projects merged with something like Gentoo to start,
and then maybe something like the openSUSE Build Service.  This would cover
a very large share of the packaging/distribution problem and get it into the
hands of users easily.



From samuel at unimelb.edu.au  Thu Aug 11 23:51:12 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 13:51:12 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <4E44A330.7090503@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/08/11 04:19, Lux, Jim (337C) wrote:

> The incompatibility arises because NASA is legally
> obligated to distribute their products with no
> downstream restrictions on use,

Actually no - the NASA license is incompatible with
the GPL (at least) because:

http://www.gnu.org/licenses/license-list.html

# The NASA Open Source Agreement, version 1.3, is not
# a free software license because it includes a provision
# requiring changes to be your "original creation". Free
# software development depends on combining code from
# third parties, and the NASA license doesn't permit this.

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5EozAACgkQO2KABBYQAh8ESQCfa3VfRt5Y1FxllDapHpqTrev9
+iAAn3TWi9YHq6yaAc6BMWCbeJZaQBFT
=GL6d
-----END PGP SIGNATURE-----

From james.p.lux at jpl.nasa.gov  Thu Aug 11 23:57:03 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 20:57:03 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E44A330.7090503@unimelb.edu.au>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
	<4E412DF6.1080204@abdn.ac.uk>,
	<39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D67@ALTPHYEMBEVSP20.RES.AD.JPL>
	<38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD140@ALTPHYEMBEVSP20.RES.AD.JPL>
	<4E441733.80701@scalableinformatics.com>
	<ECE7A93BD093E1439C20020FBE87C47F0108494CD166@ALTPHYEMBEVSP20.RES.AD.JPL>,
	<4E44A330.7090503@unimelb.edu.au>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D69@ALTPHYEMBEVSP20.RES.AD.JPL>

Yes, that too...


From rgb at phy.duke.edu  Fri Aug 12 00:31:30 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 00:31:30 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
References: <E1Qrf6Y-0004YI-51@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108120030370.5818@lilith>

On Thu, 11 Aug 2011, David Mathog wrote:

> Since this is very OT, I'll try to keep it short.
>
> Here is the problem - imagine a group of people who neither know nor
> trust each other, yet must agree on the fairness of a single random
> number.  Basically they are going to have a lottery.  They aren't
> organized enough to generate such a number themselves - it must be found
> from some process already active on the web, and be so obviously "fair"
> that they won't argue about that.  Everybody must be able to obtain it
> freely from a web connection.

   http://www.random.org/

sincerely,

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



From mathog at caltech.edu  Fri Aug 12 11:21:37 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:21:37 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>

Robert G. Brown wrote:

> > Everybody must be able to obtain it
> > freely from a web connection.
> 
>    http://www.random.org/
> 

Nice site.  They have something that is very close, the pregenerated
random files, from which a small set of digits may be extracted, and the
files themselves have MD5 checksums (but are not signed).
They also support https.  It comes up a little short on criteria 1 (we
really don't know what is going on behind the scenes) and 6 (it is a
single site.)

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From landman at scalableinformatics.com  Fri Aug 12 11:26:05 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 12 Aug 2011 11:26:05 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <4E45460D.9040505@scalableinformatics.com>

On 08/12/2011 11:21 AM, David Mathog wrote:
> Robert G. Brown wrote:
>
>>> Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>     http://www.random.org/

And from SGI days ... http://www.lavarnd.org/

> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

From mathog at caltech.edu  Fri Aug 12 11:58:28 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:58:28 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>

Peter St. John wrote:
> But at any given time, a group can agree on (say) the lowest significant
> digits of the temperatures at time T in cities X, Y, and Z as reported at
> time T2 by the NWS.

Actually we don't know that, at least not reliably enough for this
purpose.  It may be that the one web address is actually multiple
servers, and if the NWS pushes out data revisions these could return
different results for T:X,Y,Z at T2 if the servers were not strictly
synchronized.  Never mind the caching problems that revisions like this
would create on browsers.  I have no idea if the NWS revises their data
files, but it would not be surprising if they did.

After posting I thought of one other source of more or less random
verifiable numbers - the scores of sporting events.  These are not
always generated every day, and are seasonal for the various sports. 
They are however highly verifiable and when multiple events are grouped,
pretty much impossible to "fix" to preselected digits.  For instance:

  http://www.nfl.com/scores
  http://mlb.mlb.com/mlb/scoreboard
  http://scores.espn.go.com/nba/scoreboard?date=20110304

These sites maintain historical records.  Even if they didn't the scores
are widely published, and there are tens of thousands of witnesses to
the original event, so it would be pretty much impossible to
intentionally change a final score.  There could still be copying/typo
errors from site to site though, but if such an error was discovered it
would be easy enough to resolve.  There is no intrinsic order to the
scores, and some scheduled games might be canceled, so it would have to
be something like "sort the scores from all NBA teams who played on
4/4/11 into ascending order and concatenate the digits".

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From james.p.lux at jpl.nasa.gov  Fri Aug 12 12:09:46 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 12 Aug 2011 09:09:46 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>

All nice suggestions, but I wonder if they're truly random.

Scores of games have underlying patterns from the "rules of the game" (e.g., American football scores tend to be built up from 6, 7, 8, or 3 points; basketball goals are worth 2 or 3 points, etc.).

I'm sure someone has analyzed this.

I suppose one could sum a large number of scores, which would give you something with an approximately Gaussian distribution, and then you could transform it into something with a uniform distribution (sort of an inverse Box-Muller).
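
One way to sketch that Gaussian-to-uniform step is to pass the sum through
the normal cumulative distribution function (the probability integral
transform), which is a swapped-in technique rather than a literal inverse
of Box-Muller. The mean and standard deviation below are assumed to be
agreed on in advance, for instance estimated from historical sums; the
numbers and function names are placeholders.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal(mean, std) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def uniform_from_sum(total, mean, std):
    """Map an approximately Gaussian sum onto the interval (0, 1).
    The result is only as uniform as the Gaussian approximation is good."""
    return normal_cdf(total, mean, std)


# Hypothetical: a sum of scores that lands one standard deviation above the mean.
print(uniform_from_sum(215.0, 200.0, 15.0))
```

Because scores are integers, the sum is discrete, so the output takes only
a finite set of values; summing more scores makes that grid finer.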

What about using random.org and it being backed-up on archive.org?  Does that give you the "multiple independent sites" desired?


________________________________________
From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of David Mathog [mathog at caltech.edu]
Sent: Friday, August 12, 2011 08:58
To: Peter St. John; beowulf at beowulf.org
Subject: Re: [Beowulf] OT: public random numbers?

Peter St. John wrote:
> But at any given time, a group can agree on (say) the lowest significant
> digits of the temperatures at time T in cities X, Y, and Z as reported at
> time T2 by the NWS.

Actually we don't know that, at least not reliably enough for this
purpose.  It may be that the one web address is actually multiple
servers, and if the NWS pushes out data revisions these could return
different results for T:X,Y,Z at T2 if the servers were not strictly
synchronized.  Never mind the caching problems that revisions like this
would create on browsers.  I have no idea if the NWS revises their data
files, but it would not be surprising if they did.

After posting I thought of one other source of more or less random
verifiable numbers - the scores of sporting events.  These are not
always generated every day, and are seasonal for the various sports.
They are however highly verifiable and when multiple events are grouped,
pretty much impossible to "fix" to preselected digits.  For instance:

  http://www.nfl.com/scores
  http://mlb.mlb.com/mlb/scoreboard
  http://scores.espn.go.com/nba/scoreboard?date=20110304

These sites maintain historical records.  Even if they didn't the scores
are widely published, and there are tens of thousands of witnesses to
the original event, so it would be pretty much impossible to
intentionally change a final score.  There could still be copying/typo
errors from site to site though, but if such an error was discovered it
would be easy enough to resolve.  There is no intrinsic order to the
scores, and some scheduled games might be canceled, so it would have to
be something like "sort the scores from all NBA teams who played on
4/4/11 into ascending order and concatenate the digits".

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From mathog at caltech.edu  Fri Aug 12 13:04:43 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:04:43 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrvA3-0004lV-FR@mendel.bio.caltech.edu>

> All nice suggestions, but I wonder if they're truly random.

Random enough in this case - as they are only used to form a seed for a
random number generator, and a seed is only needed "rarely".  So even
though pro basketball scores have definite trends and often look like
(101,95),(103,87),(98,76), these can still create a decent seed value
once sorted and concatenated:  10310198958776
(Let's assume the seed need not be odd.)
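
A minimal sketch of the sort-and-concatenate step, assuming the final
scores have already been collected by hand into a list of pairs.  The
function name and the descending sort order are illustrative assumptions;
descending order is simply what reproduces the digits shown above.

```python
def seed_from_scores(games, descending=True):
    """Flatten (team_a, team_b) score pairs, sort them, and join the digits."""
    scores = [score for game in games for score in game]
    scores.sort(reverse=descending)
    return int("".join(str(score) for score in scores))


games = [(101, 95), (103, 87), (98, 76)]
print(seed_from_scores(games))  # 10310198958776
```

Sorting first removes any dependence on the order in which a given site
happens to list the games.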

> What about using random.org and it being backed-up on archive.org?
> Does that give you the "multiple independent sites" desired?

To some degree, but not as much as the large number of sites that
distribute game scores and stock values.  I originally favored using
stock values until it turned out that those numbers are squishier than
one might have expected, particularly so for indices like the S&P 500
and Dow Jones.  A fellow who works at S&P told me that the opening
prices are prone to timing problems, since at T=0+delta some of the
issues in the index will have traded, and some will not, with the
untraded stock values being filled in with stale values.  I think
similar timing issues affect all the other index values too
(high/low/close).  In these cases, since the index is derived from
formulas, some sites may be independently calculating the values, and
tiny differences in the times the stock values are measured result in
different numbers.  All it takes is one trade difference between the
sample points to change some digits.  When I get some time I still need
to look and see if the high/low/close values for individual stocks are
also variable from web site to web site.  These numbers might be more
reliable for single stocks since they might all trace back to the data
feed from the exchange where the issue trades.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mathog at caltech.edu  Fri Aug 12 13:22:46 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:22:46 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1QrvRW-0004lo-I1@mendel.bio.caltech.edu>

Michael Di Domenico wrote:

> How many random numbers per day are you expecting?

One would be sufficient.

> If everyone checks at exactly 1pm, should they all see the same
> "random" number or should they each get their own "random" number?

They should all see the same number.

Example: a random number based on physical events which occurred on
8/10/11 would become available on or shortly after that day.  Starting
from the time it first becomes available, and going forward ideally
forever, everybody who wants to should be able to retrieve that same
random number.  

That is, nobody should be able to predict the number beforehand, and
everybody should be able to verify it later.  So the number must be both
random and etched in stone.

> What kind of entropy are you expecting on "random"?

In practice relatively little is needed, 16 bits should be plenty.
(More wouldn't hurt, of course.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From rgb at phy.duke.edu  Fri Aug 12 14:35:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:35:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108121416030.3145@lilith>

On Fri, 12 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote:
>
>>  Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>    http://www.random.org/
>>
>
> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)

Behind the scenes is documented pretty well on the site, and the guy who
runs it is a human being, you can communicate with him to learn even
more.  I already know him a bit, as he and I have collaborated on
applying dieharder to test random.org datasets -- even "the" random.org
dataset as of some time ago (I have a few hundred MB of random numbers
from the site in my dieharder directory).  IIRC, the numbers are
generated continuously and fairly slowly by grabbing and filtering and
transforming atmospheric noise.  As a source of entropy, that is
probably excellent if (as noted) slow, but many good sources of entropy
seem to be fairly slow.  He has good reason to think that his numbers
are theoretically "true random numbers" -- both unpredictable and
flat/decorrelated at all orders, and even though there aren't really
enough of them for my purposes, I've used them as one of the (small)
"gold standard" sources for testing dieharder even as I test them.  For
all practical purposes threefish or aes are truly random as well and
they are a lot faster and easier to use as gold standard generators,
though.

I don't quite understand why the single site restriction is important --
this site has been up for years and I don't expect it to go away soon;
it is quite reliable.  I don't think there is anything secret about how
the numbers are generated, and I'll certify that the numbers it produces
don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
your part; 6 I don't really understand but the guy who runs the site is
clearly willing to construct a custom feed for cash customers, if there
is enough value in whatever it is you are trying to do to pay for
access.  If it's just a lottery, well, lord, I can think of a dozen ways
to make numbers so random that they'd be unimpeachable for any sort of
lottery, both unpredictable and uncorrelated, and they don't any of them
require any significant amount of entropy to get started.

I will add one warning -- "randomness" is a rather stringent
mathematical criterion, and is generally tested against the null
hypothesis.  Amateurs who want to make random number generators out of
supposedly "random" data streams or fancy algorithms almost invariably
fail, sometimes spectacularly so.  There are a half dozen or more
really, really good pseudorandom number generators out there and it is
easy to hotwire them together into an xor-based high entropy stream that
basically never repeats (feeding it a bit of real entropy now and then
as it operates).  I would strongly counsel you against trying to take
e.g. weather data and make something "random" out of it.  Unless you
really know what you are doing, you will probably make something that
isn't at all random and may not even be unpredictable.  Even most
sources of "quantum" randomness (which is at least possibly "truly
random", although I doubt it) aren't flat, so that they carry the
signature of their generation process unless/until you manage to
transform them into something flat (difficult unless you KNOW the
distribution they are producing).  Pseudorandom number generators have
the serious advantage of being amenable to at least some theoretical
analysis (so you can "guarantee" flatness out to some high
dimensionality, say) as well as empirical testing with e.g. dieharder.

HTH,

     rgb

>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From rgb at phy.duke.edu  Fri Aug 12 14:40:36 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:40:36 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <4E45460D.9040505@scalableinformatics.com>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<4E45460D.9040505@scalableinformatics.com>
Message-ID: <alpine.LFD.2.02.1108121438180.3145@lilith>

On Fri, 12 Aug 2011, Joe Landman wrote:

> On 08/12/2011 11:21 AM, David Mathog wrote:
>> Robert G. Brown wrote:
>>
>>>   Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>>     http://www.random.org/
>
> And from SGI days ... http://www.lavarnd.org/

Yeah, like that.  Notice the work they have to do to make a
not-really-random or only partially-random source flat, unpredictable,
random.  What they do is probably overkill -- nobody on earth could
detect a deviation from randomness if they did only half of their
folding and retransformation with crypto grade prngs, but it is still a
pretty reliable scheme.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From rgb at phy.duke.edu  Fri Aug 12 14:59:59 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:59:59 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<ECE7A93BD093E1439C20020FBE87C47F0108492E5D6B@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108121441140.3145@lilith>

On Fri, 12 Aug 2011, Lux, Jim (337C) wrote:

> All nice suggestions, but I wonder if they're truly random.
>
> Scores of games have underlying patterns from the "rules of the game"  (e.g. american football games tend to have scores that are tied to the 6,7,8 or 3 points.  basketball goals are 2 or 3 points, etc.)
>
> I'm sure someone has analyzed this.
>
> I suppose one could sum a large number of scores, which would give you something with Gaussian distribution, and then you could transform it into something with uniform distribution (sort of a inverse Box-Muller).
>
> What about using random.org and it being backed-up on archive.org?  Does that give you the "multiple independent sites" desired?

As I said and repeat, nothing like this is at all random.  Random is
stuff like thermal noise, shot noise, quantum noise, and even all of
those things are distributed and not flat and require massaging to make
into uniform deviates or random bits.  Unpredictable is easy, of course
-- flip a coin, roll some dice -- until you need to make it
>>rigorously<< unpredictable and >>rigorously<< uncorrelated, at which
point you need to not screw around with weather, scores, market closing
values, even "randomly sampled" ticks of a nanosecond clock aren't that
random without some work to make them so.

I liked the lavarnd site, and I like random.org.  Hell, tap into both of
their streams, they're both practically perfect as sources of random
numbers go, and it gives you your redundancy and you can xor their
streams together to get yet another irrelevant and probably unnecessary
degree of lack of correlation.  Even if one stream is subtly correlated
and the other is too, the chances of the correlations "matching" and
persisting through an xor process are astronomically small.  But then, finding
correlations in the output of a properly seeded crypto prng is pretty
astronomically unlikely BEFORE you xor-fold it stream-wise a few dozen
times into a source of real entropy like atmospheric noise or
electro-optical noise.
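
A minimal sketch of xor-ing two streams together.  The two inputs here
are stand-ins generated locally with os.urandom; fetching real bytes from
random.org or lavarnd is not shown, and the function name is an
assumption made for illustration.

```python
import os


def xor_streams(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together, byte by byte."""
    if len(a) != len(b):
        raise ValueError("streams must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-ins for bytes obtained from two independent sources.
stream_a = os.urandom(16)
stream_b = os.urandom(16)
mixed = xor_streams(stream_a, stream_b)
print(mixed.hex())
```

If the two inputs are independent, the xor is at least as unpredictable
as the better of the two, which is why combining sources this way costs
nothing.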

If you want something better, you'll probably have to explain your
application in a bit more detail.  Do you need rigorously random and
flat numbers, or just something unpredictable?  The latter is cheap and
easy and can be done in the privacy of your own home by reading from
/dev/random or /dev/urandom (or perhaps from Intel's new on-CPU rngs).
The former requires theory and some work and some heavy duty empirical
testing.
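
For the "just unpredictable" case, a sketch along these lines is probably
enough.  On Linux, Python's os.urandom draws from the same kernel pool
that backs /dev/urandom; the helper name and the 16-bit width (the figure
mentioned earlier in the thread) are assumptions for illustration.

```python
import os


def unpredictable_16_bits() -> int:
    """Return an integer in [0, 65535] drawn from the OS entropy pool."""
    return int.from_bytes(os.urandom(2), "big")


print(unpredictable_16_bits())
```

Note that a value produced this way is private to the machine that drew
it, so nobody else can verify it later.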

Just remember, numbers are not random.  Numbers are numbers.  The
number 7 could be "random" or not, not by its nature but by how the 7
was generated.

Processes, in other words, are (approximately, oxymoronically) random.
If you want random numbers, find a (mathematically provably) "random"
process, at least to some order and for some purposes...

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From nixon at nsc.liu.se  Fri Aug 12 16:46:21 2011
From: nixon at nsc.liu.se (Leif Nixon)
Date: Fri, 12 Aug 2011 22:46:21 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
Message-ID: <CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>

On 12 August 2011 17:58, David Mathog <mathog at caltech.edu> wrote:

> After posting I thought of one other source of more or less random
> verifiable numbers - the scores of sporting events.  These are not
> always generated every day, and are seasonal for the various sports.
> They are however highly verifiable and when multiple events are grouped,
> pretty much impossible to "fix" to preselected digits.

Have you looked at RFC3797? Not sure if it has any solutions for you, but it
at least discusses the same problems.


-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From rgb at phy.duke.edu  Sat Aug 13 13:51:46 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 13 Aug 2011 13:51:46 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
Message-ID: <alpine.LFD.2.02.1108131338240.18847@lilith>

On Fri, 12 Aug 2011, Leif Nixon wrote:

> On 12 August 2011 17:58, David Mathog <mathog at caltech.edu> wrote:
>
>> After posting I thought of one other source of more or less random
>> verifiable numbers - the scores of sporting events.  These are not
>> always generated every day, and are seasonal for the various sports.
>> They are however highly verifiable and when multiple events are grouped,
>> pretty much impossible to "fix" to preselected digits.
>
> Have you looked at RFC3797? Not sure if it has any solutions for you, but it
> at least discusses the same problems.

If people know how you are going to pick the seed of your rng, and know
the rng, and know (or measure) the distribution function from which your
seed is being drawn, they can easily transform the game into a non-zero
sum game with advantage over all of those that don't do all of that.

The only way to avoid this sort of thing is to pick your seed from a
flat, unpredictable distribution.  Unpredictable (in its purest sense)
includes flat, but the score distribution of almost any sporting event
is, I'm pretty sure, not flat.

That's why I really don't like the idea of running a lottery off of data
like this.  No state lottery could ever be certified on top of this sort
of data.

I'll tell you what.  Piggy back your lottery to theirs.  Powerball games
occur every day all over the US.  Pick your seed from the last 10 digits
of one of those games.  They are announced, publicly available on
websites (I'm pretty sure), and if they aren't certifiably random,
something is seriously wrong.  In any event they are usually generated
from an easily understandable random physical process that is almost
certainly flat as well as unpredictable.

Then pop it into your favorite AES-based or threefish based RNG, or cook
up something yourself with even more rotors, spin it a while, and out
comes your lottery winner -- basically a transmogrification of public
state lottery number, but that's an ADVANTAGE, not a disadvantage...
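
A minimal sketch of turning a published number into a winner that anyone
can recompute.  Python's standard library has no AES or Threefish, so
SHA-256 in counter mode is swapped in here as the expansion step; that
substitution, the function name, and the example seed string are all
assumptions for illustration.

```python
import hashlib


def pick_winner(public_seed: str, n_entrants: int) -> int:
    """Map a public seed to an index in range(n_entrants) without modulo bias."""
    if n_entrants < 1:
        raise ValueError("need at least one entrant")
    # Largest multiple of n_entrants that fits in 64 bits; draws at or
    # above it are rejected so every index stays equally likely.
    limit = (2**64 // n_entrants) * n_entrants
    counter = 0
    while True:
        block = hashlib.sha256(f"{public_seed}:{counter}".encode()).digest()
        draw = int.from_bytes(block[:8], "big")
        if draw < limit:
            return draw % n_entrants
        counter += 1


print(pick_winner("0123456789", 500))
```

Because the mapping is deterministic, anyone holding the same public seed
and entrant count gets the same index, which is the "verify it later"
property asked for earlier in the thread.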

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



From hahn at mcmaster.ca  Sat Aug 13 20:06:04 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat, 13 Aug 2011 20:06:04 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108131338240.18847@lilith>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
Message-ID: <Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>

>>> After posting I thought of one other source of more or less random
>>> verifiable numbers - the scores of sporting events.  These are not

I immediately thought of another widely published stream of immutable noise: 
the congressional record.  sorry, no smiley ;)

> Then pop it into your favorite AES-based or threefish based RNG, or cook
> up something yourself with even more rotors, spin it a while, and out
> comes your lottery winner

sorry, I don't understand your emphasis on flatness.  why does the
distribution of the seed (entropy source) matter, as long as it's 
reasonably large and not predictable before publication date?
the crypto hash takes care of whitening, doesn't it?

thanks, mark hahn.

From hahn at mcmaster.ca  Sat Aug 13 22:22:52 2011
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat, 13 Aug 2011 22:22:52 -0400 (EDT)
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1108132209420.6167@coffee.psychology.mcmaster.ca>

> I'm curious if anyone has any experience with ECC uncorrectable errors
> (specifically not the identification of), but which specific dimm in
> the chassis it's pointing to.

we've had good luck using EDAC to pin down bad dimms -
at least those that cause _correctable_ errors.
our uncorrectable errors trigger panics.  I suppose that's selectable,
though; I guess you could turn that off (/sys/module/edac_mc/panic_on_ue)

> The mcelog in linux doesn't seem to report the dimm slot correctly on
> my supermicro boards.

I prefer the hardware-topology-based naming that edac uses
(controller, channel, chipselect).  I guess recent versions of edac
have a user-space tool that will translate that for you (but of course,
you have to verify the topo-to-label mapping yourself anyway.)
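
A rough sketch of collecting the per-csrow correctable error counts that
EDAC exposes through sysfs.  It assumes the legacy csrow layout under
/sys/devices/system/edac/mc (mcN/csrowM/ce_count), which may be absent on
kernels built without it or on machines with no EDAC driver loaded; the
function name is made up for illustration.

```python
import glob
import os


def correctable_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): ce_count} for every csrow under edac_root."""
    counts = {}
    for path in glob.glob(os.path.join(edac_root, "mc*", "csrow*", "ce_count")):
        csrow_dir = os.path.dirname(path)
        controller = os.path.basename(os.path.dirname(csrow_dir))
        with open(path) as handle:
            counts[(controller, os.path.basename(csrow_dir))] = int(handle.read().strip())
    return counts


# Prints nothing if no EDAC counters are present or all counts are zero.
for (controller, csrow), count in sorted(correctable_error_counts().items()):
    if count:
        print(f"{controller}/{csrow}: {count} correctable errors")
```

Mapping a controller/csrow pair back to a physical slot still has to be
checked against the board by hand, as noted above.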

regards, mark hahn.


From rgb at phy.duke.edu  Sun Aug 14 18:05:31 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 14 Aug 2011 18:05:31 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>
Message-ID: <alpine.LFD.2.02.1108141420140.18847@lilith>

On Sat, 13 Aug 2011, Mark Hahn wrote:

>>>> After posting I thought of one other source of more or less random
>>>> verifiable numbers - the scores of sporting events.  These are not
>
> I immediately thought of another widely published stream of immutable noise: 
> the congressional record.  sorry, no smiley ;)
>
>> Then pop it into your favorite AES-based or threefish based RNG, or cook
>> up something yourself with even more rotors, spin it a while, and out
>> comes your lottery winner
>
> sorry, I don't understand your emphasis on flatness.  why does the
> distribution of the seed (entropy source) matter, as long as it's reasonably 
> large and not predictable before publication date?
> the crypto hash takes care of whitening, doesn't it?

Bayes theorem.  If one knows that (say) the distribution of digits in
sports scores is (say, and not unreasonably) 70% 1s, 2s, 3s and 30% all
the other digits -- because e.g. football games rarely get 4-9 in the
second digit slot (note that this is an example only) one can gain a
near 2-1 advantage over everybody else playing by picking seeds with the
right frequencies and using only those seeds to select a set of numbers,
if (as it sounds) there is an openly published unique map between the
seed and the lottery outcome so "anybody can check that it is fair".  In
this latter case you aren't trying to guess the white "random" outcome,
you are trying to guess the seed, and if the seed is drawn from a
non-flat space you'll beat the pants off of anyone playing blind by
using that space to generate your seeds/guesses.
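
The size of that advantage can be worked out for the example numbers
above.  This sketch assumes independent digits, with 1, 2 and 3 sharing
70% of the probability and the other seven digits sharing 30%; both the
independence assumption and the helper name are illustrative only.

```python
from fractions import Fraction

# Illustrative per-digit distribution taken from the example above.
skewed = {d: Fraction(7, 30) for d in "123"}
skewed.update({d: Fraction(3, 70) for d in "0456789"})
assert sum(skewed.values()) == 1

uniform = {d: Fraction(1, 10) for d in "0123456789"}


def best_guess_probability(dist, n_digits):
    """Chance that the single most likely n-digit seed is the one drawn,
    assuming the digits are independent."""
    return max(dist.values()) ** n_digits


for n in (1, 2, 4):
    ratio = best_guess_probability(skewed, n) / best_guess_probability(uniform, n)
    print(n, float(ratio))
```

Per digit the informed guesser is 7/3 times as likely to hit as someone
guessing blind, and the factor compounds with every additional digit.
Hashing the seed does not change this, because a published deterministic
map sends the most likely seed to exactly one outcome.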

Basically you take the lottery from being a lottery with all numbers
equally represented in the outcome space to being the moral equivalent
of predicting the actual point outcome of N football or basketball
games.  The size of the latter space is MUCH smaller than the size of
all possible scores, right?  In fact, it is "small" compared to the
latter space.

So, sorry, I think that a lottery (especially one with e.g. a cash
payout and deep pocketed people capable of speculatively gambling to win
based on expectation value based on an openly published hash and seeding
method) needs to use a true random, true white seed, since you might
just as well use the seed as the lottery number in this case and in no
other case is it fair.

Of course, if the lottery is for cakes at a bake sale, who cares.  Just
don't underestimate the cleverness of would-be attackers if the lottery
has an openly published method of generating the result and/or
potentially large payout.  Plenty of people would tackle the project of
cracking the lottery just for the thrill, even if the payout wasn't that
great.  If the payout was large enough, you'd have deep-pocketed
smart people covering the entire most-likely-point spread generated by
Vegas bookies, week after week, through proxies, and making a bundle
from it.

    rgb

>
> thanks, mark hahn.

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From james.p.lux at jpl.nasa.gov  Sun Aug 14 22:59:25 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sun, 14 Aug 2011 19:59:25 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108141420140.18847@lilith>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>,
	<alpine.LFD.2.02.1108141420140.18847@lilith>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>

Given the discussion about lotteries, etc.

This is the classic thing of "numbers games" as run by the mob.  You pick a 3 digit number, and the winning number is determined by some readily available public source (stock market, sports games, racetrack winners, etc.).  There's probably a fair amount of literature (aside from the works of M. Puzo) describing it.

Payoff was something like 600:1 or 750:1, against a nominal 1000:1, so the numbers bank makes their money on the differential (the vig).
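
As a quick check of that differential, here is a sketch of the bank's
average take.  It assumes 1000 equally likely three-digit numbers and
reads "600:1" as 600 units returned in total per unit staked on a win;
if the stake is returned on top, the edge is slightly smaller.  The
function name is made up for illustration.

```python
from fractions import Fraction


def house_edge(payout_per_unit, n_outcomes=1000):
    """Fraction of each unit staked that the bank keeps on average, given
    one equally likely winner among n_outcomes and a total return of
    payout_per_unit on a win."""
    return 1 - Fraction(payout_per_unit, n_outcomes)


for payout in (600, 750):
    print(payout, float(house_edge(payout)))
```

Under those assumptions the bank keeps 40% of the money wagered at 600
and 25% at 750.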

Just looked up wikipedia..
" later led to the use of the last three numbers in the published daily balance of the United States Treasury."

A moderately well known mathematician named Claude Shannon probably analyzed it.. He collaborated with E. Thorp on some other interesting work on games.
________________________________________
From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of Robert G. Brown [rgb at phy.duke.edu]
Sent: Sunday, August 14, 2011 15:05
To: Mark Hahn
Cc: Beowulf Mailing List
Subject: Re: [Beowulf] OT: public random numbers?

On Sat, 13 Aug 2011, Mark Hahn wrote:

>>>> After posting I thought of one other source of more or less random
>>>> verifiable numbers - the scores of sporting events.  These are not
>
> I immediately thought of another widely published stream of immutable noise:
> the congressional record.  sorry, no smiley ;)
>
>> Then pop it into your favorite AES-based or threefish based RNG, or cook
>> up something yourself with even more rotors, spin it a while, and out
>> comes your lottery winner
>
> sorry, I don't understand your emphasis on flatness.  why does the
> distribution of the seed (entropy source) matter, as long as it's reasonably
> large and not predictable before publication date?
> the crypto hash takes care of whitening, doesn't it?

Bayes' theorem.  If one knows that (say) the distribution of digits in
sports scores is (say, and not unreasonably) 70% 1s, 2s, 3s and 30% all
the other digits -- because e.g. football games rarely get 4-9 in the
second digit slot (note that this is an example only) -- one can gain a
near 2-1 advantage over everybody else playing by picking seeds with the
right frequencies and using only those seeds to select a set of numbers,
if (as it sounds) there is an openly published unique map between the
seed and the lottery outcome so "anybody can check that it is fair".  In
this latter case you aren't trying to guess the white "random" outcome,
you are trying to guess the seed, and if the seed is drawn from a
non-flat space you'll beat the pants off of anyone playing blind by
using that space to generate your seeds/guesses.
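
A rough numerical version of this argument, as a sketch only. It takes the made-up 70%/30% digit split from the example above, assumes the probability is spread evenly within each group, and compares a player who always guesses the likeliest digit with one who guesses uniformly. All names are illustrative.

```python
from fractions import Fraction

def digit_distribution():
    """Hypothetical seed-digit distribution: 70% on 1-3, 30% on the other seven."""
    probs = {d: Fraction(7, 10) / 3 for d in (1, 2, 3)}
    probs.update({d: Fraction(3, 10) / 7 for d in (0, 4, 5, 6, 7, 8, 9)})
    return probs

def guessing_advantage(probs, digits=1):
    """Hit rate of the best guesser divided by that of a blind guesser,
    compounded over `digits` independent seed digits."""
    best = max(probs.values())
    blind = Fraction(1, len(probs))
    return (best / blind) ** digits

probs = digit_distribution()
print(float(guessing_advantage(probs)))      # per-digit advantage
print(float(guessing_advantage(probs, 4)))   # compounded over a 4-digit seed
```

With these assumed numbers the per-digit advantage is 7/3, a little above 2-1, and it compounds with every additional seed digit.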

Basically you take the lottery from being a lottery with all numbers
equally represented in the outcome space to being the moral equivalent
of predicting the actual point outcome of N football or basketball
games.  The space of likely outcomes is MUCH smaller than the space of
all possible scores, right?  In fact, it is "small" compared to the
latter space.

So, sorry, I think that a lottery (especially one with e.g. a cash
payout and deep-pocketed people capable of speculatively gambling to win
based on expectation value, given an openly published hash and seeding
method) needs to use a true random, true white seed, since you might
just as well use the seed as the lottery number in this case and in no
other case is it fair.

Of course, if the lottery is for cakes at a bake sale, who cares.  Just
don't underestimate the cleverness of would-be attackers if the lottery
has an openly published method of generating the result and/or
potentially large payout.  Plenty of people would tackle the project of
cracking the lottery just for the thrill, even if the payout wasn't that
great.  If the payout was large enough, you'd have deep-pocketed
smart people covering the entire most-likely-point spread generated by
Vegas bookies, week after week, through proxies, and making a bundle
from it.

    rgb

>
> thanks, mark hahn.

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From rgb at phy.duke.edu  Mon Aug 15 07:57:26 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 15 Aug 2011 07:57:26 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1Qru7w-0004jS-Sj@mendel.bio.caltech.edu>
	<CACZDV_hP+x5cdFu0sNk4mW7hrfbB039yjHUnm9eg7=eZfd3MCA@mail.gmail.com>
	<alpine.LFD.2.02.1108131338240.18847@lilith>
	<Pine.LNX.4.64.1108131953130.22079@coffee.psychology.mcmaster.ca>,
	<alpine.LFD.2.02.1108141420140.18847@lilith>
	<ECE7A93BD093E1439C20020FBE87C47F010850F22F15@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108150753570.14474@lilith>

On Sun, 14 Aug 2011, Lux, Jim (337C) wrote:

> A moderately well known mathematician named Claude Shannon probably
> analyzed it.. He collaborated with E. Thorp on some other interesting
> work on games.

Shannon?  Shannon?  The name almost rings a Bell.  For your information,
I think he's a few bits short of a byte, if you know what I mean.  The
guy practically Bayes at the moon.

Sorry... feeling a bit, well, random this morning.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From mathog at caltech.edu  Mon Aug 15 13:08:59 2011
From: mathog at caltech.edu (David Mathog)
Date: Mon, 15 Aug 2011 10:08:59 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID: <E1Qt0ep-0005vm-5A@mendel.bio.caltech.edu>

Leif Nixon <nixon at nsc.liu.se> wrote:

> Have you looked at RFC3797? Not sure if it has any solutions for you, but
> it at least discusses the same problems.

Good reference, I was not aware of that.  

It gives the same sorts of sources for random numbers as we have come up
with here: stock market, sports, lottery.  It discusses how stock market
data may not be reliable due to market splits and other accounting
issues.  However, I have determined that the raw data from the exchanges
is a terrible choice because it is not available for free, and the
values that are freely available, which are posted on web finance sites,
are not reliably identical in all digits.

Lottery results are a good source except for the black box / black
helicopter factors.  We don't generally know where those numbers are
coming from, and even in those cases where they do tell us, there is no
way to verify that any particular lottery drawing wasn't rigged.

We have not discussed election results (votes per candidate), but those
are, ironically, really unsuitable for this, even though statistically
the final set of digits should have a lot of entropy.  Mostly election
numbers are a problem because they may be revised for long periods after
the election, and the numbers could almost always be forced to shift by
a challenge by one of the candidates.  Every recount will come up with a
slightly different result.  Examples: the Coleman vs. Franken senatorial
contest in Minnesota, or Bush vs. Gore in Florida.

So I'm leaning towards sports scores, as those are generated in full
view of a multitude of witnesses (often numbering in the millions).  It
would be extremely difficult to rig the absolute final score.  It might
be possible to rig the winner, or even the point spread, but to rig the
absolute score in a high-scoring game like basketball would be
exceedingly difficult, and would likely be obvious to even the casual
observer.  To rig every digit in the final score of every game played on
a given day should be pretty close to impossible.
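
A minimal sketch of how such scores could be turned into a publicly checkable number. It is loosely in the spirit of RFC 3797 but is not that RFC's exact procedure; the canonical string format, the choice of SHA-256, and the function names are all assumptions made for illustration, and the scores are invented. It shows only the publish-and-verify mechanics, not whether score digits carry enough entropy.

```python
import hashlib

def seed_from_scores(scores):
    """Derive a 256-bit integer from final scores.

    `scores` is an iterable of (home_team, home_points, away_team, away_points).
    Entries are sorted so that everyone hashing the same day's games gets
    the same digest regardless of the order they were collected in.
    """
    lines = sorted(f"{home}:{hp}-{away}:{ap}" for home, hp, away, ap in scores)
    canonical = "\n".join(lines).encode("utf-8")
    return int.from_bytes(hashlib.sha256(canonical).digest(), "big")

def pick_winner(scores, n_entries):
    """Map the seed onto 0..n_entries-1 (modulo bias is negligible for small n_entries)."""
    return seed_from_scores(scores) % n_entries

# Invented scores, for illustration only.
games = [("Lakers", 101, "Celtics", 98), ("Bulls", 88, "Knicks", 93)]
print(pick_winner(games, 1000))
```

Anyone with the same published scores and the same agreed format can recompute the result; changing a single point in any game changes the digest.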

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From cousins at umit.maine.edu  Mon Aug 15 16:59:11 2011
From: cousins at umit.maine.edu (Steve Cousins)
Date: Mon, 15 Aug 2011 16:59:11 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <mailman.1.1313434802.4411.beowulf@beowulf.org>
References: <mailman.1.1313434802.4411.beowulf@beowulf.org>
Message-ID: <alpine.LFD.2.00.1108151653260.8210@razzo.umeoce.maine.edu>


Hi David,

Can you give us more information about what you are doing? I'm getting 
curious about what problem you are working with that requires these 
conditions.

Steve

> We have not discussed election results (votes per candidate), but those
> are, ironically, really unsuitable for this, even though statistically
> the final set of digits should have a lot of entropy.  Mostly election
> numbers are a problem because they may be revised for long periods after
> the election, and the numbers could almost always be forced to shift by
> a challenge by one of the candidates.  Every recount will come up with a
> slightly different result.  Examples: the Coleman vs. Franken senatorial
> contest in Minnesota, or Bush vs. Gore in Florida.
>
> So I'm leaning towards sports scores, as those are generated in full
> view of a multitude of witnesses (often numbering in the millions).  It
> would be extremely difficult to rig the absolute final score.  It might
> be possible to rig the winner, or even the point spread, but to rig the
> absolute score in a high scoring game like basketball, would be
> exceedingly difficult, and would likely be obvious to even the casual
> observer.  To rig every digit in the final score of every game played on
> a given day should be pretty close to impossible.


From lindahl at pbm.com  Wed Aug 17 16:59:58 2011
From: lindahl at pbm.com (Greg Lindahl)
Date: Wed, 17 Aug 2011 13:59:58 -0700
Subject: [Beowulf] Fwd: H8DMR-82 ECC error
In-Reply-To: <B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>
References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk>
	<B7F9806C-78E8-46D6-A1C6-184FF8D32827@staff.uni-marburg.de>
Message-ID: <20110817205958.GB7650@bx9.net>

> Memtest was ok, I done 9 cycles without any problems.

You should be using the HPL implementation of the Linpack benchmark
for testing memory. It exercises all of the memory and all of the
cores, and is what most HPC vendors seem to use for node burnin.
There's even a bootable DVD with a kernel with enhanced EDAC that was
mentioned here a while back.

> Hardware Error
> CPU0 Machine Check Exception  4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorredted error, report: yes MisV: invalid
> CPU context corrupt: yes UECC Error
> Bud Unit Error: prefetch/ECC error in data read from NB: local node originated 
> (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic. 
> Participating Processors: local node originated (SRC)

And I take it that the location information given here (socket 0, bank
2) isn't useful?

-- greg



From david.t.kewley at gmail.com  Sat Aug 20 16:16:28 2011
From: david.t.kewley at gmail.com (David Kewley)
Date: Sat, 20 Aug 2011 13:16:28 -0700
Subject: [Beowulf] Memory Testing?
In-Reply-To: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
References: <CABOsP2O_ZvFbFJx3XDNvr7_XVu_aB9AwiKSyef1+eRDLNdtotg@mail.gmail.com>
Message-ID: <CAM+eeKf6os2jphciAE_Hnd4xtEiSZcZ5RKZrfMc08HEwux-xMQ@mail.gmail.com>

A few bits from my corner of the experience space:

If you have a BMC, 'ipmitool sel list' will probably show the correctable
and uncorrectable errors, generally not naming the DIMM involved. But
'ipmitool sel list -v' shows details from various fields in the SEL records.
In the ASUS boards I've been playing with lately, the Sensor Number field
together with the Event Data field will (usually) tell you the DIMM slot,
once you know how to decode those fields for the specific motherboard (and
possibly firmware revisions?) that you have.

How do you get that motherboard-specific data?  By finding a DIMM that
reliably produces errors, and moving it from slot to slot, taking notes on
those two SEL fields above.  I've seen a similar thing work for Dell
machines too.

If you have Dell PowerEdge R or M boxes (or previous generation
equivalents), there are various nicer ways to get the name of the DIMM
involved, including using a version of ipmitool that has the 'delloem'
subcommand.

I second Tony's suggestion that RAM testers may not be as good as real
systems, for finding bad RAM.  My experience on one large system a few years
ago was that new DIMMs failed at a rate of around 1% per year, but
"refurbished" DIMMs from RMAs failed at 10% per year (or was it even higher?
I forget).  I was led to believe that these refurbished DIMMs were often
customer returns that had been run through a RAM tester and passed.  Turns
out sometimes the customers were right and the "refurbishment" process was
wrong.

One more thing about the ASUS boards I've been playing with lately: If you
get a panic on uncorrectable memory error, and power cycle the system (using
the power button, or by remote 'ipmitool ... power cycle'), the following
POST does not report the bad DIMM.  But if you *reset* the system (by
pushing the reset button with a paperclip, or by remote 'ipmitool ... power
reset'), the next POST will pause and tell you what CPU, Channel, and DIMM
were affected by that previous uncorrectable error, which is more info than
'ipmitool sel list' gives you.  It's then up to you to figure out how CPU,
Channel, and DIMM map to the silkscreened names on the motherboard -- I
couldn't find documentation, but it turned out to be the pattern we
suspected. :)

David


From john.hearns at mclaren.com  Tue Aug 23 11:46:59 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Tue, 23 Aug 2011 16:46:59 +0100
Subject: [Beowulf] Flash storage arrays
Message-ID: <207BB2F60743C34496BE41039233A809071F88D8@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

Does anyone have an opinion of these for CFD workloads:


http://www.theregister.co.uk/2011/08/23/pure_storage_fa_300/

The interesting thing is that they claim it is cheaper than disk - but
that's a hard claim to assess in an HPC context, as it SEEMS to hold only
when their inbuilt deduplication is taken into account.  I'm not sure how
much dedupe buys you with typical HPC data - i.e. large files rather than
lots of nearly identical emails or virtual disk images.

John Hearns | CFD Hardware Specialist | McLaren Racing Limited
McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK

T:  +44 (0) 1483 261000
D:  +44 (0) 1483 262352
F:  +44 (0) 1483 261010
E:  john.hearns at mclaren.com
W:  www.mclaren.com




From diep at xs4all.nl  Wed Aug 24 21:30:39 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 25 Aug 2011 03:30:39 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
Message-ID: <8BBC31C3-7F1A-433C-863A-5F0EBB4714AC@xs4all.nl>

In a world where you don't trust others, using MD5 is out of the
question.  It's not safe.  It's possible to fake an MD5 sum: modify the
numbers to whatever you wish (if there is enough random data), then append
a small correction to the data so that it again produces the md5sum that
was posted on the website.
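
One common remedy is to publish a digest from a hash with no known practical collision attacks, such as SHA-256, instead of MD5. A minimal sketch of the check on the downloader's side, using only Python's standard library; the function name is hypothetical.

```python
import hashlib
import hmac

def matches_published_digest(data, published_hex):
    """True if SHA-256(data) equals the published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the first mismatch occurs
    return hmac.compare_digest(actual, published_hex.strip().lower())

published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(matches_published_digest(b"abc", published))
```

Note that an unsigned digest served from the same site as the data still only guards against accidental corruption; whoever can replace the file can replace the digest too.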

Vincent

On Aug 12, 2011, at 5:21 PM, David Mathog wrote:

> Robert G. Brown wrote:
>
>>  Everybody must be able to obtain it
>>> freely from a web connection.
>>
>>    http://www.random.org/
>>
>
> Nice site.  They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted,  
> and the
> files themselves have MD5 checksums (but are not signed).
> They also support https.  It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech



From diep at xs4all.nl  Wed Aug 24 21:58:52 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 25 Aug 2011 03:58:52 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108121416030.3145@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
Message-ID: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>


On Aug 12, 2011, at 8:35 PM, Robert G. Brown wrote:

> On Fri, 12 Aug 2011, David Mathog wrote:
>
>> Robert G. Brown wrote:
>>
>>>  Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>>    http://www.random.org/
>>>
>>
>> Nice site.  They have something that is very close, the pregenerated
>> random files, from which a small set of digits may be extracted,  
>> and the
>> files themselves have MD5 checksums (but are not signed).
>> They also support https.  It comes up a little short on criteria 1  
>> (we
>> really don't know what is going on behind the scenes) and 6 (it is a
>> single site.)
>
> Behind the scenes is documented pretty well on the site, and the  
> guy who
> runs it is a human being, you can communicate with him to learn even
> more.  I already know him a bit, as he and I have collaborated on
> applying dieharder to test random.org datasets -- even "the"  
> random.org
> dataset as of some time ago (I have a few hundred MB of random number
> from the site in my dieharder directory).  IIRC, the numbers are
> generated continuously and fairly slowly by grabbing and filtering and
> transforming atmospheric noise.  As a source of entropy, that is
> probably excellent if (as noted) slow, but many good sources of  
> entropy
> seem to be fairly slow.  He has good reason to think that his numbers
> are theoretically "true random numbers"

Well, there is another test I stumbled upon when I did some analysis of
casino games (which student who takes himself seriously doesn't attempt
to write some simulations to see whether you can win something in the
casino by designing some strategies?).

The simulation revealed it was rather easy to make a fortune at roulette
with the doubling system (first put in 1; if you win, put in 1 again,
else double, and keep doubling until you win).  Reports from guys (some
of them missing an eye, another one a hand) who study anything that might
make a profit in casinos (and who also really try it in the casinos)
revealed that they never saw anyone really make a big profit with the
doubling system.

So there was a discrepancy between the randomly generated data and the
true random numbers generated in the casino.

Statistical analysis revealed the problem, though not so soon.

I noticed that most semi-random numbers generated by software generators
have the habit of truly covering a search space of size n, always within
O(n log n) draws.

So if you draw numbers from most software RNGs and take them modulo n,
with n being not too tiny, say some millions or even billions, then every
slot in your 'hashtable' will get hit at least once by the RNG, whereas
data in reality simply doesn't have that habit.

So true random numbers and generated noise are easy to distinguish in
this manner.  I didn't study the literature to see whether some other chap
invented this a long time ago.  That would be most interesting to know.

In semi pseudo code, let's take an array of size a billion as an example,
though usually a few million is more than ok:

n = 2^30; // 2 to the power 30

Function TestNumbersForRandomness(RNG, n) {
   declare array hashtable[size n];

   // draw budget: 2 * n * log2(n)
   guessednlogn = 2 * (log n / log 2) * n;

   // mark every slot as not yet hit
   for( i = 0 ; i < n ; i++ )
      hashtable[i] = FALSE;

   ndraws = filledn = 0;
   while( ndraws < guessednlogn ) {
      randomnumber = RNG();
      r = randomnumber % n;   // randomnumber = r (mod n)
      if( hashtable[r] == FALSE ) {
         hashtable[r] = TRUE;
         filledn++;
         if( filledn >= n )   // every slot hit: stop early
            break;
      }
      ndraws++;
   }

   if( filledn >= n )
      print "With high degree of certainty data generated by a RNG\n";
   else
      print "Not so sure it's a RNG\n";
}
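
For anyone who wants to try this, the pseudocode above can be sketched in runnable form roughly as follows. This is a translation under assumptions, not a vetted test: the names `fills_table` and `draw_factor` are invented here, the generator is passed in as a zero-argument callable, and the function returns the coverage result instead of printing a verdict.

```python
import math
import random

def fills_table(rng, n, draw_factor=2.0):
    """Draw up to draw_factor * n * log2(n) values and report table coverage.

    rng is any zero-argument callable returning a non-negative integer.
    Returns (filled, draws): whether all n slots were hit, and how many
    draws were made before stopping.  Requires n >= 2.
    """
    budget = int(draw_factor * n * math.log2(n))
    seen = bytearray(n)          # one flag per slot, all initially 0
    filled = 0
    for draws in range(1, budget + 1):
        slot = rng() % n
        if not seen[slot]:
            seen[slot] = 1
            filled += 1
            if filled == n:
                return True, draws
    return False, budget

# Usage with Python's built-in Mersenne Twister and a small table.
mt = random.Random(2011)
print(fills_table(lambda: mt.getrandbits(32), 1 << 12))
```

A bytearray keeps the table at one byte per slot, so n = 2^30 needs about 1 GiB; a pure-Python loop at that size will be slow.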


Regards,
Vincent


> -- both unpredictable and
> flat/decorrelated at all orders, and even though there aren't really
> enough of them for my purposes, I've used them as one of the (small)
> "gold standard" sources for testing dieharder even as I test them.   
> For
> all practical purposes threefish or aes are truly random as well and
> they are a lot faster and easier to use as gold standard generators,
> though.
>
> I don't quite understand why the single site restriction is  
> important --
> this site has been up for years and I don't expect it to go away soon;
> it is quite reliable.  I don't think there is anything secret about  
> how
> the numbers are generated, and I'll certify that the numbers it  
> produces
> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
> your part; 6 I don't really understand but the guy who runs the  
> site is
> clearly willing to construct a custom feed for cash customers, if  
> there
> is enough value in whatever it is you are trying to do to pay for
> access.  If it's just a lottery, well, lord, I can think of a dozen  
> ways
> to make numbers so random that they'd be unimpeachable for any sort of
> lottery, both unpredictable and uncorrelated, and they don't any of  
> them
> require any significant amount of entropy to get started.
>
> I will add one warning -- "randomness" is a rather stringent
> mathematical criterion, and is generally tested against the null
> hypothesis.  Amateurs who want to make random number generators out of
> supposedly "random" data streams or fancy algorithms almost invariably
> fail, sometimes spectacularly so.  There are a half dozen or more
> really, really good pseudorandom number generators out there and it is
> easy to hotwire them together into an xor-based high entropy stream  
> that
> basically never repeats (feeding it a bit of real entropy now and then
> as it operates).  I would strongly counsel you against trying to take
> e.g. weather data and make something "random" out of it.  Unless you
> really know what you are doing, you will probably make something that
> isn't at all random and may not even be unpredictable.  Even most
> sources of "quantum" randomness (which is at least possibly "truly
> random", although I doubt it) aren't flat, so that they carry the
> signature of their generation process unless/until you manage to
> transform them into something flat (difficult unless you KNOW the
> distribution they are producing).  Pseudorandom number generators have
> the serious advantage of being amenable to at least some theoretical
> analysis (so you can "guarantee" flatness out to some high
> dimensionality, say) as well as empirical testing with e.g. dieharder.
>
> HTH,
>
>      rgb
>
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>



From rgb at phy.duke.edu  Thu Aug 25 08:11:07 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 25 Aug 2011 08:11:07 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108250743280.3079@lilith>

On Thu, 25 Aug 2011, Vincent Diepeveen wrote:

> I noticed that most generated semi-random numbers with software generators,
> had the habit to truely adress a search space of n always in O (n log n).
>
> So if you draw from most software RNG's a number and do it modulo n,
> with n being not too tiny, say quite some millions or even billions, then 
> every
> slot in your 'hashtable' will get hit at least once by the RNG, whereas data
> in reality simply happens to not have that habit simply.
>
> So true random numbers versus generated noise is in this manner easy
> to distinguish by this. Now i didn't study literature whether some other chap
> some long time ago already had invented this. That would be most interesting
> to know.

Some other chap named George Marsaglia (and to some extent another chap
named Donald Knuth) have already invented this.  A number of tests of
the tails of random number generators are already in dieharder.  All
"good" modern rngs pass these tests.

The Martingale betting system you are looking at is even older (at least
Knuth is still alive).  It dates back to the 18th century, and is well
known to be flawed for a variety of reasons.  Gamblers don't have the
infinite wealth necessary to make this >>even<< a zero-sum strategy;
casinos have betting limits that de facto make it impossible to pursue
the requisite number of steps; and roulette in particular has 0 and/or
00 slots and isn't zero-sum to begin with.  You can read a decent
analysis of outcomes based on the presumed binomial distribution of a
zero-sum game here:

   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
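
The effect of a finite bankroll can be worked out exactly. A sketch, assuming an even-money bet, a starting stake of 1 unit, and a player who can afford at most `max_bets` consecutive doublings; the function name is illustrative. One round (bet until the first win or until the money runs out) wins 1 unit unless all `max_bets` bets lose, in which case it loses 2^max_bets - 1 units.

```python
from fractions import Fraction

def martingale_round_ev(p_lose, max_bets):
    """Expected profit of one Martingale round with at most max_bets doublings.

    p_lose is the probability of losing a single even-money bet.
    """
    p_bust = p_lose ** max_bets          # every bet in the round loses
    loss_if_bust = 2 ** max_bets - 1     # 1 + 2 + 4 + ... staked and lost
    return (1 - p_bust) * 1 - p_bust * loss_if_bust

fair = Fraction(1, 2)
european = Fraction(19, 37)   # 18 losing colour pockets plus the single zero
for k in (1, 5, 10):
    print(k, martingale_round_ev(fair, k), float(martingale_round_ev(european, k)))
```

The expression simplifies to 1 - (2 * p_lose)^max_bets: exactly zero for a fair coin, and increasingly negative with every extra doubling once the zero pocket pushes p_lose above 1/2.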

Your test below is interesting, though.  The only real problems I can
see with actually using it in dieharder are:

   a) One would need a theoretical estimate of the distribution of
filling given n log n draws on an n-slotted table (for largish n).  That
is, for a perfect rng, what SHOULD the distribution of success/failure
be.

   b) One would then need the CDF for this distribution, to be able to
turn the results of N trials (of n log n pulls each) into a p-value
under the null hypothesis -- the probability of obtaining the particular
number of successes and failures presuming a perfectly random generator.

That way dieharder could apply it rigorously to its 70 or 80 embedded
rngs or to any user's outboard generator.  There probably is theoretical
statistical support for the PD and/or CDF -- you're analyzing the tails
of a poissonian process -- but finding it or doing it yourself (or
myself), aye, that's the rub.  One cannot just say "high degree of
certainty that it is an RNG" (by which one means that the rng in
question fails the test for randomness) in the test.  HOW high?  Perfect
rngs or perfectly random processes will sometimes fill your table, but
how often?  How can you differentiate an "accident" when one does from
an actual failure?  All of those questions require a more rigorous
theory and quantitative result embedded in a test that can be
systematically cranked up to more clearly resolve failures until they
are unambiguous, not marginal maybe yes maybe no.
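
For the null hypothesis of independent, uniform draws, this is the classic coupon-collector problem, and the fill probability can at least be sketched. The code below is an illustration under that assumption only, not something taken from dieharder: it gives the exact inclusion-exclusion value for small n and the standard large-n approximation exp(-n * exp(-draws / n)). The function names are invented.

```python
import math

def exact_fill_probability(n, draws):
    """P(all n slots hit after `draws` uniform draws), by inclusion-exclusion.

    Only sensible for small n: the alternating sum loses precision as n grows.
    """
    return sum((-1) ** k * math.comb(n, k) * ((n - k) / n) ** draws
               for k in range(n + 1))

def approx_fill_probability(n, draws):
    """Large-n approximation exp(-n * exp(-draws / n)) (coupon-collector limit)."""
    return math.exp(-n * math.exp(-draws / n))

print(exact_fill_probability(4, 8), approx_fill_probability(4, 8))

n = 1 << 30
draws = 2 * n * 30          # the 2 * n * log2(n) budget from the pseudocode
print(n * math.exp(-draws / n))   # expected number of never-hit slots
```

With that budget the expected number of empty slots for a perfect generator is on the order of 1e-17, so under this model a good rng fills the table essentially every time; a usable test would have to look at how many draws the fill takes (or how many slots remain empty at a smaller budget), not merely at whether the table fills.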

I suspect that the failures this test would reveal are already more than
covered in dieharder, in particular by the bit distribution tests and
the monkey tests, but I'm not terribly happy with the monkey tests and
would be perfectly thrilled to have a simpler to compute test that
revealed precisely this sort of flaw, systematically.  And it doesn't
hurt at all to have partially or fully redundant tests as long as the
tests themselves are rigorously valid.  If you can find or compute the
CDF for your test below, I'd be happy to wrap it up and add it to
dieharder, in other words.  One can always SIMULATE a CDF, of course,
but that requires a known good generator and sort of begs the question
if you don't think that e.g. AES or threefish or KISS are good
generators that would actually pass your test.

Even hardware/quantum sources of random bits are suspect -- they often
are generated by a process that leaves in the traces of an underlying
distribution.  I'm not convinced that >>any<< process in the real world
is >>truly<< random.  Physics is ambiguous on the issue -- the quantum
description of a closed system is just as deterministic as the classical
one, and Master equation unpredictability on open subsets of a large
closed system reflects entropy/ignorance, not actual randomness (hence
Einstein's famous "doesn't play dice" remark).  But lots of things are
sufficiently random that one cannot detect any failure of randomness,
modern crypto class generators being a prime example.

    rgb

>
> In semi-pseudocode, let's take an array of size a billion as an example,
> though usually a few million is more than ok:
>
> n = 2^30; // 2 to the power 30
>
> Function TestNumbersForRandomness(RNG,n) {
>   declare array hashtable[size n];
>
>   // draw budget: 2 * n * log2(n)
>   guessednlogn = 2 * (log n / log 2) * n;
>
>   for( i = 0 ; i < n ; i++ )
>     hashtable[i] = FALSE;
>
>   ndraws = filledn = 0;
>   while( ndraws < guessednlogn ) {
>     randomnumber = RNG();
>     r = randomnumber % n; // randomnumber = r (mod n)
>     if( hashtable[r] == FALSE ) {
>       hashtable[r] = TRUE;
>       filledn++;
>       if( filledn >= n )
>         break; // every slot has been hit
>     }
>     ndraws++;
>   }
>
>   if( filledn >= n )
>     print "With high degree of certainty data generated by a RNG\n";
>   else
>     print "Not so sure it's a RNG\n";
> }
>
> Regards,
> Vincent
>
>
>
>
>> -- both unpredictable and
>> flat/decorrelated at all orders, and even though there aren't really
>> enough of them for my purposes, I've used them as one of the (small)
>> "gold standard" sources for testing dieharder even as I test them.  For
>> all practical purposes threefish or aes are truly random as well and
>> they are a lot faster and easier to use as gold standard generators,
>> though.
>> 
>> I don't quite understand why the single site restriction is important --
>> this site has been up for years and I don't expect it to go away soon;
>> it is quite reliable.  I don't think there is anything secret about how
>> the numbers are generated, and I'll certify that the numbers it produces
>> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
>> your part; 6 I don't really understand but the guy who runs the site is
>> clearly willing to construct a custom feed for cash customers, if there
>> is enough value in whatever it is you are trying to do to pay for
>> access.  If it's just a lottery, well, lord, I can think of a dozen ways
>> to make numbers so random that they'd be unimpeachable for any sort of
>> lottery, both unpredictable and uncorrelated, and they don't any of them
>> require any significant amount of entropy to get started.
>> 
>> I will add one warning -- "randomness" is a rather stringent
>> mathematical criterion, and is generally tested against the null
>> hypothesis.  Amateurs who want to make random number generators out of
>> supposedly "random" data streams or fancy algorithms almost invariably
>> fail, sometimes spectacularly so.  There are a half dozen or more
>> really, really good pseudorandom number generators out there and it is
>> easy to hotwire them together into an xor-based high entropy stream that
>> basically never repeats (feeding it a bit of real entropy now and then
>> as it operates).  I would strongly counsel you against trying to take
>> e.g. weather data and make something "random" out of it.  Unless you
>> really know what you are doing, you will probably make something that
>> isn't at all random and may not even be unpredictable.  Even most
>> sources of "quantum" randomness (which is at least possibly "truly
>> random", although I doubt it) aren't flat, so that they carry the
>> signature of their generation process unless/until you manage to
>> transform them into something flat (difficult unless you KNOW the
>> distribution they are producing).  Pseudorandom number generators have
>> the serious advantage of being amenable to at least some theoretical
>> analysis (so you can "guarantee" flatness out to some high
>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>> 
>> HTH,
>>
>>     rgb
>> 
>>> 
>>> Thanks,
>>> 
>>> David Mathog
>>> mathog at caltech.edu
>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>> 
>> 
>> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>> Duke University Dept. of Physics, Box 90305
>> Durham, N.C. 27708-0305
>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>> 
>> 
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From diep at xs4all.nl  Thu Aug 25 21:55:04 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 03:55:04 +0200
Subject: [Beowulf] OT: Calculating Extraterrestrial Life - was public
	random numbers?
In-Reply-To: <alpine.LFD.2.02.1108250743280.3079@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
Message-ID: <BF6839DC-68D7-4169-A13C-F706909DEC7B@xs4all.nl>


On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software
>> generators had the habit to truly address a search space of n always
>> in O (n log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even billions,
>> then every slot in your 'hashtable' will get hit at least once by the
>> RNG, whereas data in reality simply happens to not have that habit.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish. Now i didn't study literature whether some other chap
>> some long time ago already had invented this. That would be most
>> interesting to know.
>
> Some other chap named George Marsaglia (and to some extent another chap
> named Donald Knuth) have already invented this.  A number of tests of
> the tails of random number generators are already in dieharder.  All
> "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at least
> Marsaglia and Knuth are still alive).  It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have
> betting limits that de facto make it impossible to pursue the requisite
> number of steps and in roulette in particular have 0 and/or 00 slots and
> aren't zero-sum to begin with.  You can read a decent analysis of
> outcomes based on the presumed binomial distribution of a zero-sum game
> here:
>
>   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>
> Your test below is interesting, though.  The only real problems I can
> see with actually using it in dieharder are:
>
>   a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).  That
> is, for a perfect rng, what SHOULD the distribution of success/failure
> be.
>
>   b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the particular
> number of successes and failures presuming a perfectly random generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator.  There probably is theoretical
> statistical support for the PD and/or CDF -- you're analyzing the tails
> of a poissonian process -- but finding it or doing it yourself (or
> myself), aye, that's the rub.  One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test.  HOW high?  Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often?  How can you differentiate an "accident" when one does from
> an actual failure?  All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more than
> covered in dieharder, in particular by the bit distribution tests and

Thanks for your kind words. You'll realize that the theories you quote
below, of which I have little knowledge and which I definitely don't have
the time to investigate (yet), are way above my level of knowledge.

Instead of going deep into mathematical theories I would find it more
appropriate to ponder the feasibility of calculating the existence of
extraterrestrial life.

Now I realize a lot of effort goes into recognizing messages from
outer space.

Yet we can also speculate on some things.

First of all I'd like to make a statement on extraterrestrial life and
a viewpoint there. One viewpoint I've seen promoted is that some
scientist(s) claim we should hide ourselves from extraterrestrial life.

I disagree with that to some extent. If there is extraterrestrial life
that is more advanced than our society, obviously they could also have
built weapons to totally self-destruct, and they would already have
killed themselves if they were aggressive forms of life.

If they were successful in reproducing themselves, just like humankind
is right now, they would have burned up all resources on their own
planet and caused massive extinctions. It is difficult to defend the
statement that all mass extinctions were caused by meteorites only; for
such a statement one would need proof that every single mass extinction
was caused by a specific meteorite. The extinction could just as well
have been that a certain successful species dominated the planet a tad
too much and didn't get clever enough to self-control or self-regulate
to the extent that the planet didn't entirely die. After some millions
of years life then restores itself on the planet.

So if there were such intelligent life elsewhere, more advanced than
our society is, they sure would want to communicate in a manner such
that the information could be read across different galaxies.

However, from the most primitive life forms that are successful in
dominating a planet, one would want to hide this information until such
a society reaches a specific level.

One would only want such an extraterrestrial form of communication to
be deciphered by another extraterrestrial form of life that is at a
sustainable, peaceful level.

I would argue such a lifeform would not form a threat to anyone, as
they have already proven not to be a threat to their own planet. So if
the knowledge of this society is high enough to be able to control all
that, one could also argue that a specific level of math belongs to
that high level of development: a level strong enough to decipher the
form of communication that gets used between the different very
intelligent lifeforms in existence throughout the galaxies.

 From the fact that there is no systematic form of contact with
extraterrestrial life we can already deduce that humankind still has to
develop itself further from a species that burns up all its resources,
especially causing too much output of CO2 (the latest report, which
I'll have to check out, is that the increased CO2 level increases the
amount of CO2 absorbed by the oceans, making them more acidic, causing
plankton, the start of the food chain, to not develop its skeleton
enough, which for sure in the long run will cause mass extinction).

Now we might not be advanced enough yet to decipher extraterrestrial
communication, so I wonder whether we might somehow be able to
recognize that information is being communicated using a form of
encryption that we simply cannot decipher yet, based upon comparing it
with how our RNGs work. Some of them run, for example, over a prime
field; others have a distribution that is too perfect.
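
One common way to put a number on "too perfect" is a chi-square test of
uniformity read on its lower tail: a statistic far below its expected
value means the bins are more even than chance would produce. The sketch
below is an illustration under stated assumptions, not a test taken from
this thread; the function name and the normal approximation to the
chi-square distribution are choices made for brevity:

```python
import math
from collections import Counter


def chi2_uniform_z(values, k):
    """Chi-square statistic for uniformity over k bins, plus a rough z-score.

    The z-score uses the normal approximation to chi-square with k - 1
    degrees of freedom (mean k - 1, variance 2 * (k - 1)). It is only a
    rough guide and assumes k is large and len(values) is much larger
    than k. Strongly negative z: suspiciously even. Strongly positive z:
    clearly uneven.
    """
    counts = Counter(v % k for v in values)
    expected = len(values) / k
    chi2 = sum((counts[b] - expected) ** 2 / expected for b in range(k))
    z = (chi2 - (k - 1)) / math.sqrt(2 * (k - 1))
    return chi2, z


if __name__ == "__main__":
    # A plain counter hits every bin equally often: maximally "too perfect".
    print(chi2_uniform_z(list(range(256)) * 40, 256))
```

A stream that cycles through all values in order scores as far too even
here, even though it is perfectly predictable, which is why a test like
this is only ever one of several.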

If we get radiation measurements back from space, and we test them for
belonging to a specific class or type of randomness versus
non-randomness, how does that compare with a source of radiation of our
own that is comparable to it, and with its randomness classification?

Obviously the algorithm I gave is just one specific form of algorithm
to measure a perfect distribution; as you already indicated, many other
tests have been invented already.
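
For reference, a runnable rendering of that algorithm as a minimal
Python sketch. The name fill_test and the returned tuple are
illustrative choices; the draw budget of 2 * n * log2(n) is the one
from the pseudocode, and the generator is passed in as any callable
returning non-negative integers:

```python
import math
import random


def fill_test(rng, n):
    """Draw rng() % n until all n slots are hit or the budget runs out.

    Returns (filled, draws): whether every slot was hit, and how many
    draws were made. The budget is 2 * n * log2(n), as in the pseudocode.
    """
    budget = int(2 * math.log2(n) * n)
    seen = bytearray(n)  # one flag per slot, all initially 0
    filled = 0
    draws = 0
    while draws < budget and filled < n:
        r = rng() % n
        draws += 1
        if not seen[r]:
            seen[r] = 1
            filled += 1
    return filled == n, draws


if __name__ == "__main__":
    source = random.Random(12345)  # stand-in generator for the demo
    print(fill_test(lambda: source.getrandbits(32), 1 << 12))
```

The function only reports whether the table filled; turning that into a
statement about the generator still needs the expected behaviour of a
truly uniform source for comparison.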

To what extent have those been applied to what could be encrypted
communication from extraterrestrial life to other extraterrestrial life
(like us, if we manage to survive as a species and develop further to a
peaceful level that can sustain itself for a longer period of time)?

So, summarized, what I wonder about is how random number theory can
contribute to detecting extraterrestrial life (of course with a
specific statistical significance attached to it).

This of course in combination with experiments that allow us first to
classify how a specific form of possible communication system would
normally behave according to the randomness classification system,
versus how the measured possible form of communication is classified.

Such a classification system would need to be very sophisticated to
have any chance of detecting extraterrestrial life, I'd guess, as we
can't just naively assume that all they could come up with is
encrypting things over a prime field using smallish primes, which in
our world is already only allowed to be used up to secret level.

Regards,
Vincent

> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically.  And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> tests themselves are rigorously valid.  If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words.  One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution.  I'm not convinced that >>any<< process in the real world
> is >>truly<< random.  Physics is ambiguous on the issue -- the quantum
> description of a closed system is just as deterministic as the classical
> one, and Master equation unpredictability on open subsets of a large
> closed system reflects entropy/ignorance, not actual randomness (hence
> Einstein's famous "doesn't play dice" remark).  But lots of these are
> sufficiently random that one cannot detect any failure of randomness,
> modern crypto class generators being a prime example.
>
>    rgb
>
>>
>> In semi-pseudocode, let's take an array of size a billion as an
>> example, though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>>   declare array hashtable[size n];
>>
>>   // draw budget: 2 * n * log2(n)
>>   guessednlogn = 2 * (log n / log 2) * n;
>>
>>   for( i = 0 ; i < n ; i++ )
>>     hashtable[i] = FALSE;
>>
>>   ndraws = filledn = 0;
>>   while( ndraws < guessednlogn ) {
>>     randomnumber = RNG();
>>     r = randomnumber % n; // randomnumber = r (mod n)
>>     if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break; // every slot has been hit
>>     }
>>     ndraws++;
>>   }
>>
>>   if( filledn >= n )
>>     print "With high degree of certainty data generated by a RNG\n";
>>   else
>>     print "Not so sure it's a RNG\n";
>> }
>>
>> Regards,
>> Vincent
>>



From diep at xs4all.nl  Thu Aug 25 20:27:18 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 02:27:18 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108250743280.3079@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
Message-ID: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>


On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software
>> generators had the habit to truly address a search space of n always
>> in O (n log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even billions,
>> then every slot in your 'hashtable' will get hit at least once by the
>> RNG, whereas data in reality simply happens to not have that habit.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish. Now i didn't study literature whether some other chap
>> some long time ago already had invented this. That would be most
>> interesting to know.
>
> Some other chap named George Marsaglia (and to some extent another chap
> named Donald Knuth) have already invented this.  A number of tests of
> the tails of random number generators are already in dieharder.  All
> "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at least
> Marsaglia and Knuth are still alive).  It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have

 From a mathematical viewpoint it makes perfect cash.
The statistical odds are that you have already built up considerable
profit by the time a worst case (hitting the practical limit of 10
doublings) hits you.

The simulations of course use the practical limit.

Note that European casinos have a single zero.
In the USA an even greedier mafia controls all the casinos;
there are 2 zeros there, 0 and 00.

The simulations were for European casinos.
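
The expected result of one capped run can be worked out directly, as a
check on any simulation. The sketch below assumes an even-money bet such
as red on a single-zero wheel (win probability 18/37), a stake that
starts at 1 and doubles after each loss, and a cap of k bets; reading
"10 doublings" as k = 10 is an assumption. Any win ends the run at +1,
k straight losses cost 2^k - 1, so the expectation is 1 - (2q)^k with
q the loss probability:

```python
from fractions import Fraction


def martingale_ev(k, p_win=Fraction(18, 37)):
    """Expected net result (in base stakes) of one capped Martingale run.

    The stake starts at 1 and doubles after every loss, for at most k
    bets. Any win ends the run at +1; k straight losses cost 2**k - 1.
    """
    q = 1 - p_win
    p_bust = q ** k
    return (1 - p_bust) * 1 - p_bust * (2 ** k - 1)


if __name__ == "__main__":
    print(float(martingale_ev(10)))                   # single-zero wheel
    print(float(martingale_ev(10, Fraction(1, 2))))   # hypothetical fair game
```

Under these assumptions the single-zero figure comes out at about -0.31
base stakes per run, and exactly zero for a fair coin, so the frequent
small wins do not outweigh the rare bust on average.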

> betting limits that de facto make it impossible to pursue the requisite
> number of steps and in roulette in particular have 0 and/or 00 slots and
> aren't zero-sum to begin with.  You can read a decent analysis of
> outcomes based on the presumed binomial distribution of a zero-sum game
> here:
>
>   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>

You're not allowed to use a system in a casino, so we are speaking
about theory. Probably they let you try the first evening. The second
day you'll get on the blacklist.

> Your test below is interesting, though.  The only real problems I can
> see with actually using it in dieharder are:
>

Yeah, more interesting than the roulette system, which has been
discussed a billion times and has been analyzed completely flat.

>   a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).  That
> is, for a perfect rng, what SHOULD the distribution of success/failure
> be.

As we have figured out by now in Artificial Intelligence, the
statistical assumptions made in the past simply do not hold.

For Artificial Intelligence we need a new sort of theory.

As for the distribution problem, generators having a spread that's too
accurate, the way to deliver a proof would be, for example, to build a
simple device.

Build an old-fashioned box from which you can draw balls. Remember what
you could see on TV some 20 years ago or so (not sure it was like that
in the USA).

A big basket with balls. In fact the basket looks like this:

http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg

But now a much bigger machine like this, with different means inside of
randomizing the balls, actually also randomly modifying the obstacles
inside that shake the balls.

After a ball has been drawn you automatically have it annotated, and
the ball immediately goes back into the machine. The balls in the
machine are then shaken again for a full minute and you draw a ball
again. It is important to do this randomizing of the balls inside the
machine for quite some time. I would propose a minute.

Of course you have to do this with quite some balls. Say a thousand.

Then you draw balls until all numbers have been drawn at least once.

This cool experiment can be easily built. Of course the expected
running time of a single experiment will be a few weeks.
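
A rough estimate of the running time, as a sketch under idealized
assumptions: every draw is uniform over the n balls, the ball goes back
each time, and one draw takes exactly one minute of shaking with nothing
else added. Drawing with replacement until every ball has appeared is
the classic coupon collector problem, whose expected number of draws is
n times the n-th harmonic number. The function names are illustrative:

```python
import math


def expected_draws(n):
    """Expected draws with replacement until all n balls are seen: n * H_n."""
    return n * math.fsum(1.0 / k for k in range(1, n + 1))


def expected_days(n, minutes_per_draw=1.0):
    """Expected duration in days at a fixed number of minutes per draw."""
    return expected_draws(n) * minutes_per_draw / (60 * 24)


if __name__ == "__main__":
    for balls in (1000, 10000):
        print(balls, expected_draws(balls), expected_days(balls))
```

Under these assumptions a 1000-ball run needs about 7,485 draws on
average, roughly five days of continuous operation, and a 10,000-ball
run about 68 days. The spread around the mean is wide (standard
deviation of roughly 1.28 n draws), and any handling time per draw
beyond the minute of shaking stretches the total accordingly.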

You can produce a number of those drawing machines though and have a
look.

Theories that seemingly work for small n, n being the number of balls,
are much harder to maintain at bigger n, as we also see in prime number
research.

The way the machine gets designed is of course totally crucial. I would
propose a design that really shakes the balls through each other a lot
and very thoroughly.

Just like we nowadays know how flawed a large number of the popular
card shuffling machines are.

Such a lottery with really a lot of balls would give very interesting
outcomes to see.

In fact I would prefer having a number of those machines produced, so
that it's possible to really have a lot of outcomes and then analyze
them very well.

>
>   b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the particular
> number of successes and failures presuming a perfectly random generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator.  There probably is theoretical
> statistical support for the PD and/or CDF -- you're analyzing the tails
> of a poissonian process -- but finding it or doing it yourself (or
> myself), aye, that's the rub.  One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test.  HOW high?  Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often?

If we assume that the reality of life represents randomness (and how
far that theory is plausible is another rather good question), then
using that assumption I'm very sure that the RNGs I have investigated
so far have a distribution which is too perfect, more perfect than I
have seen in any reality.

In fact most RNGs fill all slots well within the O ( n log n ) budget,
and O ( n log n ) is the growth that they follow.

These are RNGs that have come through all tests as being good and very
acceptable RNGs to use.

Realize I'm no RNG expert, so all the names of all those tests mean
little to me.

For me it's just push-button technology. I just designed a test and
found it very odd that all RNGs have such perfect distributions that
they don't even miss a single slot.
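
How often an ideal source would miss a slot can be bounded with a short
calculation, which is the comparison asked for in the quoted text above.
The sketch assumes independent, uniform draws. A given slot is missed in
m draws with probability (1 - 1/n)^m, which is at most exp(-m/n), so by
the union bound the chance that any of the n slots is missed is at most
n * exp(-m/n). The function names are illustrative, and the budget is
the 2 * n * log2(n) from the pseudocode:

```python
import math


def budget(n):
    """Draw budget used in the pseudocode: 2 * n * log2(n)."""
    return 2 * n * math.log2(n)


def miss_probability_bound(n, draws):
    """Upper bound on P(at least one of n slots is never hit in `draws` draws).

    Union bound for independent uniform draws: n * exp(-draws / n),
    computed in log space and capped at 1.
    """
    log_bound = math.log(n) - draws / n
    return min(1.0, math.exp(log_bound))


if __name__ == "__main__":
    for bits in (10, 20, 30):
        n = 1 << bits
        print(n, miss_probability_bound(n, budget(n)))
```

With that budget the bound works out to about n^(-1.885), which for
n = 2^30 is below 10^-16. On this reckoning a truly uniform source would
also be expected to fill every slot essentially every time, so a full
table by itself does not separate a software generator from an ideal
one; a shorter budget, closer to n ln n, is where misses start to occur.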

I'd argue the only test that would be interesting to me, to see how it
might be in reality, is the lottery machine test, yet with really a lot
of balls. I'd prefer 10k balls over 1000 in fact, yet for practical
reasons I would agree to a number above 1000.

Paper fiddling to prove anything is really not interesting to me there,
as what I've seen of randomness in reality is totally different from
how RNGs model it.

Regards,
Vincent


>   How can you differentiate an "accident" when one does from
> an actual failure?  All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more than
> covered in dieharder, in particular by the bit distribution tests and
> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically.  And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> tests themselves are rigorously valid.  If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words.  One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution.  I'm not convinced that >>any<< process in the real world
> is >>truly<< random.  Physics is ambiguous on the issue -- the quantum
> description of a closed system is just as deterministic as the classical
> one, and Master equation unpredictability on open subsets of a large
> closed system reflects entropy/ignorance, not actual randomness (hence
> Einstein's famous "doesn't play dice" remark).  But lots of these are
> sufficiently random that one cannot detect any failure of randomness,
> modern crypto class generators being a prime example.
>
>    rgb
>
>>
>> In semi-pseudocode, let's take an array of size a billion as an
>> example, though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>>   declare array hashtable[size n];
>>
>>   // draw budget: 2 * n * log2(n)
>>   guessednlogn = 2 * (log n / log 2) * n;
>>
>>   for( i = 0 ; i < n ; i++ )
>>     hashtable[i] = FALSE;
>>
>>   ndraws = filledn = 0;
>>   while( ndraws < guessednlogn ) {
>>     randomnumber = RNG();
>>     r = randomnumber % n; // randomnumber = r (mod n)
>>     if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break; // every slot has been hit
>>     }
>>     ndraws++;
>>   }
>>
>>   if( filledn >= n )
>>     print "With high degree of certainty data generated by a RNG\n";
>>   else
>>     print "Not so sure it's a RNG\n";
>> }
>>
>> Regards,
>> Vincent
>>
>>
>>
>>
>>> -- both unpredictable and
>>> flat/decorrelated at all orders, and even though there aren't really
>>> enough of them for my purposes, I've used them as one of the (small)
>>> "gold standard" sources for testing dieharder even as I test them.  For
>>> all practical purposes threefish or aes are truly random as well and
>>> they are a lot faster and easier to use as gold standard generators,
>>> though.
>>>
>>> I don't quite understand why the single site restriction is important --
>>> this site has been up for years and I don't expect it to go away soon;
>>> it is quite reliable.  I don't think there is anything secret about how
>>> the numbers are generated, and I'll certify that the numbers it produces
>>> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
>>> your part; 6 I don't really understand but the guy who runs the site is
>>> clearly willing to construct a custom feed for cash customers, if there
>>> is enough value in whatever it is you are trying to do to pay for
>>> access.  If it's just a lottery, well, lord, I can think of a dozen ways
>>> to make numbers so random that they'd be unimpeachable for any sort of
>>> lottery, both unpredictable and uncorrelated, and they don't any of them
>>> require any significant amount of entropy to get started.
>>>
>>> I will add one warning -- "randomness" is a rather stringent
>>> mathematical criterion, and is generally tested against the null
>>> hypothesis.  Amateurs who want to make random number generators out of
>>> supposedly "random" data streams or fancy algorithms almost invariably
>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>> really, really good pseudorandom number generators out there and it is
>>> easy to hotwire them together into an xor-based high entropy stream that
>>> basically never repeats (feeding it a bit of real entropy now and then
>>> as it operates).  I would strongly counsel you against trying to take
>>> e.g. weather data and make something "random" out of it.  Unless you
>>> really know what you are doing, you will probably make something that
>>> isn't at all random and may not even be unpredictable.  Even most
>>> sources of "quantum" randomness (which is at least possibly "truly
>>> random", although I doubt it) aren't flat, so that they carry the
>>> signature of their generation process unless/until you manage to
>>> transform them into something flat (difficult unless you KNOW the
>>> distribution they are producing).  Pseudorandom number generators have
>>> the serious advantage of being amenable to at least some theoretical
>>> analysis (so you can "guarantee" flatness out to some high
>>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>>> HTH,
>>>
>>>     rgb
>>>> Thanks,
>>>> David Mathog
>>>> mathog at caltech.edu
>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From rgb at phy.duke.edu  Fri Aug 26 02:07:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 02:07:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108260127470.3126@lilith>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If we assume that reality of life represents randomness, which is another
> rather good question in how far that theory is plausible, then using that
> assumption i'm very sure that the RNG's i investigated so far
> have a distribution which is too perfect, more perfect than i have seen
> in any reality.

That's because you live in a different reality than everybody else,
Vincent.

> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
> O ( n log n ) that they follow.

In fact, they don't.

> This is RNG's that have come through all tests as being a good and
> very acceptable RNG to be used.

No, it's not.

> Realize i'm no RNG expert, so all the names of all those tests.
>
> For me it's just push button technology. I just designed a test
> and found it very odd that all RNG's have such perfect distributions
> that they don't even miss a single slot.

It's odd because your test is broken.

>
> I'd argue the only test that would be interesting to me to see how it
> might be in reality is the lottery machine test - yet with really a lot
> of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
> reasons i would agree with a number of above a 1000.
>
> Paper fiddling is really not interesting to me there to prove anything,
> as what i've seen in reality in randomness is total different from how
> RNG's model that.

Let's try a bit of "paper fiddling".  The expected number of filled slots
is (this is actual code, not pseudocode, for n slots):

  nlogn = log10(n)*n;
  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));

The reasoning is enormously simple.  The probability of a slot being
empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 -
1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
random noise, n*(1-1/n)^nlogn of them "should" be empty.

Well, I've got a random number generator tester harness, so I hacked
your test into it. One major bug in your code, BTW, is using a modulus
to generate your random numbers -- dunno what that's about, but if your
rng returned numbers between (say) 0 and 7 and you use it to generate
numbers in the range 0 to 4 by means of r%5 then you'll get (for the
inputs 0 through 7) 0 1 2 3 4 0 1 2.  Note well that you get twice as
many 0's, 1's and 2's as 3's and 4's assuming a random draw on 0 to 7.
So you aren't even testing a uniformly distributed sequence of integers.
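
The 0-to-7 example can be checked mechanically.  The following is a
minimal sketch, not code from the dieharder harness; the function name
residue_count is made up for illustration.  It counts how many inputs
from an ideal source on 0 .. range-1 land on a given residue:

```c
#include <assert.h>

/* Count how many values r in 0 .. range-1 satisfy r % modulus == residue,
 * i.e. how often that residue turns up if every raw value is drawn
 * exactly once.  (Illustrative helper, name is hypothetical.) */
unsigned residue_count(unsigned range, unsigned modulus, unsigned residue)
{
    assert(modulus > 0 && residue < modulus);
    unsigned count = 0;
    for (unsigned r = 0; r < range; r++)
        if (r % modulus == residue)
            count++;
    return count;
}
```

With range 8 and modulus 5 the residues 0, 1 and 2 are each hit twice
and 3 and 4 once, as described above.  When the range is an exact
multiple of the modulus (say range 8, modulus 4) every residue is hit
equally often, so the modulus introduces no bias in that special case.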

Fixing this relatively minor bug, removing your breakout and actually
counting up filledn for the full nlogn samples, and applying the test to
mt19937, we get:

rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990811, expected = 9990881

We run it again:
rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990802, expected = 9990881

We run it for R250 -- a well-known not-good generator:
rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990794, expected = 9990881

We run it on the literally infamous randu:
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9999482, expected = 9990881

Note, Vincent, that the last two examples of correctly computed results
from known-terrible generators are farther from the expected mean than
mt19937, a well-known damn good one: R250 only marginally (off by 87,
against 70 and 79 for the two mt19937 runs), randu wildly (off by 8601).
This suggests that your test
(perhaps unsurprisingly) has some sensitivity, not because some slots
are or aren't empty, but because the NUMBER of slots that are or aren't
empty isn't quite correct.  Note also that in the "paper fiddling"
analysis above, the use of nlogn is quite unimportant -- we could make
this an independent variable and evaluate the table filling for any
value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.

If I have the energy, I'll see if the distribution of filledn around
expected is e.g. Gaussian -- it seems pretty reasonable that it would be
-- with some expected or empirically computable variance.  If it is,
then this can be fairly easily turned into an actual test that returns a
p-value that humans can use to make rational judgements, or rational
humans can use to make judgements or something like that.  I doubt that
the test will have MUCH sensitivity -- modern generators are way too
good to have their flaws picked out quite this simply, although
Marsaglia's "monkey tests" do something very similar, if a lot more
sophisticated mathematically (and arguably more sensitive), and do
suffice to nail randu (anything nails randu) and semi-weak generators
like R250.

Now, let's see what we've learned from this fiddling.  One is that
without it, you just waste a lot of people's time making egregious and
false claims that belittle the tremendously sophisticated and difficult
work a whole lot of "fiddlers" have put into inventing, writing, and
testing modern RNGs.  The truth is that >>all<< RNGs in dieharder "pass"
your test (if the test is "producing at least one zero") once your test
isn't broken.  We've learned that in fact, the best of the modern RNGs
are damn good, and that you could work for five years trying to invent a
test that is good enough to fail any of them and still not succeed.
Finally, we've learned that you should not, not, not take your
Martingale to a casino and try the doubling strategy out to make money,
or if you do, put a firm upper bound -- something like 63 Euro -- on what
you're willing to lose with your base stake of 1 Euro.  That way you
have maybe a 40% chance of doubling your 63 Euro before you go broke.
Really, you should read the Wikipedia article I linked, in spite of the
fact that it presents more "paper fiddling".

   Sincerely,

       rgb

(See P.S. comments below...)

>>> n = 2^30; // 2 to the power 30
>>> 
>>> Function TestNumbersForRandomness(RNG,n) {
>>> declare array hashtable[size n];
>>> 
>>> guessednlogn = 2 * (log n / log 2) * n;

Why guess nlogn?  nlogn is n*log10(n).  Why nlogn anyway?  Call it m and
make it a parameter.

>>> for( i = 0 ; i < n ; i++ )
>>>  hashtable[i] = FALSE;
>>> 
>>> ndraws = filledn = 0;
>>> while( ndraws  < guessednlogn ) {
>>>   randomnumber = RNG();
>>>   r = randomnumber % n; //     randomnumber =  r  (mod n)

no, r = n*RNG_UNIFORM(); where RNG_UNIFORM() is e.g. RNG/(UINT_MAX+1.0),
so that r stays strictly below n.  Yes there are roundoff errors, but
they are uniform and consistent and as you can see, don't affect this
problem.  What you have isn't even close to uniform -- it is badly
nonrandom.

>>>   if( hashtable[r] == FALSE ) {
>>>      hashtable[r] = TRUE;
>>>      filledn++;

>>>      if( filledn >= n )
>>>        break;

Don't break.  Just count up filledn.  It will never be more than n now
anyway, for any n, or any reasonable m. There probably is some number of
pulls that will raise "expected" to n, but it is pretty big compared to
n, way bigger than nlogn.
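
One way to put a number on "pretty big" is the classical coupon
collector result: the mean number of uniform pulls until every one of n
slots has been hit is n*H_n, with H_n the n-th harmonic number.  Note
that this is the mean waiting time until the table is full, which is a
different quantity from the pull count at which the "expected" formula
reaches n (strictly speaking it never does).  A minimal sketch, with a
hypothetical function name:

```c
#include <assert.h>

/* Mean number of uniform pulls needed until all n slots have been hit at
 * least once (coupon collector): n * H_n, H_n = 1 + 1/2 + ... + 1/n. */
double expected_pulls_to_fill(unsigned long n)
{
    assert(n > 0);
    double harmonic = 0.0;
    /* Add the smallest terms first to limit rounding error. */
    for (unsigned long k = n; k >= 1; k--)
        harmonic += 1.0 / (double)k;
    return (double)n * harmonic;
}
```

For n = 10000000 this comes out at roughly 1.67e8 pulls, a bit more
than twice the 70000000 pulls used in the runs above.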

>>>
>>>  }
>>>  ndraws++;
>>> }
>>> 
>>> if( filledn >= n )
>>>   print "With high degree of certainty data generated by a RNG\n";
>>> else
>>>   print "Not so sure it's a RNG\n";
>>> 
>>> }

I'm guessing the correct statistic here is something like |expected -
filledn|/expected, but as I said, I haven't really worked at it.  I
haven't decided whether or not it is worth adding this to dieharder --
without a formal derivation of the expected statistic it would be yet
another empirical test, which means you're really comparing one RNG to
another presumed better one, which I don't like.  And do I have time to
do the "fiddling" needed to do a proper derivation?  Aye, that's the
rub...;-)
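
Putting the comments above together, the corrected counting loop might
look like the sketch below.  It is not the dieharder implementation:
the generator is a stand-in (xorshift64*, chosen here only so the
example is self-contained), the function names are hypothetical, and
the mapping onto 0 .. n-1 is the integer form of the r = n*RNG_UNIFORM()
scaling suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in generator (xorshift64*), an assumption for this sketch and
 * not one of the dieharder generators.  State must be nonzero. */
static uint32_t next_u32(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    /* Keep the high 32 bits of the scrambled output. */
    return (uint32_t)((x * UINT64_C(0x2545F4914F6CDD1D)) >> 32);
}

/* Pull m numbers into n slots and return how many distinct slots were
 * hit.  No early break: every pull is made.  Returns 0 if the table
 * cannot be allocated. */
uint32_t count_filled(uint32_t n, uint64_t m, uint64_t seed)
{
    assert(n > 0 && seed != 0);
    unsigned char *hit = calloc(n, sizeof *hit);
    if (hit == NULL)
        return 0;

    uint64_t state = seed;
    uint32_t filled = 0;
    for (uint64_t i = 0; i < m; i++) {
        /* floor(n * u / 2^32) is always below n. */
        uint32_t r = (uint32_t)(((uint64_t)next_u32(&state) * n) >> 32);
        if (!hit[r]) {
            hit[r] = 1;
            filled++;
        }
    }
    free(hit);
    return filled;
}
```

The returned count is what gets compared with n - n*(1 - 1/n)^m; the
pass/fail print statements of the original pseudocode are deliberately
left out, since a single count says little without a known spread.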

    rgb


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From diep at xs4all.nl  Fri Aug 26 07:56:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 13:56:15 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260127470.3126@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
Message-ID: <B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>


On Aug 26, 2011, at 8:07 AM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> If we assume that reality of life represents randomness, which is another
>> rather good question in how far that theory is plausible, then using that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have seen
>> in any reality.
>
> That's because you live in a different reality than everybody else,
> Vincent.

Or the reality we live in might not be as random as we all guess...

But it's good that you took a look at the dieharder test now - which
you didn't do before.

>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
>> O ( n log n ) that they follow.
>
> In fact, they don't.
>
>> This is RNG's that have come through all tests as being a good and
>> very acceptable RNG to be used.
>
> No, it's not.
>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>
> It's odd because your test is broken.
>
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove anything,
>> as what i've seen in reality in randomness is total different from how
>> RNG's model that.
>
> Let's try a bit of "paper fiddling".  The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>  nlogn = log10(n)*n;
>  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
> given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, n*(1-1/n)^nlogn of them "should" be empty.
>
> Well, I've got a random number generator tester harness, so I hacked
> your test into it. One major bug in your code, BTW, is using a modulus
> to generate your random numbers -- dunno what that's about, but if your

EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Apologies for the caps. I hope you see how important this is. Are you
claiming that all programmers use random numbers in a faulty manner?

This is important enough to discuss further.

You nearly always need random numbers from within a given domain, say
0 .. n-1, so projecting an RNG onto that domain is pretty crucial. How
would you do that in a correct manner?

In the slot test in fact a simple AND is enough.
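
One common answer to the question of projecting a generator onto
0 .. n-1 without bias is rejection sampling: throw away the raw values
that fall in the incomplete final block, then reduce.  The sketch below
is an illustration only; the rng32_fn hook, uniform_below and
counting_source are hypothetical names, and the raw source is assumed to
be uniform on all 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Any source of uniformly distributed 32-bit values (hypothetical hook). */
typedef uint32_t (*rng32_fn)(void *ctx);

/* Return a value in 0 .. n-1 with every outcome equally likely, provided
 * gen() itself is uniform on 0 .. 2^32-1. */
uint32_t uniform_below(rng32_fn gen, void *ctx, uint32_t n)
{
    assert(n > 0);
    /* Largest multiple of n that fits into the 2^32 raw values. */
    uint64_t limit = ((UINT64_C(1) << 32) / n) * n;
    uint32_t r;
    do {
        r = gen(ctx);
    } while (r >= limit);      /* discard the incomplete final block */
    return r % n;
}

/* Deterministic stand-in source for demonstrations: returns the counter
 * that ctx points to, then increments it.  Not random at all. */
uint32_t counting_source(void *ctx)
{
    uint32_t *next = ctx;
    return (*next)++;
}
```

When n is a power of two, as with n = 2^30 here, the limit equals 2^32,
nothing is ever rejected, and the result is the same as a bitwise AND
with n-1.  For other n a few raw values are discarded, at most n-1 out
of every 2^32.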

> rng returned numbers between (say) 0 and 7 and you use it to generate
> numbers in the range 0 to 4 by means of r%5 then you'll get (for the
> inputs 0 through 7) 0 1 2 3 4 0 1 2.  Note well that you get twice as
> many 0's, 1's and 2's as 3's and 4's assuming a random draw on 0 to 7.
> So you aren't even testing a uniformly distributed sequence of integers.
>
> Fixing this relatively minor bug, removing your breakout and actually
> counting up filledn for the full nlogn samples, and applying the test to
> mt19937, we get:
>
> rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990811, expected = 9990881
>
> We run it again:
> rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990802, expected = 9990881
>
> We run it for R250 -- a well-known not-good generator:
> rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990794, expected = 9990881
>
> We run it on the literally infamous randu:
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9999482, expected = 9990881
>
> Note, Vincent, that the last two examples of correctly computed results
> from known-terrible generators are farther from the expected mean than
> mt19937, a well-known damn good one: R250 only marginally (off by 87,
> against 70 and 79 for the two mt19937 runs), randu wildly (off by 8601).
> This suggests that your test
> (perhaps unsurprisingly) has some sensitivity, not because some slots
> are or aren't empty, but because the NUMBER of slots that are or aren't
> empty isn't quite correct.  Note also that in the "paper fiddling"
> analysis above, the use of nlogn is quite unimportant -- we could make
> this an independent variable and evaluate the table filling for any
> value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.
>
> If I have the energy, I'll see if the distribution of filledn around
> expected is e.g. Gaussian -- it seems pretty reasonable that it would be
> -- with some expected or empirically computable variance.  If it is,
> then this can be fairly easily turned into an actual test that returns a
> p-value that humans can use to make rational judgements, or rational
> humans can use to make judgements or something like that.  I doubt that
> the test will have MUCH sensitivity -- modern generators are way too
> good to have their flaws picked out quite this simply, although
> Marsaglia's "monkey tests" do something very similar, if a lot more
> sophisticated mathematically (and arguably more sensitive), and do
> suffice to nail randu (anything nails randu) and semi-weak generators
> like R250.
>
> Now, let's see what we've learned from this fiddling.  One is that
> without it, you just waste a lot of people's time making egregious and
> false claims that belittle the tremendously sophisticated and difficult
> work a whole lot of "fiddlers" have put into inventing, writing, and
> testing modern RNGs.  The truth is that >>all<< RNGs in dieharder "pass"
> your test (if the test is "producing at least one zero") once your test
> isn't broken.  We've learned that in fact, the best of the modern RNGs
> are damn good, and that you could work for five years trying to invent a
> test that is good enough to fail any of them and still not succeed.
> Finally, we've learned that you should not, not, not take your
> Martingale to a casino and try the doubling strategy out to make money,

It's not interesting to discuss - but yes, this strategy makes money in
casinos; you just get thrown out of the casino and end up on the
blacklist if you do.

For good chess players all this is not so tough. The casinos' blacklist
of people too strong at blackjack is endless... ...this has been the
practice for longer than we have been alive...

So casino reality is much simpler. They kick you out if you're good.

That's why they try to popularize poker now - you don't play against
the casino there.


> or if you do, put a firm upper bound -- something like 63 Euro -- on what
> you're willing to lose with your base stake of 1 Euro.  That way you
> have maybe a 40% chance of doubling your 63 Euro before you go broke.
> Really, you should read the Wikipedia article I linked, in spite of the
> fact that it presents more "paper fiddling".
>





From rgb at phy.duke.edu  Fri Aug 26 08:29:06 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:29:06 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260127470.3126@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
Message-ID: <alpine.LFD.2.02.1108260740380.12275@lilith>

On Fri, 26 Aug 2011, Robert G. Brown wrote:

> Let's try a bit of "paper fiddling".  The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>  nlogn = log10(n)*n;
>  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
> given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, n*(1-1/n)^nlogn of them "should" be empty.

Silly me.  All of the anonymous slots are at least asymptotically
independent (not necessarily obvious, but true from symmetry, I think,
subject to the weak constraint that the total population of all of the
slots has to add up to the number of trials so there are probably n-1
degrees of freedom in the Pearson test).  We have p and q.  The
distribution is binomial and of course I know the binomial distribution
and its sigma.  I can easily build any one of several tests on top of
this (simple binomial or even multinomial, since I effectively have the
hit frequency for n slots and it should BE the binomial distribution),
and in fact have two or three already that are very similar to this on a
smaller scale.

It's what comes of hacking out -- sorry, "fiddling" out -- quick
solutions and tests late at night when you're tired and ought to be
sleeping.  A bit of coffee makes a world of difference...:-)

I'll have to think a bit about it and make sure that this isn't already
done, better, in e.g. STS, but it might yet see the light of day as an
actual dieharder test.

BTW, I'm not replying to your space alien ET post (to the Beowulf list,
in reply to an already OT discussion of martingales that arose out of a
discussion of good RNGs and seeding strategies -- sorry y'all, but hey,
at least it is entertaining?) simply because my jaw is sore from hitting
the ground so many times while reading it.  Those are some top-quality
hallucinogens, yes they are...

We will now return to your regularly scheduled discussion of boring
things like bandwidth, memory reliability, parallel algorithms and the
like, you know, on-topic stuff.  But if any of y'all ever need to test
rngs or flame schemes to "win" non-zero-sum games by means of
"strategy", you know who to call...;-) Somewhere upstairs I have this
nifty book on game theory and in a pinch I can even trot out an actual
game matrix and analyze outcomes algefiddlingbraically!

    rgb

P.S. -- Vincent, all of these simple problems were solved by
mathematicians and statisticians so very, very, long ago, beginning with
the work of Pascal and Fermat (there are names to conjure with, eh?)
solving the problem posed by the Chevalier de Mere regarding an even bet
on double sixes happening at least once in 24 throws: actual probability
of double sixes per throw is (of course) 1/36, probability of no double
six in 24 throws is (35/36)^24, odds of at least one are therefore 1 -
(35/36)^24 = 0.4914038761 -- all paper fiddling, mind you -- a result
that is eerily reminiscent of the solution to your problem, but with
fewer slots.  So at even odds it is -- barely -- a sucker's bet.  But a
margin of 0.86% is enough to empty even the deepest pockets, over time.
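
The de Mere arithmetic above is easy to reproduce.  A minimal sketch,
with a helper name (at_least_once) invented for this example:

```c
#include <assert.h>

/* Probability of at least one success in `throws` independent throws,
 * each of which succeeds with probability p_single. */
double at_least_once(double p_single, int throws)
{
    assert(p_single >= 0.0 && p_single <= 1.0 && throws >= 0);
    double none = 1.0;                 /* P(no success so far) */
    for (int i = 0; i < throws; i++)
        none *= 1.0 - p_single;
    return 1.0 - none;
}
```

Calling at_least_once(1.0/36.0, 24) reproduces the 0.4914 figure quoted
above; with 25 throws the same call comes out just above one half.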

Now all you have to do is advance your actual knowledge of statistics
beyond that realized by an idly rich French nobleman in 1654 (who still
was wise enough to recognize that it wasn't an even bet and consulted
the best of the best of the minds of his day to prove it).

You have a mere 357 years to go...:-)

P.P.S -- If "all rngs" were really as bad as you assert, does it not
stand to reason that "all Monte Carlo computations" that use them would
all get egregiously incorrect results?  And yet they don't.  In fact, in
problems (like the Ising model in 2D) where known solutions exist, they
agree basically perfectly with the theoretical solution, and of course
it is easy to compare a wide range of integrals and Markov process
outcomes with theory.  So if you used your simple common sense you would
construct a mental argument like:

"Either

I, in my brilliance, have discovered an egregious flaw in all random
number generators used by all of those STUPID computer scientists,
mathematicians, and physicists for decades to do their long and complex
computations that no doubt all got equally egregiously wrong answers;

Or

Those computer scientists, mathematicians, and physicists are actually
pretty smart and aggressively check their work (and each other's work)
with a strong incentive to discover problems.  It is rather probable
that any such egregious error would have been long ago discovered;
therefore there is almost certainly a serious error in my own
reasoning."

Seriously, dude.  Ask yourself "Am I really smarter and better informed
than Pascal, Fermat, Laplace, Bayes, not to mention all of those
contemporary humans who have been devoting entire well-educated careers
to random numbers as if all of modern e-commerce depended on them (it
does) or is it just barely possible that I've made a mistake?"  Come on,
you can do it.  I know it is difficult for you, but try..

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From rgb at phy.duke.edu  Fri Aug 26 08:57:55 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:57:55 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108260829390.12275@lilith>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Bullshit.  "Every programmer" isn't dumb as a post.  Or wasn't my
argument clear enough?  Do you need me to actually post the code for how
the GSL -- written by at least some of these programmers -- does this?
Here, I'll try again.  This time I'll use smaller numbers and make an
actual table of the outcomes:  Imagine only two lousy random bits,
enough to make 00, 01, 10, 11 (or 0,1,2,3).  Here is the probability
table:

r =   0    1    2    3
------------------------
p = 0.25 0.25 0.25 0.25

Let us generate N samples from this distribution.  Our expected
frequency of the occurrence of all of these numbers is:

r =     0      1      2      3
---------------------------------
Np = 0.25*N 0.25*N 0.25*N 0.25*N

Is this all clear?  If I generate 100 random numbers, the expected
number of 3's is 0.25*100 = 25.  Now apply mod 3; the outcomes are now:

r =     0      1      2      3
r%3 =   0      1      2      0
---------------------------------
Np = 0.25*N 0.25*N 0.25*N 0.25*N

You now sum the number of outcomes for each element in the mod 3 table,
since we have two values of r that make one value of r%3 and frequency
clearly aggregates as the outcomes are independent.

r%3 =   0      1      2
--------------------------
Np = 0.50*N 0.25*N 0.25*N

It is therefore twice as likely that two random bits, modulus 3, will
produce a zero as a one or a two.
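
The table can be checked by brute force.  A small sketch that
enumerates every equally likely value of r and tallies r % n; the
function name (residue_counts) is made up for this example:

```c
#include <assert.h>

/* Tally how many of the 2^bits equally likely values of r land on each
 * residue r % n.  counts must have room for n entries. */
void residue_counts(unsigned bits, unsigned n, unsigned long *counts)
{
    assert(bits < 32 && n > 0);
    for (unsigned i = 0; i < n; i++)
        counts[i] = 0;
    for (unsigned long r = 0; r < (1UL << bits); r++)
        counts[r % n]++;
}
```

With bits = 2 and n = 3 the tallies are 2, 1, 1, matching the 0.50 /
0.25 / 0.25 split above; with n = 4 (a power of 2) every residue gets
exactly one value and the table is flat.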

>
> Apologies for the caps. I hope you see how important this is. You're
> claiming all programmers use random numbers in a faulty manner?

They don't.  Only you do.  Everybody else takes a uniform deviate and
scales it by the number of desired integer outcomes, checking to make
sure that they don't go out of bounds and thereby e.g. get an incorrect
endpoint frequency.  The gsl code is open source and it takes two
minutes to download it and check (I just timed it).  Go on, look.  The
file is rng/rng.c in the gsl distro directory, the function name is
gsl_rng_uniform_int.  No modulus.

The exception is (obviously) when the range is a power of 2.  In that
case ONLY, r%n where r is a binary uint and n is a power of 2 will
(obviously) equally balance the table above.  Personally I'd use >> and
shift the bits because it is faster than mod, but suit yourself, after
you've learned what you are doing.

>
> This is important enough to discuss further.
>
> As nearly always you need random numbers from within a given domain, say
> 0..n-1, so projecting a RNG onto that domain is pretty crucial. How would
> you want to do that in a correct manner?
>
> In the slot test in fact a simple AND is enough.

No, as I've just proven algebraically.  The correct manner for general n
is the gsl code, but in rough terms it is n*r/r_max (with care used to
avoid roundoff errors at the ends as noted).  If you've been using
modulus, all your results are crap.
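
A rough sketch of the scale-and-check idea follows.  This is NOT the
GSL source, only an illustration of the approach described above: the
raw value is divided by a scale factor, and values that would fall past
the last outcome are thrown away and redrawn.  The names (rng32_fn,
uniform_int, counter_next) are hypothetical, and the counter "generator"
is a toy that exists only so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: any source of uniform 32-bit values. */
typedef uint32_t (*rng32_fn)(void *state);

/* Uniform integer in [0, n) by scaling.  Each outcome owns exactly
 * `scale` source values; the leftover values at the top of the range
 * are rejected so no outcome gets an extra share. */
uint32_t uniform_int(rng32_fn next, void *state, uint32_t n)
{
    assert(n > 0);
    uint32_t scale = UINT32_MAX / n;   /* source values per outcome */
    uint32_t k;
    do {
        k = next(state) / scale;
    } while (k >= n);                  /* redraw on the uneven tail */
    return k;
}

/* Toy generator for demonstration only: a counter advanced by a fixed
 * odd stride.  It is not a real RNG. */
uint32_t counter_next(void *state)
{
    uint32_t *c = state;
    *c += 0x9E3779B9u;
    return *c;
}
```

For n = 3 the scale is 1431655765, so the single raw value 4294967295
would map to k = 3 and is redrawn rather than being folded back onto 0
the way a modulus would.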

Look, the reason God invented the GSL and made it open source is so
numb-nuts and smart people alike wouldn't have to constantly reinvent
the wheel, badly.  Use it.  Don't question it -- you obviously aren't
competent to.  Just use it.  If you want a random integer from 0 to n-1,
use gsl_rng_uniform_int.  If you want this for e.g. mt19937 don't write
the latter, set up the gsl to use it to generate your ints.  Learn to
use it carefully, use it correctly, but use it.

> It's not interesting to discuss - but yes this strategy makes money in
> casinos, you just get thrown out of the casino and end up on the
> blacklist if you do.

You are clearly too stupid to be allowed out of the house without a
caretaker.  I'm not going to walk you through the proof that this isn't
so as it is openly published and I've already referenced a step-by-step
analysis that you can't be bothered, apparently, to actually read.  I'll
just reiterate the previous offer -- I, too, am happy to buy a roulette
wheel and you can come over and bet Martingale against me all day.  Just
one 0, no limits and no quitting, infinite credit on both sides, we play
until it is obvious to you that you are losing, have lost, will always
lose, and the longer you play the more that you will lose.  Loser buys
the winner a case of truly excellent beer.

Look, why don't you fix your random number code and try again, since
your simulations are obviously trash.  It isn't difficult to show this
with simulations, once you actually code them correctly, but I have to
go and don't have time to do it for you.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From diep at xs4all.nl  Fri Aug 26 12:53:14 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 18:53:14 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <alpine.LFD.2.02.1108260829390.12275@lilith>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
	<alpine.LFD.2.02.1108260829390.12275@lilith>
Message-ID: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>


On Aug 26, 2011, at 2:57 PM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.
>
> Bullshit.  "Every programmer" isn't dumb as a post.  Or wasn't my
> argument clear enough?  Do you need me to actually post the code for
> how the GSL -- written by at least some of these programmers -- does
> this?  Here, I'll try again.  This time I'll use smaller numbers and
> make an actual table of the outcomes:  Imagine only two lousy random
> bits, enough to make 00, 01, 10, 11 (or 0,1,2,3).  Here is the
> probability table:
>
> r =   0    1    2    3
> ------------------------
> p = 0.25 0.25 0.25 0.25
>
> Let us generate N samples from this distribution.  Our expected
> frequency of the occurrence of all of these numbers is:
>
> r =     0      1      2      3
> ---------------------------------
> Np = 0.25*N 0.25*N 0.25*N 0.25*N
>
> Is this all clear?  If I generate 100 random numbers, the expected
> number of 3's is 0.25*100 = 25.  Now apply mod 3; the outcomes are now:
>
> r =     0      1      2      3
> r%3 =   0      1      2      0
> ---------------------------------
> Np = 0.25*N 0.25*N 0.25*N 0.25*N
>
> You now sum the number of outcomes for each element in the mod 3
> table, since we have two values of r that make one value of r%3 and
> frequency clearly aggregates as the outcomes are independent.
>
> r%3 =   0      1      2
> --------------------------
> Np = 0.50*N 0.25*N 0.25*N
>
> It is therefore twice as likely that two random bits, modulus 3, will
> produce a zero as a one or a two.
>

If the generator produces values in a domain of 0..3 and your modulus n
is just one less than the size of that domain, obviously that means
it'll map a tad more to 0.

Basically the deviation one would be able to measure in such a case is
this: if we have a generator that runs over a field of, say, size m and
we want to map that onto n entries, then we have the following formula:

m = x * n + y;

Now your theory, if I summarize it, is basically that in such a case the
entries 0..y-1 will get a tad more hits than the entries y..n-1.

However if x is large enough that shouldn't be a big problem.

If, in the test I'm doing, we now map onto say a few million to a
billion entries, the size of that x is a number of 40+ bits for most
RNGs.

So that means that the deviation from the effect you show above is of
the order of magnitude of 1 / 2^40 in such a case, which is rather small.

Especially because the 'test', if you want to call it that, is operating
at the granularity O ( log n ), we can fully ignore the expected
deviation of granularity O ( 1 / 2^40 ).
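
The m = x * n + y split can be made concrete.  A small sketch, with a
made-up function name (hits_for_residue) and the assumption that m
itself fits in 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Split m = x*n + y and report how many of the m equally likely source
 * values 0..m-1 map onto a given residue under r % n. */
uint64_t hits_for_residue(uint64_t m, uint64_t n, uint64_t residue)
{
    assert(n > 0 && residue < n);
    uint64_t x = m / n;
    uint64_t y = m % n;
    return residue < y ? x + 1 : x;   /* residues 0..y-1 get one extra */
}
```

The favoured residues are hit x + 1 times against x for the rest, a
relative excess of 1/x.  For m = 4, n = 3 that is 2 against 1, the table
from earlier in the thread.  How small 1/x is depends entirely on the
generator's range relative to n: for a generator with only 32 bits of
output and n = 1000000, x is just 4294.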





From diep at xs4all.nl  Fri Aug 26 14:17:46 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 20:17:46 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com>
Message-ID: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>


On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:

> I hate to troll, but...
>
> On Aug 25, 2011, at 8:27 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:
>
>>
>> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:
>>
>>> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>>>
>>>> I noticed that most semi-random numbers generated with software
>>>> generators have the habit of truly addressing a search space of n
>>>> always in O (n log n).
>>>>
>>>> So if you draw a number from most software RNGs and take it modulo
>>>> n, with n being not too tiny, say quite some millions or even
>>>> billions, then every slot in your 'hashtable' will get hit at least
>>>> once by the RNG, whereas data in reality simply happens not to have
>>>> that habit.
>>>>
>>>> So true random numbers versus generated noise are easy to distinguish
>>>> in this manner. Now I didn't study the literature to see whether some
>>>> other chap had already invented this a long time ago. That would be
>>>> most interesting to know.
>>>
>>> Some other chap named George Marsaglia (and to some extent another
>>> chap named Donald Knuth) have already invented this.  A number of
>>> tests of the tails of random number generators are already in
>>> dieharder.  All "good" modern rngs pass these tests.
>>>
>>> The Martingale betting system you are looking at is even older (at
>>> least Marsaglia and Knuth are still alive).  It dates back to the 18th
>>> century, and is well known to be flawed for a variety of reasons, not
>>> the least of which is that gamblers don't have the infinite wealth
>>> necessary to make this >>even<< a zero-sum strategy and casinos have
>>
>> From a mathematical viewpoint it makes perfect cash.
>> The statistical odds are that you have already built up considerable
>> profit when a worst case (that you hit the practical limit of 10
>> doublings) hits you.
>
> A betting system will not improve the negative mathematical  
> expectation of a  casino game.

The doubling system doesn't have a negative expectation.

In practice you are allowed to double 10 times if you start with 1.

Of all systems in roulette this is the only system that will produce
a profit, theoretically speaking; in practice, we all agree, they kick
you out.

>   If your mathematical expectation is -1 for each trial, it's -10  
> for ten trials.  You will not win in the long-run using Martingale.
>

Except that this system doesn't have a negative expectation. It has a
positive expectation.

There is no other system in roulette that has a positive expectation,
other than the doubling system.

Please use the European casino model. I don't live in the USA.
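
The expectation being disputed here can be computed directly.  A
minimal sketch under stated assumptions: an even-money bet on a
single-zero (European) wheel wins with probability 18/37, the first
stake is 1, the stake doubles after each loss, and the sequence stops
at the first win or after max_bets bets.  How many bets "10 doublings"
allows is left as a parameter; the function name (martingale_ev) is
made up for this example.

```c
#include <assert.h>

/* Expected net result of one capped doubling sequence, in units of the
 * first stake. */
double martingale_ev(double p_win, int max_bets)
{
    assert(p_win > 0.0 && p_win < 1.0 && max_bets > 0);
    double q = 1.0 - p_win;
    double p_reach = 1.0;  /* probability this bet is placed at all */
    double stake = 1.0;    /* size of the current bet                */
    double lost = 0.0;     /* total lost on the bets before this one */
    double ev = 0.0;
    for (int i = 0; i < max_bets; i++) {
        ev += p_reach * p_win * (stake - lost);  /* win here: net +1 */
        lost += stake;
        p_reach *= q;
        stake *= 2.0;
    }
    ev -= p_reach * lost;  /* every bet lost: forfeit all the stakes */
    return ev;
}
```

The sum collapses to 1 - (2q)^k for k bets.  With q = 19/37 the factor
2q = 38/37 is above 1, so the result is negative for every k; for
p_win = 18.0/37.0 and 10 bets it is about -0.31 units per sequence.
Only a fair coin (p_win = 0.5) brings it to exactly zero.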


>>
>> The simulations are of course using the practical limit.
>>
>> Note that the European casinos have a single zero.
>> In the USA an even more greedy mafia controls all the casinos;
>> there are 2 zeros there, 0 and 00.
>>
>> The simulations were for European casinos.
>>
>>> betting limits that de facto make it impossible to pursue the
>>> requisite
>>> number of steps and in roulette in particular have 0 and/or 00
>>> slots and
>>> aren't zero-sum to begin with.  You can read a decent analysis of
>>> outcomes based on the presumed binomial distribution of a zero-sum
>>> game
>>> here:
>>>
>>>  http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>>>
>>
>> You're not allowed to use a system in a casino, so we speak about
>> theory. Probably first evening they let you try. Second day you'll
>> get on the blacklist.
>
> Nonsense.  Have you ever been to a casino?


> You are welcome to Martingale all day long at any of them.

> Hell, I'll buy a roulette wheel and you can come over to my place
> if you play this strategy or any of its variants.

>   The casino wants you to Martingale -- it's favorable to them.   
> Why would they stop a loser?

In all casinos the doubling system makes a profit, if you apply it in an
objective manner and are allowed to.

Same for some slot machines over there. After some others have played on
one and it has swallowed money, the majority of slot machines are not
negative-sum games anymore. If you play on them then, it's a
positive-sum game.

If they were always negative-sum games then no lady would keep
playing slot machines.

>
> The casino is not concerned with betting strategies.  It is  
> concerned with folks gaining an edge.  A betting system alone will  
> not give the player an edge.
>

No, very wrong: a casino is interested in maximizing its profit.
Kicking out folks that do well is part of that game.

Oh by the way - I worked for a casino. Did you?


>>
>>> Your test below is interesting, though.  The only real problems I  
>>> can
>>> see with actually using it in dieharder are:
>>>
>>
>> Yeah more interesting than the billion times discussed roulette
>> system which
>> has been analyzed completely flat.
>>
>>>  a) One would need a theoretical estimate of the distribution of
>>> filling given n log n draws on an n-slotted table (for largish n).
>>> That
>>> is, for a perfect rng, what SHOULD the distribution of success/ 
>>> failure
>>> be.
>>
>> As we have figured out by now in Artificial Intelligence, the
>> statistical assumptions made in the past simply do not hold.
>>
>> For Artificial Intelligence we need a new sort of theory.
>>
>> As for the distribution problem, generators having a spread that's
>> too accurate, the way to deliver a proof would be for example to build
>> a simple device.
>>
>> Build an old-fashioned box from which you can draw balls. Remember
>> what you could see on TV some 20 years ago or so (not sure it was like
>> that in the USA).
>>
>> A big basket with balls. The basket in fact looks like this:
>>
>> http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg
>>
>> But now a much bigger machine like this, with different means of
>> randomizing the balls inside, actually also randomly modifying the
>> obstacles inside that shake the balls.
>>
>> After a ball has been drawn you automatically have it annotated and
>> the ball immediately goes back into the machine. For a full minute you
>> have the balls in the machine shaken again and then you draw a ball
>> again. It is important to do this randomizing of the balls inside the
>> machine for quite some time. I would propose a minute.
>>
>> Of course you have to do this with quite some balls.  Say a thousand.
>>
>> Then you draw balls until all numbers have been drawn at least once.
>>
>> This cool experiment can be easily built. Of course the expected
>> running time of a single experiment will be a few weeks.
>>
>> You can produce a number of those drawing machines though and have a
>> look.
>>
>> Theories that seemingly work for small n, n being the number of  
>> balls,
>> are much harder to maintain at bigger n's, as we also see in prime
>> number research.
>>
>> The way the machine gets designed is of course totally crucial. I
>> would propose a design that shakes the balls through each other a lot
>> and very thoroughly.
>>
>> Just like we nowadays know how flawed a big number of the popular card
>> shuffling machines are.
>>
>> Such a lottery with really a lot of balls would be very interesting to
>> see the outcomes from.
>>
>> In fact I would prefer having a number of those machines produced, so
>> that it's possible to really have a lot of outcomes and then analyze
>> them very well.
>>
>>>
>>>  b) One would then need the CDF for this distribution, to be able to
>>> turn the results of N trials (of n log n pulls each) into a p-value
>>> under the null hypothesis -- the probability of obtaining the
>>> particular
>>> number of successes and failures presuming a perfectly random
>>> generator.
>>>
>>> That way dieharder could apply it rigorously to its 70 or 80  
>>> embedded
>>> rngs or to any user's outboard generator.  There probably is
>>> theoretical
>>> statistical support for the PD and/or CDF -- you're analyzing the
>>> tails
>>> of a poissonian process -- but finding it or doing it yourself (or
>>> myself), aye, that's the rub.  One cannot just say "high degree of
>>> certainty that it is an RNG" (by which one means that the rng in
>>> question fails the test for randomness) in the test.  HOW high?
>>> Perfect
>>> rngs or perfectly random processes will sometimes fill your  
>>> table, but
>>> how often?
>>
>> If we assume that reality of life represents randomness, which is
>> another
>> rather good question in how far that theory is plausible, then using
>> that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have  
>> seen
>> in any reality.
>>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
>> O ( n log n )
>> that they follow.
>>
>> These are RNGs that have come through all tests as being good and
>> very acceptable RNGs to be used.
>>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really  
>> a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for  
>> practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove  
>> anything,
>> as what i've seen in reality in randomness is total different from  
>> how
>> RNG's model that.
>>
>> Regards,
>> Vincent
>>
>>
>>>  How can you differentiate an "accident" when one does from
>>> an actual failure?  All of those questions require a more rigorous
>>> theory and quantitative result embedded in a test that can be
>>> systematically cranked up to more clearly resolve failures until  
>>> they
>>> are unambiguous, not marginal maybe yes maybe no.
>>>
>>> I suspect that the failures this test would reveal are already more
>>> than
>>> covered in dieharder, in particular by the bit distribution tests  
>>> and
>>> the monkey tests, but I'm not terribly happy with the monkey  
>>> tests and
>>> would be perfectly thrilled to have a simpler to compute test that
>>> revealed precisely this sort of flaw, systematically.  And it  
>>> doesn't
>>> hurt at all to have partially or fully redundant tests as long as  
>>> the
>>> test themselves are rigorously valid.  If you can find or compute  
>>> the
>>> CDF for your test below, I'd be happy to wrap it up and add it to
>>> dieharder, in other words.  One can always SIMULATE a CDF, of  
>>> course,
>>> but that requires a known good generator and sort of begs the  
>>> question
>>> if you don't think that e.g. AES or threefish or KISS are good
>>> generators that would actually pass your test.
>>>
>>> Even hardware/quantum sources of random bits are suspect -- they  
>>> often
>>> are generated by a process that leaves in the traces of an  
>>> underlying
>>> distribution.  I'm not convinced that >>any<< process in the real
>>> world
>>> is >>truly<< random.  Physics is ambiguous on the issue -- the  
>>> quantum
>>> description of a closed system is just as deterministic as the
>>> classical
>>> one, and Master equation unpredictability on open subsets of a large
>>> closed system reflects entropy/ignorance, not actual randomness  
>>> (hence
>>> Einstein's famous "doesn't play dice" remark).  But lots of things are
>>> sufficiently random that one cannot detect any failure of  
>>> randomness,
>>> modern crypto class generators being a prime example.
>>>
>>>   rgb
>>>
>>>>
>>>> In semi pseudo code, let's take an array of size a billion as an
>>>> example,
>>>> though usually a few million is more than ok:
>>>>
>>>> n = 2^30; // 2 to the power 30
>>>>
>>>> Function TestNumbersForRandomness(RNG,n) {
>>>> declare array hashtable[size n];
>>>>
>>>> guessednlogn = 2 * (log n / log 2) * n;
>>>>
>>>> for( i = 0 ; i < n ; i++ )
>>>>  hashtable[i] = FALSE;
>>>>
>>>> ndraws = filledn = 0;
>>>> while( ndraws  < guessednlogn ) {
>>>>   randomnumber = RNG();
>>>>   r = randomnumber % n; //     randomnumber =  r  (mod n)
>>>>   if( hashtable[r] == FALSE ) {
>>>>      hashtable[r] = TRUE;
>>>>      filledn++;
>>>>      if( filledn >= n )
>>>>        break;
>>>>   }
>>>>   ndraws++;
>>>> }
>>>>
>>>> if( filledn >= n )
>>>>   print "With high degree of certainty data generated by a RNG\n";
>>>> else
>>>>   print "Not so sure it's a RNG\n";
>>>>
>>>> }
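
For anyone who wants to run the quoted pseudocode, here is one possible
C rendering.  It is a sketch, not the poster's actual program: the
names (rng64_fn, slots_filled, splitmix64_next) are invented, n is
restricted to a power of two so that % n adds no modulo bias, and
SplitMix64 is used purely as a convenient stand-in generator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in: any source of uniform 64-bit values. */
typedef uint64_t (*rng64_fn)(void *state);

/* Draw up to 2*n*log2(n) values, mark slot (value % n), and return how
 * many distinct slots were hit.  Returns 0 if allocation fails. */
uint64_t slots_filled(rng64_fn next, void *state, uint64_t n)
{
    assert(n >= 2 && (n & (n - 1)) == 0);   /* power of two only */
    unsigned log2n = 0;
    while ((n >> log2n) > 1)
        log2n++;
    uint64_t budget = 2 * (uint64_t)log2n * n;

    unsigned char *hit = calloc(n, 1);
    if (hit == NULL)
        return 0;

    uint64_t filled = 0;
    for (uint64_t draws = 0; draws < budget && filled < n; draws++) {
        uint64_t r = next(state) % n;
        if (!hit[r]) {
            hit[r] = 1;
            filled++;
        }
    }
    free(hit);
    return filled;
}

/* SplitMix64, used here only as a stand-in generator. */
uint64_t splitmix64_next(void *state)
{
    uint64_t *s = state;
    uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
```

Note what the per-slot argument elsewhere in this thread implies for
this draw budget: the expected number of empty slots is about
n*(1 - 1/n)^(2*n*log2(n)), which for n = 1024 is on the order of
0.000002.  A uniformly random source is therefore expected to fill the
table essentially every time, so a full table does not by itself
separate a software generator from a "true" random one.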
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>
>>>>
>>>>
>>>>> -- both unpredictable and
>>>>> flat/decorrelated at all orders, and even though there aren't  
>>>>> really
>>>>> enough of them for my purposes, I've used them as one of the (small)
>>>>> "gold standard" sources for testing dieharder even as I test them.  For
>>>>> all practical purposes threefish or aes are truly random as well and
>>>>> they are a lot faster and easier to use as gold standard generators,
>>>>> though.
>>>>>
>>>>> I don't quite understand why the single site restriction is important --
>>>>> this site has been up for years and I don't expect it to go away soon;
>>>>> it is quite reliable.  I don't think there is anything secret about how
>>>>> the numbers are generated, and I'll certify that the numbers it produces
>>>>> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
>>>>> your part; 6 I don't really understand but the guy who runs the site is
>>>>> clearly willing to construct a custom feed for cash customers, if there
>>>>> is enough value in whatever it is you are trying to do to pay for
>>>>> access.  If it's just a lottery, well, lord, I can think of a dozen ways
>>>>> to make numbers so random that they'd be unimpeachable for any sort of
>>>>> lottery, both unpredictable and uncorrelated, and they don't any of them
>>>>> require any significant amount of entropy to get started.
>>>>>
>>>>> I will add one warning -- "randomness" is a rather stringent
>>>>> mathematical criterion, and is generally tested against the null
>>>>> hypothesis.  Amateurs who want to make random number generators out of
>>>>> supposedly "random" data streams or fancy algorithms almost invariably
>>>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>>>> really, really good pseudorandom number generators out there and it is
>>>>> easy to hotwire them together into an xor-based high entropy stream that
>>>>> basically never repeats (feeding it a bit of real entropy now and then
>>>>> as it operates).  I would strongly counsel you against trying to take
>>>>> e.g. weather data and make something "random" out of it.  Unless you
>>>>> really know what you are doing, you will probably make something that
>>>>> isn't at all random and may not even be unpredictable.  Even most
>>>>> sources of "quantum" randomness (which is at least possibly "truly
>>>>> random", although I doubt it) aren't flat, so that they carry the
>>>>> signature of their generation process unless/until you manage to
>>>>> transform them into something flat (difficult unless you KNOW the
>>>>> distribution they are producing).  Pseudorandom number generators have
>>>>> the serious advantage of being amenable to at least some theoretical
>>>>> analysis (so you can "guarantee" flatness out to some high
>>>>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>>>>>
>>>>> HTH,
>>>>>
>>>>>    rgb
>>>>>> Thanks,
>>>>>> David Mathog
>>>>>> mathog at caltech.edu
>>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>>>> Robert G. Brown                           http://www.phy.duke.edu/~rgb/
>>>>> Duke University Dept. of Physics, Box 90305
>>>>> Durham, N.C. 27708-0305
>>>>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From rgb at phy.duke.edu  Fri Aug 26 17:46:30 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 17:46:30 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
References: <E1QrtYH-0004ix-Ox@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108121416030.3145@lilith>
	<3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
	<alpine.LFD.2.02.1108250743280.3079@lilith>
	<9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
	<alpine.LFD.2.02.1108260127470.3126@lilith>
	<B21C9F05-1296-4594-A5D1-7C7F11630571@xs4all.nl>
	<alpine.LFD.2.02.1108260829390.12275@lilith>
	<698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
Message-ID: <alpine.LFD.2.02.1108261335130.12325@lilithnew>

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If you have a domain of 0..3 where a generator generates and your modulo n is
> just n-1, obviously that means it'll map a tad more to 0.
>
> Basically the deviation one would be able to measure in such case is that if
> we have a generator that runs over a field of say size m and we want to map
> that onto n entries then we have the next formula :
>
> m = x * n + y;
>
> Now your theory is basically if i summarize it that in such case the entries
> 0..y-1 will have a tad higher hit than y..n-1.

What's a "tad" when you're measuring the quality of an RNG?  I'm just
curious.  Could you be more specific?  Just what are the limits,
specifically, if your random number is a 30 bit uint that makes numbers
in the range of 0-1 billion (nearly all generators make uints, with a few
exceptions that usually make fewer than 32 bits -- 64 bit generators are
still a rarity although of course two 32 bit rands make one 64 bit rand)
and you mod with m, especially a nice large m that doesn't evenly
divide the range of r, like 1.5 million?  That means that each integer in
the range 0 to 1.5 million gets 666 repetitions as the entire range of r
gets sampled, except the first million that get 667.  That means that the
odds of pulling a number from the first million are 1e6*667/1.e9 = .667.
The odds of pulling a number from the remaining half million are
0.5e6*666/1.e9 = .333 = 1 - .667.  A uniform draw would give .666667 and
.333333, so each of the first million values turns up about 0.15% more
often than each of the rest.  An old lady with terrible eyes could detect
such an imbalance in probability from across the street -- you wouldn't
even need a "random number generator tester".
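
The counting argument above can be checked exactly.  A minimal sketch,
assuming (as the post does) that r takes 10^9 equally likely values and
that m is 1.5 million; the name modulo_bias is made up for illustration:

```python
from fractions import Fraction

def modulo_bias(r_count, m):
    """Exact hit counts when r is uniform on 0..r_count-1 and we return r % m."""
    q, y = divmod(r_count, m)          # r_count = q*m + y
    # residues 0..y-1 are hit q+1 times, residues y..m-1 are hit q times
    p_low = Fraction(q + 1, r_count)   # probability of each favoured residue
    p_high = Fraction(q, r_count)      # probability of each remaining residue
    return q, y, p_low, p_high

m = 1_500_000
q, y, p_low, p_high = modulo_bias(10**9, m)
mass_low = y * p_low                   # total probability of residues 0..y-1
mass_high = (m - y) * p_high           # total probability of residues y..m-1
print(q, y)                            # 666 1000000
print(float(mass_low), float(mass_high))
print(float(mass_low - Fraction(y, m)))   # excess over the uniform share
```

Using Fraction keeps the arithmetic exact, so the small excess over the
uniform share is not lost in floating point rounding.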

Weren't you advocating using this for nice large m like a million?  I
think that you were.  No, wait!  You were advocating something like one
BILLION, right?  Wrong direction to make it better, dude, this makes it
worse.

Note that this scales pretty well.  For m in the range of thousands, the
imbalance will be something like 0.666667 and 0.333330 -- still pretty
easy to detect with any halfway decent RNG tester.  Basically, you don't
get (immeasurably) close to a uniform distribution in the weighting of
any integer until you get down to (unsurprisingly) m of order unity
compared to a uint, at which point it basically becomes as
accurate as m*(r/r_max) was in the first place.

Note also that you've created an imbalance in the weighting of the
integers you are sampling that is far, far greater and more serious than
any other failure of randomness that your RNG might have.  So much so
that you couldn't possibly see any other -- it would be a signal swamped
in the noise of your error, even for m merely in the thousands -- one
part per million errors in randomness are easy to detect in simulations
that draw 10^9 or so random numbers (which is still a SMALL TEST
simulation -- real simulations draw 10^16 or 10^18), and your error would
put answers on another PLANET compared to where they should be.

Most coders probably can actually work this out with a pencil, and so I
repeat no, nobody competent uses a modulus to generate integers in a
fixed range in circumstances where the quality of the result matters,
e.g. numerical simulation or cryptography as opposed to gaming, unless
the modulus is a power of two.
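
One standard way to avoid the bias is rejection: throw away the draws
that land in the incomplete final block, so that every residue is hit
equally often.  A minimal sketch, not necessarily the algorithm used by
the library mentioned in this thread; uniform_below and its parameters
are illustrative names, and Python's random.getrandbits stands in for a
real generator:

```python
import random

def uniform_below(m, bits=32, source=random.getrandbits):
    """Return an unbiased integer in 0..m-1 from a source of `bits`-bit uints."""
    r_count = 1 << bits                # number of distinct source values
    if not 0 < m <= r_count:
        raise ValueError("m must be in 1..2**bits")
    limit = r_count - (r_count % m)    # largest multiple of m that fits
    while True:
        r = source(bits)
        if r < limit:                  # reject the short final block
            return r % m

print(uniform_below(1_500_000))
```

When m is a power of two the limit equals the full range and nothing is
ever rejected, which matches the exception noted above.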

> However if x is large enough that shouldn't be a big problem.
>
> If we map now in the test i'm doing onto say a few million to a billion
> entries, the size of that x is a number of 40+ bits for most RNG's.

x=32, a uint for most RNGs.  Or to put it another way, RNGs generate a
bit stream, which they might do with an algorithm that generates
30,31,32, or more bits at a time, but the prevalence of 32 bit
architectures and the fact that it is trivial to concatenate 32s to get
64+ bits when desired has slowed the development of true 64 bit RNGs.
Eventually there will be some, of course, and it will STILL be a mistake
to use a modulus to create random integers in some general range.  A bad
algorithm is a bad algorithm, and this makes sense only if speed is more
important than randomness (in which case one has to wonder, why use a 64
bit RNG in the first place, why use a good RNG in the first place).
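
The concatenation mentioned above is a two-liner.  A minimal sketch,
assuming a hypothetical callable next32() that returns one 32 bit
output of the underlying generator per call:

```python
def rand64(next32):
    """Build one 64 bit value from two successive 32 bit outputs."""
    hi = next32() & 0xFFFFFFFF         # first draw becomes the high word
    lo = next32() & 0xFFFFFFFF         # second draw becomes the low word
    return (hi << 32) | lo

draws = iter([0x12345678, 0x9ABCDEF0])
print(hex(rand64(lambda: next(draws))))   # 0x123456789abcdef0
```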

> So that means that the deviation of the effect you show above is of the
> order of magnitude of 1 / 2^40 in such case, which is rather small.

Except that it isn't, as I showed in a fair bit of detail.  It might be
if x were as large as you claim, which it isn't (in general or even
"commonly") and if one confined m to be order unity.  For m of order
2^20 (a million) the error for 2^40 is order 2^-20 (a millionth) which
shows up even in single precision floating point.  Why bother testing
such a stream for randomness?  It fails.  You've made it fail.  It fails
spectacularly if the generator is perfect, if the goddess Ifni herself
produces the string of digits.  It cannot succeed.
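
The size of that error follows from the same counting argument.  A
short worked version, writing R for the number of distinct values the
generator can return (R, q and y are notation introduced here, not
taken from the thread):

```latex
\begin{align*}
R &= q\,m + y, \qquad 0 \le y < m \\
P(k) &= \begin{cases} (q+1)/R, & 0 \le k < y \\ q/R, & y \le k < m \end{cases} \\
\frac{P(k < y) - P(k \ge y)}{P(k \ge y)} &= \frac{1}{q} \approx \frac{m}{R} \\
R = 2^{40},\ m \approx 2^{20} \;&\Rightarrow\; \frac{1}{q} \approx 2^{-20} \approx 9.5 \times 10^{-7}
\end{align*}
```

The excess vanishes only when y = 0, that is, when m divides R exactly,
which is the power-of-two case.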

> Especially because the 'test' if you want to call it like that, is operating
> in the granularity O ( log n ), we can fully ignore then the expected
> deviation granularity O ( 2 ^ 40 ).

Well, except that basically 100% of the rngs in the GSL pass your "test"
when it is written correctly.  They also produce precisely the
correct/expected result (within easily understandable and expected
random scatter) on top of that if they are "good" rngs.  So the "test"
isn't much of an actual test, and your assertion that "all rngs fail it"
is false and based on a methodology that introduces many orders of
magnitude of error greater than the generators are known to have as
upper bounds.

Given this fact, which I have personally verified, do you imagine that
there might be other errors in your actual (not your pseudo) code?  You
gotta wonder.  If you've tested a Mersenne Twister with your "test" and
it fails to pass, either an MT is crap and all of the theoretical papers
and experienced testers who have tested with sophisticated and
peer-reviewed tools are stupid poo-poo heads, or, well, could it be that
your test or implementation of the MT is crap and the MT itself in
general is what everyone else seems to think that it is based on
extensive "paper fiddling" and enormous piles of empirical testing
evidence written by actual statisticians and rng experts.  Which is to
say, a damn good pseudo-RNG decorrelated in some 600+ dimensions that
passes nearly all known tests with flying colors.

Hmmm, let's put on our Bayesian thinking caps, consider the priors, and
try very hard to guess which one is much much more likely on Jaynes'
"decibel" scale of probabilities.  Would you say that it is 20 decibels
more likely that the MT is good and the test is broken?  50?  200?  I
like 2000 or thereabouts myself, or as we in the business might say, "it
is a fact" that your test is broken since 10^200 is a really big number,
comparatively speaking.
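
For readers unfamiliar with the scale: Jaynes measures evidence as 10
times the base-10 logarithm of the odds, so 2000 decibels corresponds
to odds of 10^200 to 1.  A small sketch of the conversion (the function
names are made up for illustration):

```python
import math

def evidence_db(odds):
    """Evidence in decibels for the given odds ratio."""
    return 10.0 * math.log10(odds)

def odds_from_db(db):
    """Odds ratio corresponding to an evidence level in decibels."""
    return 10.0 ** (db / 10.0)

print(evidence_db(100))                # 20.0
print(odds_from_db(2000))
```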

Now, it would be nice if you apologized to "all RNGs" and "all
programmers" and the various other groups you indicted on your little
fallacious rant, but I'll consider myself enormously fortunate at this
point if you simply acknowledge that maybe, just maybe, your original
pronouncement -- that all rngs produce an egregiously, trivially
verifiable excessive degree of first order sequential uniformity -- is
categorically and undeniably false.

Of course, if you think I'm lying just to make you look bad, I can post
a modified version of dieharder with your test embedded so absolutely
anybody can see for themselves that all of the embedded generators pass
your test and that not one single thing you asserted in grandiosely
producing it was correct.  The code is quite short and anybody can
understand it.

Or you can take my moderately expert word for it -- the results I posted
are honest results produced using real RNGs from a real numerical
library in the real test written by block copying your pseudocode,
converting/realizing it in C, and fixing your obvious error in the
generation of random ints in the range 0-m by using a tested algorithm
written by people who actually know what they are doing that is IN the
aforementioned real numerical library.

Seriously, it is done.  Finished.  You're wrong.  Say "I'm sorry, Mr.
Mersenne Twister, if my test passes randu then how could it possibly
fail you?"  And don't forget to apologize to AES, RSA, DES, and all of
the other encryption schema too.  They all feel real bad that you called
them stupid poo-poo heads unable to pass the simplest first order
frequency test one can imagine, since they all had to pass MUCH more
rigorous and often government mandated testing to ever get adopted as
the basis for encryption.

I don't expect an apology to me for being indicted along with ALL the
OTHER programmers in the world for being stupid enough to use mod to
make a supposedly uniformly distributed range of m rands.  Not even
Numerical Recipes was that boneheaded. But it's OK, we all know that we
didn't really ever do that, and if you did (and continue to do,
apparently, learning nothing from my patient and thorough exposition of
how it produces errors that are vastly greater than the ones that you
think you are detecting) that's a problem to who?  That's right, mister.
To you.  You'll just keep getting wroooooong answers, and then
announcing them as fact and making yourself look silly.  Or even
sillier, if that is possible.

     rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From atp at piskorski.com  Sat Aug 27 10:26:23 2011
From: atp at piskorski.com (Andrew Piskorski)
Date: Sat, 27 Aug 2011 10:26:23 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
References: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
Message-ID: <20110827142623.GA29931@piskorski.com>

On Fri, Aug 26, 2011 at 08:17:46PM +0200, Vincent Diepeveen wrote:
> On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:

>> A betting system will not improve the negative mathematical  
>> expectation of a casino game.

Right.

> Except that this system doesn't have a negative expectation. it has a  
> positive expectation.
> 
> There is no other system in roulette that has a positive expectation,  
> other than the doubling system.

Vincent, are you shitting us?  Or am I misremembering the tortured
history of this thread, and by "doubling system" you do NOT mean the
trivial martingale betting system that's been used (disastrously) and
analyzed for over 200 years?

Actually it doesn't matter; as Shawn Hood pointed out above, your
assertion is still wrong even if you actually meant some other
non-martingale betting system.  You insisting that *martingale*
betting gives you a positive expectation at roulette just makes it
much funnier!

There are ways to gain positive expectation in roulette (other than
the obvious fraud and collusion).  They involve finding a poorly
installed roulette table and using a wearable computer and physics to
predict where the ball will land.  Look up Thorp and Shannon's
research on the subject; they actually used it in casinos c. 1961.

None of those ways are due to some special method of betting.  The
point of betting systems is to optimize your small edge, but you have
to HAVE that edge in the first place.  Money management is important
because it tells you how to properly size your risk, but it can't give
you alpha.

Now yes, if you have a very volatile "roulette" game and a 0% edge (no
advantage to either you or the house), with some luck you could get
rich by playing it for a limited period of time and quitting while
you're ahead.  But you still have a 0% expectation game; look up the
mathematical definition of "expectation".
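
The expectation of a doubling scheme can be computed exactly.  A
minimal sketch, assuming even-money bets, a stake that doubles after
each loss, and a round that stops at the first win or when the bankroll
runs out after a fixed number of doublings; the 18/37 win probability
is the usual single-zero wheel and is an assumption here, as is the
function name:

```python
from fractions import Fraction

def martingale_round(p_win, max_doublings, base_bet=1):
    """Expected profit of one martingale round on an even-money bet."""
    p_win = Fraction(p_win)
    n_bets = max_doublings + 1                     # stakes 1, 2, 4, ...
    p_lose_all = (1 - p_win) ** n_bets             # every bet in the round loses
    total_staked = base_bet * (2 ** n_bets - 1)    # amount lost in that case
    # any win recovers all earlier losses plus exactly one base bet
    return (1 - p_lose_all) * base_bet - p_lose_all * total_staked

print(martingale_round(Fraction(1, 2), 10))        # 0
print(float(martingale_round(Fraction(18, 37), 10)))
```

With a fair coin the expectation is exactly zero however many doublings
are allowed; with the house edge it is negative and gets worse as the
allowed run of doublings grows.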

Also, I don't remember for sure, but I believe martingale betting is
(always) more aggressive than Kelly.  If so, then it is inherently
stupid.  Kelly defines the MAXIMUM size bet that it is rational to
make, assuming your goal is maximum compounded wealth AND you have a
quantifiable edge (however small) in the game.  It can make sense to
bet less than Kelly, and if you believe you have no edge the rational
bet is zero.  It is never rational to bet more than Kelly.

In practice, even when you are sure you have a real edge, you want to
bet less than Kelly, often much less.  There are several reasons for
that; one is that calculating Kelly depends on your estimate of how
big your edge is, and it is easy to overestimate your edge such that
in truth you are massively overbetting (taking way too much risk) at
2x Kelly or even more.
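
For a simple bet that pays b-to-1 with win probability p, the Kelly
fraction of bankroll is f* = (b*p - (1 - p)) / b.  A minimal sketch
(function names are made up; the 18/37 figure assumes an even-money bet
on a single-zero wheel):

```python
from fractions import Fraction

def kelly_fraction(p_win, net_odds=1):
    """Kelly fraction f* = (b*p - q) / b for a bet paying b-to-1."""
    p = Fraction(p_win)
    b = Fraction(net_odds)
    return (b * p - (1 - p)) / b

def rational_bet_fraction(p_win, net_odds=1):
    """Never bet when the edge is zero or negative."""
    return max(Fraction(0), kelly_fraction(p_win, net_odds))

print(kelly_fraction(Fraction(11, 20)))            # 1/10
print(kelly_fraction(Fraction(18, 37)))            # -1/37
print(rational_bet_fraction(Fraction(18, 37)))     # 0
```

A negative f* is the formula's way of saying the edge belongs to the
other side, so the bet size it recommends for the player is zero.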

But optimizing the way you bet doesn't turn an inherently losing game
into a winner.  If the edge is with the house - as it certainly is
with a fair roulette table - the rational bet is not to make one.

This news article is probably more interesting:

  http://www.theonion.com/articles/casino-has-great-night,1506/
  Casino Has Great Night; May 28, 2003

-- 
Andrew Piskorski <atp at piskorski.com>


From james.p.lux at jpl.nasa.gov  Sat Aug 27 11:27:37 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sat, 27 Aug 2011 08:27:37 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <20110827142623.GA29931@piskorski.com>
Message-ID: <CA7E589E.A5F0%james.p.lux@jpl.nasa.gov>

>There are ways to gain positive expectation in roulette (other than
>the obvious fraud and collusion).  They involve finding a poorly
>installed roulette table and using a wearable computer and physics to
>predict where the ball will land.  Look up Thorp and Shannon's
>research on the subject; they actually used it in casinos c. 1961.

I think Shannon and Thorp just analyzed it, without actually using it.

See "The Eudaemonic Pie" about some physics guys at UC Santa Cruz who
built wearable hardware. Early 70s, I should think, based on my
recollections of the kind of ICs they were using.  (I also note, based on
the book, that while they were good at the physics, they weren't very good
at electronics design and construction)


They never made the system work very well (concept sound, execution not so
hot)..but it did encourage the gaming industry to get new laws prohibiting
the use of assistive devices.  Just you and the casino, mano a mano (or,
more accurately cerebro a leyes de la probabilidad)



From mathog at caltech.edu  Wed Aug 31 13:29:18 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 10:29:18 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>

Anybody know of a nice cheap, high melting point, easy to work with
sheet material, for making a custom air shroud?  

We have one box with stuff in it that looks similar to HDPE, the
material the white flexible cutting boards are made of, but it is a bit
thinner and more rigid than that.  Unfortunately there are no markings
on it, so HDPE is just a guess.  Whatever it is, it cut easily with
scissors (I had to trim it slightly at one point.)

Background.  We have an older Supermicro SC-823 server with dual
processors.  The air shroud it came with only covers the first
processor.  That didn't matter much when it had two low power processors
in it, but after upgrading it to dual Opteron 280s, the uncovered second
one runs considerably hotter than the covered front one.  (Swapping the
processors around didn't help - the heat stayed where it was, so a
ventilation issue, not a processor issue.)  Supermicro does make a newer
shroud which extends to the back of the case, but the manual (google for
"SC-823 air shroud user's guide") indicates that it is designed for
Intel CPUs.  So it may or may not fit around the Opterons.

The redesigned air shroud will probably work, but I'm about 90%
confident that taping a sheet of plastic onto the back of the existing
shroud would work as well - if I can find a plastic that won't flap
around or melt.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From deadline at eadline.org  Wed Aug 31 14:20:03 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 31 Aug 2011 14:20:03 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
Message-ID: <44280.192.168.93.213.1314814803.squirrel@mail.eadline.org>

David,

I have experimented with some simple ducting for my Limulus
system. I found a Vinyl Flashing from Union Corrugating Company
(purchased at Lowes home center) that has some nice features, it is
bendable, holds its shape, easy to cut, and has a low carbon content
(harder to burn than most plastics), and it is fairly stiff.

My needs are "low temp" air ducting. I have not tested it
with constant warm/hot air.

--
Doug


> Anybody know of a nice cheap, high melting point, easy to work with
> sheet material, for making a custom air shroud?
>
> We have one box with stuff in it that looks similar to HDPE, the
> material the white flexible cutting boards are made of, but it is a bit
> thinner and more rigid than that.  Unfortunately there are no markings
> on it, so HDPE is just a guess.  Whatever it is, it cut easily with
> scissors (I had to trim it slightly at one point.)
>
> Background.  We have an older Supermicro SC-823 server with dual
> processors.  The air shroud it came with only covers the first
> processor.  That didn't matter much when it had two low power processors
> in it, but after upgrading it to dual Opteron 280s, the uncovered second
> one runs considerably hotter than the covered front one.  (Swapping the
> processors around didn't help - the heat stayed where it was, so a
> ventilation issue, not a processor issue.)  Supermicro does make a newer
> shroud which extends to the back of the case, but the manual (google for
> "SC-823 air shroud user's guide") indicates that it is designed for
> Intel CPUs.  So it may or may not fit around the Opterons.
>
> The redesigned air shroud will probably work, but I'm about 90%
> confident that taping a sheet of plastic onto the back of the existing
> shroud would work as well - if I can find a plastic that won't flap
> around or melt.
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech


From james.p.lux at jpl.nasa.gov  Wed Aug 31 14:43:39 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 11:43:39 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>

Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..)  It's no more flammable than plastic, and it doesn't melt and get soft. Papier-mache works too.

On the other hand, if you want to mold a smooth curve, then plastic is the way to go. Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. )

Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar.  Polystyrene and acrylic tend to be more brittle.  ABS is very nice to work with.  PVC is also easy to work with. Nylon is another possibility.

Do you want to be able to glue it?

What I would do is call up professionalplastics.com, formerly Cadillac Plastics (many outlets nationwide), and see what they have.  It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go.  Plastic Depot in Burbank has a huge selection.

Drive over there, and ask the counter folks what would work for you.  $10-20 will get you more plastic than you know what to do with.

Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications.


Jim Lux
+1(818)354-2075 

> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog
> Sent: Wednesday, August 31, 2011 10:29 AM
> To: beowulf at beowulf.org
> Subject: [Beowulf] materials for air shroud?
> 
> Anybody know of a nice cheap, high melting point, easy to work with
> sheet material, for making a custom air shroud?
> 
> We have one box with stuff in it that looks similar to HDPE, the
> material the white flexible cutting boards are made of, but it is a bit
> thinner and more rigid than that.  Unfortunately there are no markings
> on it, so HDPE is just a guess.  Whatever it is, it cut easily with
> scissors (I had to trim it slightly at one point.)
> 
> Background.  We have an older Supermicro SC-823 server with dual
> processors.  The air shroud it came with only covers the first
> processor.  That didn't matter much when it had two low power processors
> in it, but after upgrading it to dual Opteron 280s, the uncovered second
> one runs considerably hotter than the covered front one.  (Swapping the
> processors around didn't help - the heat stayed where it was, so a
> ventilation issue, not a processor issue.)  Supermicro does make a newer
> shroud which extends to the back of the case, but the manual (google for
> "SC-823 air shroud user's guide") indicates that it is designed for
> Intel CPUs.  So it may or may not fit around the Opterons.
> 
> The redesigned air shroud will probably work, but I'm about 90%
> confident that taping a sheet of plastic onto the back of the existing
> shroud would work as well - if I can find a plastic that won't flap
> around or melt.
> 
> Thanks,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech


From mathog at caltech.edu  Wed Aug 31 15:15:22 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 12:15:22 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>

> Cardboard? Card stock? Masking tape? White glue? (that's what I
> usually use for cooling ducts.. easy to cut, glue, tape..)  It's no
> more flammable than plastic, and it doesn't melt and get soft. 

That never crossed my mind.

You sure about the flammability?  I believe it for the ignition due to
temperature (Fahrenheit 451 and all that).  However, I have a gut
feeling (but no data) that sparks are fairly likely to ignite cardboard,
and less likely to ignite a solid plastic sheet (polyethylene or
polypropylene, for instance).  Not that I'm expecting sparks, but that
is a real possibility when a power supply fails.  Maybe even a brief
flame.  Of course paper won't hold up well compared to plastic if it
gets wet.  Moisture resistance is not important here though - if the
insides of the computer are dripping, air shroud failure is the least of
my worries.  

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From james.p.lux at jpl.nasa.gov  Wed Aug 31 15:18:36 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 12:18:36 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C6D6@ALTPHYEMBEVSP20.RES.AD.JPL>

Paper doesn't catch fire at 451F.. it does start to turn brown. (Sorry Ray..)
(I cook bacon on a rack over paper in a 450 degree oven.. and I doubt the temperature control is that tight)

Flammability is an issue.. paper is rougher than most plastics, so a spark can lodge or a small fiber could catch.  You could fireproof the paper pretty easily with a variety of treatments.


Jim Lux
+1(818)354-2075 

> -----Original Message-----
> From: David Mathog [mailto:mathog at caltech.edu]
> Sent: Wednesday, August 31, 2011 12:15 PM
> To: Lux, Jim (337C); beowulf at beowulf.org
> Subject: RE: [Beowulf] materials for air shroud?
> 
> > Cardboard? Card stock? Masking tape? White glue? (that's what I
> > usually use for cooling ducts.. easy to cut, glue, tape..)  It's no
> > more flammable than plastic, and it doesn't melt and get soft.
> 
> That never crossed my mind.
> 
> You sure about the flammability?  I believe it for the ignition due to
> temperature (Fahrenheit 451 and all that).  However, I have a gut
> feeling (but no data) that sparks are fairly likely to ignite cardboard,
> and less likely to ignite a solid plastic sheet (polyethylene or
> polypropylene, for instance).  Not that I'm expecting sparks, but that
> is a real possibility when a power supply fails.  Maybe even a brief
> flame.  Of course paper won't hold up well compared to plastic if it
> gets wet.  Moisture resistance is not important here though - if the
> insides of the computer are dripping, air shroud failure is the least of
> my worries.
> 
> Thanks,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech


From bill at cse.ucdavis.edu  Wed Aug 31 17:04:44 2011
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 31 Aug 2011 14:04:44 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>
Message-ID: <4E5EA1EC.7080804@cse.ucdavis.edu>

On 08/31/2011 12:15 PM, David Mathog wrote:
> That never crossed my mind.
> 
> You sure about the flammability?  I believe it for the ignition due to
> temperature (Fahrenheit 451 and all that).  However, I have a gut
> feeling (but no data) that sparks are fairly likely to ignite cardboard,
> and less likely to ignite a solid plastic sheet (polyethylene or
> polypropylene, for instance).  Not that I'm expecting sparks, but that
> is a real possibility when a power supply fails.  Maybe even a brief
> flame.  Of course paper won't hold up well compared to plastic if it
> gets wet.  Moisture resistance is not important here though - if the
> insides of the computer are dripping, air shroud failure is the least of
> my worries.  

I'm aware of a machine room fire that was attributed to cardboard dust
and the storage of flammable material (paper and cardboard).

I wouldn't recommend cardboard or anything else that might generate
flammable dust in a hot (50-90C), low-humidity airflow environment.

Supermicro does seem to play pretty fast and loose with shrouds and
cooling in general.  We had nodes bouncing off the thermal max (and
throttling) despite air intake temperatures 30F below the specified
limit and a very low power load in the node (read that as no expansion
cards, one low rpm disk, and the lowest clocked CPU).

We did however get them to ship us free shrouds once we complained.

Is it really worth spending even an hour to avoid buying the real shroud?
Not sure if this is the one, but they aren't particularly expensive ($13):

http://www.provantage.com/supermicro-mcp-310-18003-0n~7SUP91KW.htm
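
The thread mixes Fahrenheit (the 450 degree oven, "Fahrenheit 451", the
30F intake margin) and Celsius (the 50-90C airflow), so here is a rough
sketch that puts them on one scale.  The function and variable names are
made up for illustration, and 451F is only the popular figure from the
novel's title, not a measured ignition point.  It also says nothing about
sparks or dust, which is the actual concern above.

```python
def f_to_c(deg_f):
    """Convert an absolute Fahrenheit temperature to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def f_delta_to_c_delta(delta_f):
    """Convert a temperature *difference* from F to C (no 32 degree offset)."""
    return delta_f * 5.0 / 9.0


# Figures quoted in the thread (assumed to be Fahrenheit where no unit is given).
oven_c = f_to_c(450.0)            # the bacon oven
paper_c = f_to_c(451.0)           # the "Fahrenheit 451" figure for paper
airflow_max_c = 90.0              # top of the 50-90C airflow range

# Headroom between the hottest quoted airflow and the popular paper figure.
margin_c = paper_c - airflow_max_c

# The "30F below the specifications" intake margin is a difference, not a
# temperature, so it converts without the 32 degree offset.
intake_margin_c = f_delta_to_c_delta(30.0)

print(f"oven:          {oven_c:.1f} C")
print(f"paper (451F):  {paper_c:.1f} C")
print(f"margin vs 90C: {margin_c:.1f} C")
print(f"30F delta:     {intake_margin_c:.1f} C")
```

On these numbers, heat alone leaves well over 100C of headroom even at the
top of the airflow range, which fits the view that ignition by temperature
is not the worry here and that dust and sparks are.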


From rgb at phy.duke.edu  Wed Aug 31 17:05:34 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 31 Aug 2011 17:05:34 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>
References: <E1QyobG-0001Jq-1T@mendel.bio.caltech.edu>
	<ECE7A93BD093E1439C20020FBE87C47F01085116C6C2@ALTPHYEMBEVSP20.RES.AD.JPL>
Message-ID: <alpine.LFD.2.02.1108311700460.6118@lilith>

On Wed, 31 Aug 2011, Lux, Jim (337C) wrote:

Also thin aluminum.  You can get aluminum sheeting that you can cut with
scissors and that is easy to bend into shapes if you have a bending jig
(or can make one with two pieces of board stock and a vise).  Cheap,
fireproof, meltproof at any temperatures you're likely to reach, no
toxic fumes in a fire, can be glued or screwed.  The one drawback is
that it is a PITA to weld or solder if that's important to you, but for
an air shroud you can probably make compression joints (interlocking U
rims, squeezed down) that are adequate.

Most hardware stores carry it (as roof flashing), as do some auto parts
or hobby stores.  Copper too, but more expensive.  Don't know about thin
"enough" sheet steel, but probably -- copper or steel would both weld or
solder easily.

    rgb

> Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..)  It's no more flammable than plastic, and it doesn't melt and get soft. Papier-mache works too.
>
> On the other hand, if you want to mold a smooth curve, then plastic is the way to go. Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. )
>
> Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar.  Polystyrene and acrylic tend to be more brittle.  ABS is very nice to work with.  PVC is also easy to work with. Nylon is another possibility.
>
> Do you want to be able to glue it?
>
> What I would do is call up professionalplastics.com (formerly Cadillac Plastics, with many outlets nationwide) and see what they have.  It might be more useful to find a retail outlet and go look through their scrap bin.  Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go.  Plastic Depot in Burbank has a huge selection.
>
> Drive over there, and ask the counter folks what would work for you.  $10-20 will get you more plastic than you know what to do with.
>
> Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications.
>
>
> Jim Lux
> +1(818)354-2075
>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog
>> Sent: Wednesday, August 31, 2011 10:29 AM
>> To: beowulf at beowulf.org
>> Subject: [Beowulf] materials for air shroud?
>>
>> Anybody know of a nice cheap, high melting point, easy to work with
>> sheet material, for making a custom air shroud?
>>
>> We have one box with stuff in it that looks similar to HDPE, the
>> material the white flexible cutting boards are made of, but it is a bit
>> thinner and more rigid than that.  Unfortunately there are no markings
>> on it, so HDPE is just a guess.  Whatever it is, it cut easily with
>> scissors (I had to trim it slightly at one point.)
>>
>> Background.  We have an older Supermicro SC-823 server with dual
>> processors.  The air shroud it came with only covers the first
>> processor.  That didn't matter much when it had two low power processors
>> in it, but after upgrading it to dual Opteron 280s, the uncovered second
>> one runs considerably hotter than the covered front one.  (Swapping the
>> processors around didn't help - the heat stayed where it was, so a
>> ventilation issue, not a processor issue.)  Supermicro does make a newer
>> shroud which extends to the back of the case, but the manual (google for
>> "SC-823 air shroud user's guide") indicates that it is designed for
>> Intel CPUs.  So it may or may not fit around the Opterons.
>>
>> The redesigned air shroud will probably work, but I'm about 90%
>> confident that taping a sheet of plastic onto the back of the existing
>> shroud would work as well - if I can find a plastic that won't flap
>> around or melt.
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From mathog at caltech.edu  Wed Aug 31 17:24:48 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 14:24:48 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>

 Robert G. Brown wrote

> Also thin aluminum. 

No way, at least not anywhere near the motherboard.  There isn't going
to be a way to fasten it very tightly into position, just tape probably,
possibly a zip tie at the back end.  So it would be best if the shroud
cannot short things out or scratch components off the motherboard if it
falls out of position.

I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for
this, and it is similar to the shroud material we have in another server.

Regards,  

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From Glen.Beane at jax.org  Wed Aug 31 17:42:23 2011
From: Glen.Beane at jax.org (Glen Beane)
Date: Wed, 31 Aug 2011 21:42:23 +0000
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <4E5EA1EC.7080804@cse.ucdavis.edu>
References: <E1QyqFu-0001LU-KZ@mendel.bio.caltech.edu>,
	<4E5EA1EC.7080804@cse.ucdavis.edu>
Message-ID: <EB8AA368-5EAD-47CF-85A7-5F8B680063FC@jax.org>


On Aug 31, 2011, at 5:05 PM, "Bill Broadley" <bill at cse.ucdavis.edu> wrote:

> On 08/31/2011 12:15 PM, David Mathog wrote:
>> That never crossed my mind.
>> 
>> You sure about the flammability?  I believe it for the ignition due to
>> temperature (Fahrenheit 451 and all that).  However, I have a gut
>> feeling (but no data) that sparks are fairly likely to ignite cardboard,
>> and less likely to ignite a solid plastic sheet (polyethylene or
>> polypropylene, for instance).  Not that I'm expecting sparks, but that
>> is a real possibility when a power supply fails.  Maybe even a brief
>> flame.  Of course paper won't hold up well compared to plastic if it
>> gets wet.  Moisture resistance is not important here though - if the
>> insides of the computer are dripping, air shroud failure is the least of
>> my worries.  
> 
> I'm aware of a machine room fire that was attributed to cardboard dust
> and the storage of flammable material (paper and cardboard).
> 


I've seen servers shipped with paperboard shrouds directing air over the processors...

I won't mention the vendor by name


From rgb at phy.duke.edu  Wed Aug 31 17:44:45 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 31 Aug 2011 17:44:45 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
References: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
Message-ID: <alpine.LFD.2.02.1108311739250.6118@lilith>

On Wed, 31 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote
>
>> Also thin aluminum.
>
> No way, at least not anywhere near the motherboard.  There isn't going
> to be a way to fasten it very tightly into position, just tape probably,
> possibly a zip tie at the back end.  So it would be best if the shroud
> cannot short things out or scratch components off the motherboard if it
> falls out of position.

Don't forget the virtue of coat hangers.  Even rubber coated ones.

If you made the shroud out of aluminum, you could basically paint the
bottom with liquid electrical tape (or better, dip it four or five
times, drying it in between).  It would basically rubber-coat it.  No
shorting, no scratching, still moderately fireproof.  But as you wish.

> I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for
> this, and it is similar to the shroud material we have in another server.

The biggest problem with stuff like this (IIRC a discussion from long
ago) is you have to worry about what it turns into, and how toxic that
is, in a fire, at least if you want fire-persons to be able to enter the
room in a fire.
Many plastics burn into really toxic materials.  You also have to worry
about how it will cope with high heat.  The good thing about aluminum is
that by the time it melts you won't care.  I think some of the liquid
tape compounds are fire retardant/melt resistant, and the aluminum
itself is such a good conductor of heat that it will act as a heat sink
for the rubber coating (in a good way).

    rgb

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




From james.p.lux at jpl.nasa.gov  Wed Aug 31 17:56:08 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 14:56:08 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To: <alpine.LFD.2.02.1108311739250.6118@lilith>
References: <E1QysHA-0001Ni-Vt@mendel.bio.caltech.edu>
	<alpine.LFD.2.02.1108311739250.6118@lilith>
Message-ID: <ECE7A93BD093E1439C20020FBE87C47F01085116C754@ALTPHYEMBEVSP20.RES.AD.JPL>

Plastic tape covering the aluminum.. 20 mil "pipe wrap" is useful stuff.  3M VHB double stick foam tape to hold it in place.

But, enough of this feeble lash-up idea:  I think the real solution is to have a second cluster doing a complete finite element model of the instantaneous temperature distribution within the processor in question, driving a set of actuators to form a dynamically optimized shroud.  Or, perhaps the shroud could be made from millimachines implementing very simple control logic, but with an appropriate emergent behavior based on, say, their temperature sensing capability.  The millimachines should, of course, be self replicating.  Perhaps a suitably genetically engineered extremophile could be created?  

A second cluster does the model, a third cluster determines the optimum genetic sequence, a fourth cluster is responsible for iteratively doing the bioengineering to create the organisms, etc.  (or for a less biologically inspired system, the third and fourth clusters are doing some form of adaptive evolving micro manufacturing)

I'd provide more details, but really, that's just engineering, and is obvious to a skilled practitioner. 

(for those not at Caltech (which is my employer, as well as David's), you can contact their patent counsel for rights to the invention disclosed above, which I'm sure they'll be happy to license to you on reasonable and non-discriminatory terms.<grin>)

> -----Original Message-----
> From: Robert G. Brown [mailto:rgb at phy.duke.edu]
> If you made the shroud out of aluminum, you could basically paint the
> bottom with liquid electrical tape (or better, dip it four or five
> times, drying it in between).  It would basically rubber-coat it.  No
> shorting, no scratching, still moderately fireproof.  But as you wish.