From reuti at staff.uni-marburg.de Mon Aug 1 13:38:30 2011 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 1 Aug 2011 19:38:30 +0200 Subject: [Beowulf] Fwd: H8DMR-82 ECC error References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk> Message-ID: Hi all, on behalf of Jörg I forward this to the list, as his account seems to be blocked from posting to this list. -- Reuti > ############# > Dear all, > > as I cannot post directly to the list although I am subscribed to it, I have > asked a friend of mine to post this for me. > I am currently having severe problems with one of the clusters I am > maintaining. Around 50% of its nodes are crashing when we are running cp2k > on it. Although they are IB nodes, even without the IB card installed the test > jobs crash the node as well. So I can rule out an IB related problem. Memtest > was ok, I ran 9 cycles without any problems. Unfortunately I cannot swap the > memory as I don't have any spare memory at all and hence I have to rely on Memtest > here. The nodes which are causing the problems show other symptoms as well: > 3 of them failed to boot again after a normal shutdown procedure > (the fans come on, and die after a short period and I don't even get to the > POST stage at all). So they are offline as well. Two of the remaining nodes were > exceedingly hot after a reboot. When I took them out the fans were spinning > and now they appear to be ok. These are AMD Opteron 2220 dual core processors > with 2 CPUs per node. The motherboard is a H8DMR-82 with the BIOS version > 080014 (release date 07/13/2007). 
It appears that almost always the same nodes > are crashing with this error message: > > Hardware Error > CPU0 Machine Check Exception 4 Bank 2 b200200000000863 > TSC 108dd369444 > Processor 2:40f13 Time 1311847912 Socket 0 APIC 0 > MC2-Status: Uncorrected error, report: yes MisV: invalid > CPU context corrupt: yes UECC Error > Bus Unit Error: prefetch/ECC error in data read from NB: local node originated > (SRC) > Transaction type: prefetch (mem access), no timeout, cache level L3/generic. > Participating Processors: local node originated (SRC) > > Judging from this I would guess there is a memory related problem. > Given there are a number of people on the list here and they probably have > seen similar hardware before, do I simply have a bad batch of hardware which > is known to cause problems or do I have a different issue here? What I am after > is some kind of idea of where to look next. It is not the compiled program as > taking out the disc and placing it in a different node (same motherboard, same > Opteron but slightly different flags) does not cause any problems at all. > Given the large number of nodes which are causing problems, before I propose > to write off these nodes I would like to make sure it is not a subtle issue > which something like a BIOS upgrade could cure. > > Many thanks for your help and all the best from London > > Jörg > > ############## > > > > -- > ************************************************************* > Jörg Saßmannshausen > University College London > Department of Chemistry > Gordon Street > London > WC1H 0AJ > > email: j.sassmannshausen at ucl.ac.uk > web: http://sassy.formativ.net > > Please avoid sending me Word or PowerPoint attachments. 
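For anyone wanting to check the decoded text in the report against the raw status word (b200200000000863), the fields can be unpacked by hand. What follows is only a minimal Python sketch, not what mcelog itself does. It assumes the usual x86 MCi_STATUS layout for bits 63 to 57 and the standard bus-error code pattern in the low 16 bits. The positions of the CECC/UECC bits (46 and 45) are an assumption for AMD family 0Fh parts and should be checked against the BIOS and Kernel Developer's Guide. decode_status and the table names are made up for illustration.

```python
# Sketch only: decode_status() and the tables below are illustrative helpers,
# not part of mcelog or the kernel.

# (bit, short name) pairs; bits 63..57 are the architectural MCi_STATUS flags.
# Assumption: bits 46/45 are the CECC/UECC flags of AMD family 0Fh parts.
FLAG_BITS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # reporting of this error was enabled
    (59, "MISCV"),  # MISC register holds valid data
    (58, "ADDRV"),  # ADDR register holds a valid address
    (57, "PCC"),    # processor context may be corrupt
    (46, "CECC"),   # correctable ECC error (AMD-specific, assumed)
    (45, "UECC"),   # uncorrectable ECC error (AMD-specific, assumed)
]

# Field encodings of the bus-error code 0000 1PPT RRRR IILL.
PARTICIPATION = ["SRC", "RES", "OBS", "generic"]           # PP
REQUEST = ["generic", "read", "write", "data read", "data write",
           "instruction fetch", "prefetch", "evict", "snoop"]  # RRRR
MEM_OR_IO = ["mem", "reserved", "io", "generic"]           # II
CACHE_LEVEL = ["L0", "L1", "L2", "generic"]                # LL


def decode_status(status):
    """Return the flags that are set and, for bus errors, the decoded code."""
    code = status & 0xFFFF
    result = {
        "flags": [name for bit, name in FLAG_BITS if (status >> bit) & 1],
        "error_code": code,
    }
    # Bus/interconnect errors have bits 15..12 clear and bit 11 set.
    if (code & 0xF800) == 0x0800:
        request = (code >> 4) & 0xF
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": REQUEST[request] if request < len(REQUEST) else "reserved",
            "memory_or_io": MEM_OR_IO[(code >> 2) & 0x3],
            "cache_level": CACHE_LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_status(0xB200200000000863)
    print(decoded["flags"])
    print(decoded.get("bus_error"))
```

For the value in the report this gives the flags VAL, UC, EN, PCC and UECC, and a bus error with participation SRC, no timeout, request type prefetch, memory access and generic cache level. That agrees with the decoded lines in the log, so the raw word and the text tell the same story: an uncorrectable ECC error on a prefetch from memory.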
> See http://www.gnu.org/philosophy/no-word-attachments.html > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Wed Aug 3 00:28:10 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Wed, 03 Aug 2011 14:28:10 +1000 Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release In-Reply-To: <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com> References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2><4DA5E85D.4010801@ats.ucla.edu> <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Message-ID: <4E38CE5A.5080506@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/07/11 18:47, Hearns, John wrote: >> I think a lot of this will apply to non-SGE batch schedulers -- in >> fact Torque will support hwloc in a future release. > > That sounds good to me! > > (Hint - if anyone from Altair is listening in it would be useful...) There's already been Carl Smith from pbspro.com on the hwloc mailing list finding configure problems with AIX (which have been fixed)... 
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk44zloACgkQO2KABBYQAh8KUACfd5r45HcKBQdxRdRm3rb42fO1 VbgAoINM9lQ2rCIsa6G9Yv0b2qWii2aC =F/Jm -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Aug 8 21:45:38 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 09 Aug 2011 11:45:38 +1000 Subject: [Beowulf] IBM terminates Blue Waters contract Message-ID: <4E409142.8060900@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NCSA is now looking for a new hardware supplier.. http://www.ncsa.illinois.edu/BlueWaters/system.html # Effective August 6, 2011, IBM terminated its contract # with the University of Illinois to provide the supercomputer # for the National Center for Supercomputing Applications' # Blue Waters project. More info at El Reg: http://www.theregister.co.uk/2011/08/08/ibm_kills_blue_waters_super/ # To date, IBM had shipped three racks of the Blue Waters # supers to NCSA, and these will be returned. IBM has to # give back $30m to NCSA. 
- -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5AkUIACgkQO2KABBYQAh8icQCeL9PM2FW6ZAMLKz9Wg55oePGY /FcAoJQGuHMOTNZ0bNddHIAy40ZCe5oB =fID2 -----END PGP SIGNATURE----- From mdidomenico4 at gmail.com Tue Aug 9 08:46:13 2011 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Tue, 9 Aug 2011 08:46:13 -0400 Subject: [Beowulf] Memory Testing? Message-ID: The last discussion on the list about faulty memory centered on using software like memtest or HPL to trigger single-bit errors (SBE). I'm curious whether anyone has experience with uncorrectable ECC errors: specifically, not with detecting them, but with working out which DIMM in the chassis an error points to. mcelog on Linux doesn't seem to report the DIMM slot correctly on my Supermicro boards. The only way I know to narrow it down is to pull all the DIMMs and then test them one at a time in the system. 
I'm curious if there is a better way, or if anyone has any opinions on the below (or another similar) piece of hardware that might do the same http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From a.travis at abdn.ac.uk Tue Aug 9 08:54:14 2011 From: a.travis at abdn.ac.uk (Tony Travis) Date: Tue, 09 Aug 2011 13:54:14 +0100 Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID: <4E412DF6.1080204@abdn.ac.uk> On 09/08/11 13:46, Michael Di Domenico wrote: > [...] > I'm curious if there is a better way, or if anyone has any opinions on > the below (or another similar) piece of hardware that might do the > same > > http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm Hi, Michael. We had a RAM tester back in the day, but memory that it passed still gave errors in the real systems we were using. I screen memory in the system it is installed in using Memtest86+ then run Charles Cazabon's user-mode "Memtester" on the running system to assess its reliability: http://pyropus.ca/software/memtester/ HTH, Tony. -- Dr. 
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Thu Aug 11 08:04:58 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 11 Aug 2011 08:04:58 -0400 (EDT) Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E412DF6.1080204@abdn.ac.uk> References: <4E412DF6.1080204@abdn.ac.uk> Message-ID: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Most of you are probably not aware of this story about trade secrets and Bash scripts on HPC clusters (I was not until a few months ago) http://www.clustermonkey.net//content/view/308/33/ -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Aug 11 10:05:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 07:05:00 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Message-ID: Interesting.. 
You wrote: There is a general understanding that unless explicitly marked in the contents of the script (the text file that is the Bash program), a Bash script is freely available for use and modification by anyone. In some cases there is a copyright notice or a license that allows (or disallows) sharing or modification. These are always explicitly stated at the beginning of the script and obvious to anyone who reads or modifies the script. This is, of course, not correct under current law: marking is not required for copyright protection. Pretty much everything is born copyrighted. Putting markings on it helps you claim willful infringement (i.e. the recipient can't claim "I didn't know"), which helps on the damages situation. And, under the Berne convention, marking is required to assert your rights in some countries (All Rights Reserved is also required in some places). Likewise, under current law, registration of copyright isn't required. Registration allows you to collect statutory damages for infringement, though. For trade secrets, it's a bit trickier. The recipient has to know that it's a trade secret, but that can be done by marking on the delivery media, by a separate document, or even by verbal communication (here, this is proprietary, don't disclose it). And you have to take some means to protect it: claiming that something printed on bus stop benches is a trade secret won't fly. In any case, just because scripts aren't obfuscated doesn't mean they're not subject to trade secret protection, provided the owner of the secret takes some precautions to prevent wide disclosure (e.g. warning the recipient of its proprietary nature). This is the aspect that will surely be the core of litigation: would a "reasonable person" have known that the material was subject to trade secret protection? As we all know, reasonable people differ, and the attorneys on both sides will trot out examples of marking and disclosure practices: good, bad, and indifferent. 
As Doug noted, "special measures" need to be taken, but there's no bright line standard for those measures, and, in practice, they can be pretty lax (and would be expected to be proportionate to the value of the secret.. the secret formula for Coke is probably more protected than the schedule for sweeping the floor in the manufacturing plant... both provide competitive advantage to Coke, but one is probably more important) Something that a lot of tech people in industry (particularly those coming from academia and working with open source) probably don't really fully understand is that pretty much everything you do for your employer is probably proprietary in some sense, and there is probably a written policy to that effect, which you, as an employee, are expected to be aware of. Or your supervisor told you, or the nice personnel person told you when you hired in 20 years ago, etc. Mundane operational details of the business might be claimed to provide competitive advantage, especially if they're not "industry standard" (humorously, if the employer has some really lame practice that's horrible, that might make it protectable.. then you could argue in court about whether it had any value). This is why there are "document review" departments and periodic training: It helps reduce the problem of "inadvertent disclosure" and "I didn't know". This is the really tricky thing about trade secret: inadvertent disclosure can ruin the protection. There have been cases of deliberately (and nefariously) "losing" trade secret info to spoil the protection. And then, there is a somewhat notorious case of documents from Intel(?) that were in an envelope at a hotel desk or convention(?) with a person's name on it. Turns out there was a competitor (AMD?) 
with an employee of the same name, who accidentally got the documents handed to them (Hi, I'm John Smith, I think you have something for me.), opened the envelope, realized the problem, handed them right back, but in later action, it was alleged that this was sufficient to break the protection. I don't recall all the details, and it probably settled out of court. It's really complex.. "the bell, having been rung, cannot be unrung" (the phrase shows up in tons of legal writings), but in reality, if the inadvertent disclosure wasn't too big, etc. Important things: 1) The language it's written in or obfuscation or not makes no difference. 2) the size of the work makes no difference. "Candy/Is dandy/But liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair use, and anyway, the copyright may have expired) 3) the intellectual effort in the work makes no difference (unlike patents, there's no requirement of novelty) (unless you're trying to claim trade secret protection on something that's already public knowledge.. the thing might be public, but the fact that you selected that particular one might be trade secret.) Jim I am not a lawyer, but I spent all too many (hundreds) of hours in depositions and meetings and court where one of the main issues was the "was there adequate notice of the trade secret status of the information" as well as "did they steal it", not to mention the always popular "can you describe the secret with specificity and particularity". If the bad guy steals the trade secret and then keeps it secret, it's fairly hard to show that they actually have it. There are also folks who have developed techniques to evade the restrictions of an NDA ("Sure, I signed it, but that exceeded the scope of my corporate authority, so it's invalid. 
" "Technically, I wasn't an employee that afternoon, even though I was in the morning, and I was the next week, but hey, for that afternoon, I wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry about giving you that business card with the company name on it, but it was what I happened to have in my wallet") ________________________________________ From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of Douglas Eadline [deadline at eadline.org] Sent: Thursday, August 11, 2011 05:04 To: beowulf at beowulf.org Subject: [Beowulf] All Your BASH Are Belong To Us Most of you are probably not aware of this story about trade secrets and Bash scripts on HPC clusters (I was not until a few months ago) http://www.clustermonkey.net//content/view/308/33/ -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From cbergstrom at pathscale.com Thu Aug 11 10:35:01 2011 From: cbergstrom at pathscale.com ("C. Bergström") Date: Thu, 11 Aug 2011 21:35:01 +0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk> <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Message-ID: <4E43E895.6070803@pathscale.com> On 08/11/11 07:04 PM, Douglas Eadline wrote: > > Most of you are probably not aware of this story > about trade secrets and Bash scripts on HPC clusters > (I was not until a few months ago) > > http://www.clustermonkey.net//content/view/308/33/ IANAL and this shouldn't be taken as legal advice - Bret Stouder, if you haven't done so already, contact the SFLC immediately. They provide legal services to open source projects and may be able to help. (I can help put you in touch with them or other very good open source legal counsel.) ./C /* Armchair lawyers are generally not helpful and in many cases it's counterproductive for them to express their own personal views. I hope this discussion dies immediately without further comment */ 
From deadline at eadline.org Thu Aug 11 12:58:47 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 11 Aug 2011 12:58:47 -0400 (EDT) Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Message-ID: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> I had a chance to read some of the depositions, really interesting and even embarrassing stuff. My guess is Atipa got angry when Bret and the other employees left to form a new company. They may have searched for ways to stop them and decided to go after them for what Atipa considered "trade secrets." A more or less traditional method to prevent ex-employees from stealing your secret sauce (as you explain below). The only problem was much of the "secrets" were developed and shared in an open environment. This may have been a surprise to those in charge and makes their claims a bit harder to swallow. (i.e. a fundamental misunderstanding of how trade secrets can be protected in an open source ecosystem). And, what I try to point out in the article, is that this open source ecosystem is what allowed hardware vendors to sell clusters in the first place. There is of course more to this case than I describe in the article. I'll post more as it progresses. -- Doug > Interesting.. You wrote: > There is a general understanding that unless explicitly marked in the > contents of the script (the text file that is the Bash program), a Bash > script is freely available for use and modification by anyone. In some > cases there is a copyright notice or a license that allows (or disallows) > sharing or modification. These are always explicitly stated at the > beginning of the script and obvious to anyone who reads or modifies the > script. > > This is, of course, not correct under current law, marking is not required > for copyright protection. pretty much everything is born copyrighted. 
> Putting markings on it helps you claim for willful infringement (i.e. the > recipient can't claim "I didn't know") which helps on the damages > situation. And, under the Berne convention, marking is required to assert > your rights in some countries (All Rights Reserved is also required in > some places) Likewise, under current law, registration of copyright isn't > required. Registration allows you to collect statuatory damages for > infringement, though. > > For trade secrets, it's a bit trickier. The recipient has to know that > it's trade secret, but that can be done by marking on the delivery media, > by a separate document, or even by verbal communication (here, this is > proprietary, don't disclose it). And you have to take some means to > protect it: claiming something that is trade secret that is printed on bus > stop benches won't fly. In any case, just because scripts aren't > obfuscated doesn't mean they're not subject to trade secret protection. > If the owner of the secret takes some precautions to prevent wide > disclosure (e.g. warning the recipient of its proprietary nature). This > is the aspect that will surely be the core of litigation: would a > "reasonable person" have known that the material was subject to trade > secret protection. As we all know, reasonable people differ, and the > attorneys on both sides will trot out examples of marking and disclosure > practices: good, bad, and indifferent. As Doug noted, "special measures" > need to be taken, but there's no bright line standard for those measures, > and, in practice, they can be pretty lax (and would be expected to be > proportionate to the value of the secret.. the secret formula for Coke is > probably more protected than the schedule for sweeping the floor in the > manufacturing plant... 
both provide competitive advantage to Coke, but one > is probably more important) > > Something that a lot of tech people in industry (particularly those > coming from academia and working with open source) probably don't really > fully understand is that pretty much everything you do for your employer > is probably proprietary in some sense, and there is probably a written > policy to that effect, which you, as an employee, are expected to be aware > of. Or your supervisor told you, or the nice personnel person told you > when you hired in 20 years ago, etc. Mundane operational details of the > business might be claimed to provide competitive advantage, especially if > they're not "industry standard" (humorously, if the employer has some > really lame practice that's horrible, that might make it protectable.. > then you could argue in court about whether it had any value). This is why > there are "document review" departments and periodic training: It helps > reduce the problem of "inadvertent disclosure" and "I didn't know". > > > This is the really tricky thing about trade secret: inadvertent disclosure > can ruin the protection. There have been cases of deliberately (and > nefariously) "losing" trade secret info to spoil the protection. And > then, there is a somewhat notorious case of documents from Intel(?) that > were in an envelope at a hotel desk or convention(?) with a person's name > on it. Turns out there was a competitor (AMD?) with an employee of the > same name, who accidentally got the documents handed to them (Hi, I'm John > Smith, I think you have something for me.), opened the envelope, realized > the problem, handed them right back, but in later action, it was alleged > that this was sufficient to break the protection. I don't recall all the > details, and it probably settled out of court. It's really complex.. 
"the > bell, having been rung, cannot be unrung" (the phrase shows up in tons of > legal writings), but in reality, if the inadvertent disclosure wasn't too > big, etc. > > > Important things: > 1) The language it's written in or obfuscation or not makes no difference. > 2) the size of the work makes no difference. "Candy/Is dandy/But > liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair > use, and anyway, the copyright may have expired) > 3) the intellectual effort in the work makes no difference (unlike > patents, there's no requirement of novelty) (unless you're trying to claim > trade secret protection on something that's already public knowledge.. the > thing might be public, but the fact that you selected that particular one > might be trade secret.) > > > Jim > > I am not a lawyer, but I spent all too many (hundreds) of hours in > depositions and meetings and court where one of the main issues was the > "was there adequate notice of the trade secret status of the information" > as well as "did they steal it", not to mention the always popular "can you > describe the secret with specificity and particularity". If the bad guy > steals the trade secret and then keeps it secret, it's fairly hard to show > that they actually have it. There are also folks who have developed > techniques to evade the restrictions of an NDA ("Sure, I signed it, but > that exceeded the scope of my corporate authority, so it's invalid. " > "Technically, I wasn't an employee that afternoon, even though I was in > the morning, and I was the next week, but hey, for that afternoon, I > wasn't an employee, so I'm not bound by the NDA signed by corporate. 
Sorry > about giving you that business card with the company name on it, but it > was what I happened to have in my wallet") > > > > ________________________________________ > From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf > Of Douglas Eadline [deadline at eadline.org] > Sent: Thursday, August 11, 2011 05:04 > To: beowulf at beowulf.org > Subject: [Beowulf] All Your BASH Are Belong To Us > > Most of you are probably not aware of this story > about trade secrets and Bash scripts on HPC clusters > (I was not until a few months ago) > > http://www.clustermonkey.net//content/view/308/33/ > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Aug 11 13:40:35 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 10:40:35 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> Message-ID: > -----Original Message----- > From: Douglas Eadline [mailto:deadline at eadline.org] > Sent: Thursday, August 11, 2011 9:59 AM > To: Lux, Jim (337C) > Cc: beowulf at beowulf.org > Subject: RE: [Beowulf] All Your BASH Are Belong To Us > > > I had a chance to read some of the depositions, really interesting > and even embarrassing stuff. My guess is Atipa got angry when > Bret and the other employees left to form a new company. They > may have searched for ways to stop them and decided > to go after them for what Atipa considered "trade secrets." > A more or less traditional method to prevent ex-employees from > stealing your secret sauce (as you explain below). > > The only problem was much of the "secrets" were developed > and shared in an open environment. This may have been a > surprise to those in charge and makes their claims > a bit harder to swallow. (i.e. a fundamental misunderstanding > of how trade secrets can be protected in an open source ecosystem). > And, what I try to point out in the article, is that this > open source ecosystem is what allowed hardware vendors to > sell clusters in the first place. > > There is of course more to this case than I describe in the article. > I'll post more as it progresses. > > -- Yes.. 
and a standard way to attempt to do a "non-compete" (which are typically illegal in California) is for the former employer to threaten the new employer (or customers of the spin-off) with the "theft of trade secrets" allegation. Even if the allegation is unfounded, you have to spend time and money dealing with it (if you're the ex-employee) or it creates sufficient fear, uncertainty, and doubt (on the part of the customers of the ex-employee spin off). I'm also not so naïve as to think that employees don't actually take trade secrets with them and use them, so it's not entirely improbable. But, in a perfect world, there would be substantial sanctions for doing this kind of thing as a competitive maneuver. Legal niceties aside, Doug brings up an interesting point about "trade secrets" or intellectual property in general... You work at a job and become experienced and knowledgeable in a particular line of business. How much of that is "general knowledge" (not protectable) and how much is "peculiar to the employer" (protectable)? This is a pretty fuzzy thing. A for instance.. say you leaned over to the next cube and asked someone for help formulating a particularly complex command line to grep a file. The exact, character for character version of that command line probably belongs to the employer, but what about the knowledge you now have of how to do those kinds of searches? What if your coworker had actually done the command line (in its exact form) at some other place and brought it with them? Then, there's the practical details of getting approval from a (conservative) power-that-is. Sure, you might have gotten it from open source, but will your corporate reviewer agree? Or, will they use the default "it's all proprietary unless proven otherwise, and we don't have time to look at your proof, and you don't have time to be gathering the proof". 

It really depends on a corporate/organizational commitment to open source to institute processes to keep all this stuff straight. (and we won't even get into "open source" vs "able to redistribute")

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From landman at scalableinformatics.com Thu Aug 11 13:53:55 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 13:53:55 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: 
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
Message-ID: <4E441733.80701@scalableinformatics.com>

On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:

> It really depends on a corporate/organizational commitment to open
> source to institute processes to keep all this stuff straight. (and
> we won't even get into "open source" vs "able to redistribute")

There are profoundly incorrect views running around out there as to what "open source" means. I had someone tell me that GPLv2 prevented distribution of binaries (it doesn't). I've watched people slap additional legal bits in conflict with GPL onto GPL source.

I don't want to say "it's a mess" but I do want to say that "there is a profound need for a very simple statement of what is and isn't allowed by each license." Including what is involved in altering licensing.

While these are more or less amusing and some won't really result in court cases and precedents, there is at least one effort that has some nice potential to test GPL. See the ZFS on Linux effort; cf.
http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue I can't imagine this will end well for any company shipping this, in source, build script, or binary form. CDDL aside, Oracle's got some IP claims they could file, as well as other things. I can't believe that shipping NetBSD binaries with Oracle IP inside would end well either. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Thu Aug 11 14:19:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 11:19:00 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E441733.80701@scalableinformatics.com> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman > Sent: Thursday, August 11, 2011 10:54 AM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] All Your BASH Are Belong To Us > > On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote: > > > It's really depends on a corporate/organizational commitment to open > > source to institute processes to keep all this stuff straight. 
> > (and we won't even get into "open source" vs "able to redistribute")
>
> There are profoundly incorrect views running around out there, as to
> what "open source" means. I had someone tell me that GPLv2 prevented
> distribution of binaries (it doesn't). I've watched people slap
> additional legal bits in conflict with GPL onto GPL source.
>
> I don't want to say "it's a mess" but I do want to say that "there is a
> profound need for a very simple statement of what is and isn't allowed
> by each license." Including what is involved in altering licensing.
>

Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses. They had a "How do we encourage Open Source use at NASA" symposium a few months back, hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.

It's been a royal pain for me trying to release stuff to the general public in a useful form. It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run. But no, that .iso will be a derived work composed of a multitude of components with all sorts of different license agreements. What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same ones we used, and have at it.

The complication is that, in general, work funded by NASA and performed by government employees is a "government work not subject to copyright", whereas work funded by NASA and performed by an educational institution (e.g. JPL, which is part of Caltech) is subject to Bayh-Dole, and is presumed to be owned by the educational institution, with a fully paid, non-exclusive license granted to the government for government purposes.
(there is, of course, litigation about what those "government purposes" might happen to be). The incompatibility arises because NASA is legally obligated to distribute their products with no downstream restrictions on use, which is not the same as, for instance, GPL, which imposes restrictions on downstream use. NASA (and the government in general) doesn't care if someone takes their product and uses it to make a subsequent closed source product which is totally proprietary. (and in fact, NASTRAN would be a fine example of this) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From gus at ldeo.columbia.edu Thu Aug 11 15:28:28 2011 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 11 Aug 2011 15:28:28 -0400 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> Message-ID: <4E442D5C.1030902@ldeo.columbia.edu> Lux, Jim (337C) wrote: >> -----Original Message----- >> From: Douglas Eadline [mailto:deadline at eadline.org] >> Sent: Thursday, August 11, 2011 9:59 AM >> To: Lux, Jim (337C) >> Cc: beowulf at beowulf.org >> Subject: RE: [Beowulf] All Your BASH Are Belong To Us >> >> >> I had a chance to read some of the depositions, really interesting >> and even embarrassing stuff. My guess is Atipa got angry when >> Bret and the other employees left to form a new company. They >> may have searched for ways to stop them and decided >> to go after them for what Atipa considered "trade secrets." 
>> A more or less traditional method to prevent ex-employees from >> stealing your secret sauce (as you explain below). >> >> The only problem was much of the "secrets" were developed >> and shared in an open environment. This may have been a >> surprise to those in charge and makes their claims >> a bit harder to swallow. (i.e. a fundamental misunderstanding >> of how trade secrets can be protected in an open source ecosystem). >> And, what I try to point out in the article, is that this >> open source ecosystem is what allowed hardware vendors to >> sell clusters in the first place. >> >> There is of course more to this case than I describe in the article. >> I'll post more as it progresses. >> >> -- > > > Yes.. and a standard way to attempt to do a "non-compete" > (which are typically illegal in California) is for the former employer > to threaten the new employer (or customers of the spin-off) with the > "theft of trade secrets" allegation. Even if the allegation is unfounded, > you have to spend time and money dealing with it > (if you're the ex-employee) or it creates sufficient fear, > uncertainty, and doubt (on the part of the customers > of the ex-employee spin off). > Very true, and in the arena of intimidating former employees and their current employers/competitors, there is nothing special about the privatization of shell scripts or of nifty regular expressions to grep files. Recent examples include fields perhaps more lucrative than HPC, such as English muffins (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella): http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm and high frequency trading (isn't it HPC also?) (Goldman Sachs vs. Sergey Aleynikov): http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html What is interesting is that across the board the thing that free entrepreneurs seem to hate the most is their competitors free entrepreneurship. 
> I'm also not so na?ve as to think that employees don't actually take > trade secrets with them and use them, so it's not entirely improbable. > > But, in a perfect world, there would be substantial sanctions for doing this kind of thing as a competitive maneuver. > > > Legal niceties aside, Doug brings up an interesting point about "trade secrets" or intellectual property in general... > > You work at a job and become experienced and knowledgeable in a > particular line of business. How much of that is "general knowledge" > (not protectable) and how much is "peculiar to the employer" (protectable)? This is a pretty fuzzy thing. > > A for instance.. say you leaned over to the next cube and asked > someone for help formulating a particularly complex command line to > grep a file. The exact, character for character version of that > command line probably belongs to the employer, but what about the > knowledge you now have of how to do those kinds of searches? > What if your coworker had actually done the command line > (in its exact form) at some other place and brought it with them? > > Then, there's the practical details of getting approval from a > (conservative) power-that-is. Sure, you might have gotten it from > open source, but will your corporate reviewer agree? Or, will they > use the default "it's all proprietary unless proven otherwise, and > we don't have time to look at your proof, and you don't have time > to be gathering the proof". > > It's really depends on a corporate/organizational commitment to > open source to institute processes to keep all this stuff straight. 
> (and we won't even get into "open source" vs "able to redistribute") > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Thu Aug 11 15:57:43 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 11 Aug 2011 15:57:43 -0400 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E442D5C.1030902@ldeo.columbia.edu> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E442D5C.1030902@ldeo.columbia.edu> Message-ID: <4E443437.9070705@scalableinformatics.com> On 08/11/2011 03:28 PM, Gus Correa wrote: > Recent examples include fields perhaps more lucrative than HPC, > such as English muffins > (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella): > > http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm That muffin just got real ... > > and high frequency trading (isn't it HPC also?) (Goldman Sachs vs. > Sergey Aleynikov): > > http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html > > What is interesting is that across the board > the thing that free entrepreneurs seem > to hate the most is their competitors free entrepreneurship. I am running into an internal parser error in attempting to understand this last sentence. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. 
email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Thu Aug 11 19:56:02 2011 From: mathog at caltech.edu (David Mathog) Date: Thu, 11 Aug 2011 16:56:02 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID: Since this is very OT, I'll try to keep it short. Here is the problem - imagine a group of people who neither know nor trust each other, yet must agree on the fairness of a single random number. Basically they are going to have a lottery. They aren't organized enough to generate such a number themselves - it must be found from some process already active on the web, and be so obviously "fair" that they won't argue about that. Everybody must be able to obtain it freely from a web connection. Can any of you think of a source on the web for a set of small files with these properties: 1. from a trusted source (here this mostly means the data is generated for some other innocuous purpose) 2. represents a largely random process (temperature readings, stock market values, etc.) with a set generated at known intervals, preferably daily (at least M-F) 3. are never, ever, revised 4. are distributed reliably (for instance, signed files) 5. are publicly and freely available 6. can be obtained reliably (is available from many sites) So far I have looked at stock market values and weather data - without much luck. You would think the S&P 500 is the S&P 500 and one could look it up on any site and get the same data. Not so! 
Check the Yahoo and Google financial sites for the first few weeks of Jan. 2011 and you will find digits that differ between the two sites in every single column. Not every day mind you, but often enough that it isn't reliable. Heck, the volume numbers differ by large factors between the two sites. So just choose one site and go with that? Not so fast - if the single source goes down the data is unavailable, and there is no guarantee that the site (which is not party to this particular use of their data) might not revise the page or choose to block it entirely. Or weather data, right? Lots of random bits there and we trust NOAA. But good luck with criteria 3-6. In particular, they don't give data out for free. In theory no US Government site should, since they are supposed to charge to recover distribution costs. Criteria 4-6 are typical of software distributed on mirror sites, but so far I have not found any physical measurements which are distributed in a similar manner. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mdidomenico4 at gmail.com Thu Aug 11 20:28:11 2011 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Thu, 11 Aug 2011 20:28:11 -0400 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: How many random numbers per day are you expecting? If everyone checks at exactly 1pm, should they all see the same "random" number or should they each get their own "random" number? What kind of entropy are you expecting on "random"? 
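
On the question of whether everyone should see the same number: one possible approach, sketched here under assumptions that are not part of the thread, is to agree in advance on which published records will be used and then hash them, so that anyone holding the same bytes computes the same result. The function name `draw_from_public_data`, the record format, and the choice of SHA-256 are all hypothetical. The sketch does nothing about the harder problem of finding a source that is never revised.

```python
import hashlib
from typing import Sequence


def draw_from_public_data(records: Sequence[bytes], num_tickets: int) -> int:
    """Map agreed-upon public records to a ticket number in [0, num_tickets)."""
    if num_tickets <= 0:
        raise ValueError("num_tickets must be positive")
    digest = hashlib.sha256()
    for record in records:
        # Length-prefix each record so that [b"ab", b"c"] and [b"a", b"bc"]
        # are hashed as different inputs.
        digest.update(len(record).to_bytes(8, "big"))
        digest.update(record)
    # Reducing a 256-bit value modulo a small ticket count leaves negligible bias.
    return int.from_bytes(digest.digest(), "big") % num_tickets


if __name__ == "__main__":
    # Hypothetical records; in practice these would be the exact bytes of the
    # files the group agreed on before the data was published.
    sample = [b"2011-08-11 index close 1172.64", b"2011-08-11 station X 71.3"]
    print(draw_from_public_data(sample, 500))
```

Because the rule is fixed before the data exists and the computation is deterministic, any participant can rerun it and check the outcome by hand.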

From peter.st.john at gmail.com Thu Aug 11 20:44:15 2011
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 11 Aug 2011 20:44:15 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: 
References: 
Message-ID: 

David,

I was thinking the National Weather Service, instead of NOAA; it's a vital public service that such information is recorded and disseminated for airfields and the like, e.g.:

http://www.weather.gov/climate/getclimate.php?wfo=bou

So I would write a script to scrape least significant digits from that, for agreed times, dates, and locations. Whoever writes the script and wherever it is run, anyone can check its results manually.
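
A minimal sketch of the digit-scraping step, assuming the readings have already been copied off the page as decimal strings (the page layout is not assumed, and no fetching is attempted). The names `last_digit`, `combine_digits`, and `tenths_digit_histogram` are hypothetical. The last function illustrates one thing worth checking before trusting such digits: if a value was measured in whole degrees Fahrenheit and then reported in tenths of a degree Celsius, the final digit is never 5.

```python
from collections import Counter


def last_digit(reading: str) -> int:
    """Return the least significant digit of a decimal reading such as '71.3'."""
    digits = [ch for ch in reading if ch.isdigit()]
    if not digits:
        raise ValueError(f"no digits in reading: {reading!r}")
    return int(digits[-1])


def combine_digits(readings):
    """Join the last digits of the readings, in the agreed order, into one integer."""
    return int("".join(str(last_digit(r)) for r in readings))


def tenths_digit_histogram(fahrenheit_values):
    """Count the tenths digits produced by converting whole-degree F values to C."""
    counts = Counter()
    for f in fahrenheit_values:
        celsius = round((f - 32) * 5 / 9, 1)
        counts[last_digit(f"{celsius:.1f}")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical readings for three agreed cities.
    print(combine_digits(["71.3", "68", "54.9"]))  # 389
    # Digit 5 is absent from the histogram for whole-degree inputs.
    print(sorted(tenths_digit_histogram(range(0, 90)).items()))
```

A skew like this does not rule the source out, but it suggests passing the scraped digits through a hash or combining several independent readings instead of using them directly.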

However, that item has a disclaimer that the data is subject to review :) So it may matter how far back in time you need to be able to go, and how long into the future you need the data to be available at the same place. But nobody promises their website will stay unchanged indefinitely; they can't. But at any given time, a group can agree on (say) the least significant digits of the temperatures at time T in cities X, Y, and Z as reported at time T2 by the NWS.

Peter

From james.p.lux at jpl.nasa.gov Thu Aug 11 20:55:30 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 17:55:30 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: 
References: 
Message-ID: 

Low order digits from weather stations are not likely to be random.
They're almost certainly converted from some quantized converter, and may actually have a double conversion (between Celsius and Fahrenheit).

NWS and NOAA are actually part of the same organization, aren't they? (The NWS web page at weather.gov is titled "NOAA's National Weather Service".)

Jim Lux
+1(818)354-2075

From samuel at unimelb.edu.au Thu Aug 11 21:54:11 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 11:54:11 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <4E412DF6.1080204@abdn.ac.uk> <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E4487C3.60605@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/08/11 22:04, Douglas Eadline wrote:

> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters

On the copyright side of things (not the trade secret stuff), my understanding (IANAL, etc) is that anything you create you[0] hold copyright on[1], and for someone else to copy it they must have some agreement (license) to be able to do so.

Thus a shell script with no license attached or embedded is copyrighted and you should get explicit permission to use it..

cheers,
Chris

[0] - where "you" is the entity that is the copyright holder, not necessarily the creator.

[1] - yes, I know there are some entities that aren't allowed to hold copyright..
:-) - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5Eh8MACgkQO2KABBYQAh8sOgCePl6n4UTNZGMAePc8Kb+kmK4a DHwAoJeVgYKUMDpJe78/2mQqbL2ryJ4M =UAan -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Thu Aug 11 23:22:34 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Fri, 12 Aug 2011 10:22:34 +0700 Subject: [Beowulf] Open source @NASA - WAS: OT In-Reply-To: References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> Message-ID: <4E449C7A.1030102@pathscale.com> On 08/12/11 01:19 AM, Lux, Jim (337C) wrote: > > Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses. They had a "How do we encourage Open Source use at NASA" symposium a few months back hosted at Ames with lots of remote participants and licensing issues and complexities is, in my opinion, probably one of the bigger problems. It's been a royal pain for me trying to release stuff to the general public in a useful form. It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run. 
> But no, that .iso will be a derived work comprised of a multitude of components with all sorts of different license agreements. What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same one we used, and have at it.

Hi Jim,

For the exact problem you've described, an ebuild could be a very good solution. (I personally abandoned Gentoo a long time ago.) By solution I mean a bash script that explicitly checks the hashes, resolves the deps, and pulls the source to build everything from the eleventy-seven URLs and FTP sites. The people working with gentoo-science would likely appreciate it a lot. (The learning curve is fairly low if you know bash already.)

--------

With regard to open source license proliferation and incompatibilities: I think most people in the community are working towards streamlining, but changes after the fact can be difficult or impossible. I'm empathetic to your situation, and I'd say work towards getting your projects merged with something like Gentoo to start and then maybe something like OpenSuSE build service. This would cover a very large percentage of the packaging/distribution problem and get it into the hands of users easily.
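The hash-checking step described above could be sketched roughly as follows. This is Python rather than bash, it only covers verification (not fetching or dependency resolution), and the manifest format and the names `md5_of` and `verify_manifest` are made up for illustration; a real ebuild relies on Gentoo's own Manifest handling instead.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, directory):
    """Return the names of files that are missing or whose MD5 differs.

    `manifest` maps a file name to its expected hex MD5
    (a hypothetical format, chosen only for this sketch).
    """
    bad = []
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != expected.lower():
            bad.append(name)
    return bad
```

An empty list from `verify_manifest` means every listed file was present and matched; anything else names the files to re-download before building.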
From samuel at unimelb.edu.au Thu Aug 11 23:51:12 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 12 Aug 2011 13:51:12 +1000
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To:
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com>
Message-ID: <4E44A330.7090503@unimelb.edu.au>

On 12/08/11 04:19, Lux, Jim (337C) wrote:

> The incompatibility arises because NASA is legally
> obligated to distribute their products with no
> downstream restrictions on use,

Actually no - the NASA license is incompatible with the GPL (at least) because:

http://www.gnu.org/licenses/license-list.html

# The NASA Open Source Agreement, version 1.3, is not
# a free software license because it includes a provision
# requiring changes to be your "original creation". Free
# software development depends on combining code from
# third parties, and the NASA license doesn't permit this.

cheers,
Chris
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
From james.p.lux at jpl.nasa.gov Thu Aug 11 23:57:03 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Thu, 11 Aug 2011 20:57:03 -0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E44A330.7090503@unimelb.edu.au>
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> , <4E44A330.7090503@unimelb.edu.au>
Message-ID:

Yes, that too...

From rgb at phy.duke.edu Fri Aug 12 00:31:30 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 00:31:30 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On Thu, 11 Aug 2011, David Mathog wrote:

> Since this is very OT, I'll try to keep it short.
>
> Here is the problem - imagine a group of people who neither know nor
> trust each other, yet must agree on the fairness of a single random
> number. Basically they are going to have a lottery. They aren't
> organized enough to generate such a number themselves - it must be found
> from some process already active on the web, and be so obviously "fair"
> that they won't argue about that. Everybody must be able to obtain it
> freely from a web connection.

http://www.random.org/

sincerely,

rgb

>
> Can any of you think of a source on the web for a set of small files
> with these properties:
>
> 1. from a trusted source (here this mostly means the data is generated
> for some other innocuous purpose)
> 2. represents a largely random process (temperature readings,
> stock market values, etc.) with a set generated at known intervals,
> preferably daily (at least M-F)
> 3. are never, ever, revised
> 4. are distributed reliably (for instance, signed files)
> 5. are publicly and freely available
> 6. can be obtained reliably (is available from many sites)
>
> So far I have looked at stock market values and weather data - without
> much luck.
>
> You would think the S&P 500 is the S&P 500 and one could look it up on
> any site and get the same data. Not so! Check the Yahoo and Google
> financial sites for the first few weeks of Jan. 2011 and you will find
> digits that differ between the two sites in every single column. Not
> every day mind you, but often enough that it isn't reliable. Heck, the
> volume numbers differ by large factors between the two sites. So just
> choose one site and go with that? Not so fast - if the single source
> goes down the data is unavailable, and there is no guarantee that the
> site (which is not party to this particular use of their data) might not
> revise the page or choose to block it entirely.
>
> Or weather data, right? Lots of random bits there and we trust NOAA.
> But good luck with criteria 3-6. In particular, they don't give data
> out for free. In theory no US Government site should, since they are
> supposed to charge to recover distribution costs.
>
> Criteria 4-6 are typical of software distributed on mirror sites, but so
> far I have not found any physical measurements which are distributed in
> a similar manner.
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown    http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu

From mathog at caltech.edu Fri Aug 12 11:21:37 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:21:37 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID:

Robert G. Brown wrote:

> Everybody must be able to obtain it
> > freely from a web connection.
>
> http://www.random.org/
>

Nice site. They have something that is very close, the pregenerated random files, from which a small set of digits may be extracted, and the files themselves have MD5 checksums (but are not signed). They also support https. It comes up a little short on criteria 1 (we really don't know what is going on behind the scenes) and 6 (it is a single site).
Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From landman at scalableinformatics.com Fri Aug 12 11:26:05 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 12 Aug 2011 11:26:05 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID: <4E45460D.9040505@scalableinformatics.com>

On 08/12/2011 11:21 AM, David Mathog wrote:
> Robert G. Brown wrote:
>
>> Everybody must be able to obtain it
>>> freely from a web connection.
>>
>> http://www.random.org/

And from SGI days ... http://www.lavarnd.org/

> Nice site. They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https. It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
From mathog at caltech.edu Fri Aug 12 11:58:28 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 08:58:28 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID:

Peter St. John wrote:

> But at any given time, a group can agree on (say) the lowest significant
> digits of the temperatures at time T in cities X, Y, and Z as reported at
> time T2 by the NWS.

Actually we don't know that, at least not reliably enough for this purpose. It may be that the one web address is actually multiple servers, and if the NWS pushes out data revisions these could return different results for T:X,Y,Z at T2 if the servers were not strictly synchronized. Never mind the caching problems that revisions like this would create on browsers. I have no idea if the NWS revises their data files, but it would not be surprising if they did.

After posting I thought of one other source of more or less random verifiable numbers - the scores of sporting events. These are not always generated every day, and are seasonal for the various sports. They are however highly verifiable and when multiple events are grouped, pretty much impossible to "fix" to preselected digits. For instance:

http://www.nfl.com/scores
http://mlb.mlb.com/mlb/scoreboard
http://scores.espn.go.com/nba/scoreboard?date=20110304

These sites maintain historical records. Even if they didn't the scores are widely published, and there are tens of thousands of witnesses to the original event, so it would be pretty much impossible to intentionally change a final score. There could still be copying/typo errors from site to site though, but if such an error was discovered it would be easy enough to resolve. There is no intrinsic order to the scores, and some scheduled games might be canceled, so it would have to be something like "sort the scores from all NBA teams who played on 4/4/11 into ascending order and concatenate the digits".
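The "sort and concatenate" rule could be written down unambiguously along these lines. This is only a sketch in Python: the function name `seed_from_scores` and the sample scores in the usage note are invented for illustration, and it says nothing about how flat the resulting values are.

```python
def seed_from_scores(games):
    """Build an integer seed from final scores.

    `games` is an iterable of (score_a, score_b) pairs, one per game.
    All individual scores are sorted into ascending order and their
    decimal digits are concatenated, so the order in which the games
    are listed does not matter.
    """
    scores = sorted(score for game in games for score in game)
    if not scores:
        raise ValueError("at least one game is required")
    return int("".join(str(score) for score in scores))
```

With two made-up games, `seed_from_scores([(88, 97), (110, 102)])` sorts the scores to 88, 97, 102, 110 and returns 8897102110; listing the games in a different order gives the same value.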
Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From james.p.lux at jpl.nasa.gov Fri Aug 12 12:09:46 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Fri, 12 Aug 2011 09:09:46 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

All nice suggestions, but I wonder if they're truly random.

Scores of games have underlying patterns from the "rules of the game" (e.g. American football games tend to have scores that are tied to the 6, 7, 8 or 3 points; basketball goals are 2 or 3 points, etc.)

I'm sure someone has analyzed this.

I suppose one could sum a large number of scores, which would give you something with a Gaussian distribution, and then you could transform it into something with a uniform distribution (sort of an inverse Box-Muller).

What about using random.org and it being backed up on archive.org? Does that give you the "multiple independent sites" desired?

From mathog at caltech.edu Fri Aug 12 13:04:43 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:04:43 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID:

> All nice suggestions, but I wonder if they're truly random.

Random enough in this case - as they are only used to form a seed for a random number generator, and a seed is only needed "rarely". So even though pro basketball scores have definite trends and often look like (101,95),(103,87),(98,76), these can still create a decent seed value once sorted and concatenated:

10310198958776

(Let's assume the seed need not be odd.)

> What about using random.org and it being backed-up on archive.org? Does that give you the "multiple independent sites" desired?

To some degree, but not as much as the large number of sites that distribute game scores and stock values. I originally favored using stock values until it turned out that those numbers are squishier than one might have expected, particularly so for indices like the S&P 500 and Dow Jones. A fellow who works at S&P told me that the opening prices are prone to timing problems, since at T=0+delta some of the issues in the index will have traded, and some will not, with the untraded stock values being filled in with stale values.
I think similar timing issues affect all the other index values too (high/low/close). In these cases, since the index is derived from formulas, some sites may be independently calculating the values, and tiny differences in the times the stock values are measured result in different numbers. All it takes is one trade difference between the sample points to change some digits.

When I get some time I still need to look and see if the high/low/close values for individual stocks are also variable from web site to web site. These numbers might be more reliable for single stocks since they might all trace back to the data feed from the exchange where the issue trades.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From mathog at caltech.edu Fri Aug 12 13:22:46 2011
From: mathog at caltech.edu (David Mathog)
Date: Fri, 12 Aug 2011 10:22:46 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID:

Michael Di Domenico wrote:

> How many random numbers per day are you expecting?

One would be sufficient.

> If everyone checks at exactly 1pm, should they all see the same
> "random" number or should they each get their own "random" number?

They should all see the same number.

Example: a random number based on physical events which occurred on 8/10/11 would become available on or shortly after that day. Starting from the time it first becomes available, and going forward ideally forever, everybody who wants to should be able to retrieve that same random number. That is, nobody should be able to predict the number beforehand, and everybody should be able to verify it later.
So the number must be both random and etched in stone.

> What kind of entropy are you expecting on "random"?

In practice relatively little is needed; 16 bits should be plenty. (More wouldn't hurt, of course.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From rgb at phy.duke.edu Fri Aug 12 14:35:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:35:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On Fri, 12 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote:
>
>> Everybody must be able to obtain it
>>> freely from a web connection.
>>
>> http://www.random.org/
>>
>
> Nice site. They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https. It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)

Behind the scenes is documented pretty well on the site, and the guy who runs it is a human being, you can communicate with him to learn even more. I already know him a bit, as he and I have collaborated on applying dieharder to test random.org datasets -- even "the" random.org dataset as of some time ago (I have a few hundred MB of random numbers from the site in my dieharder directory). IIRC, the numbers are generated continuously and fairly slowly by grabbing and filtering and transforming atmospheric noise.
As a source of entropy, that is probably excellent if (as noted) slow, but many good sources of entropy seem to be fairly slow. He has good reason to think that his numbers are theoretically "true random numbers" -- both unpredictable and flat/decorrelated at all orders, and even though there aren't really enough of them for my purposes, I've used them as one of the (small) "gold standard" sources for testing dieharder even as I test them. For all practical purposes threefish or aes are truly random as well and they are a lot faster and easier to use as gold standard generators, though.

I don't quite understand why the single site restriction is important -- this site has been up for years and I don't expect it to go away soon; it is quite reliable. I don't think there is anything secret about how the numbers are generated, and I'll certify that the numbers it produces don't make dieharder unhappy.

So 1 is fixable with a bit of effort on your part; 6 I don't really understand but the guy who runs the site is clearly willing to construct a custom feed for cash customers, if there is enough value in whatever it is you are trying to do to pay for access.

If it's just a lottery, well, lord, I can think of a dozen ways to make numbers so random that they'd be unimpeachable for any sort of lottery, both unpredictable and uncorrelated, and they don't any of them require any significant amount of entropy to get started.

I will add one warning -- "randomness" is a rather stringent mathematical criterion, and is generally tested against the null hypothesis. Amateurs who want to make random number generators out of supposedly "random" data streams or fancy algorithms almost invariably fail, sometimes spectacularly so. There are a half dozen or more really, really good pseudorandom number generators out there and it is easy to hotwire them together into an xor-based high entropy stream that basically never repeats (feeding it a bit of real entropy now and then as it operates).
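The xor-combining idea mentioned above can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: the SHA-256 counter stream and `os.urandom` merely stand in for "a good generator" and "a bit of real entropy", the function names are invented, and nothing about this construction has been vetted or run through dieharder.

```python
import hashlib
import os


def hash_stream(seed, nbytes):
    """Deterministic byte stream: SHA-256 over seed || counter (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("streams must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def mixed_stream(seed, nbytes):
    """XOR a seeded hash stream with bytes from the OS entropy pool."""
    return xor_bytes(hash_stream(seed, nbytes), os.urandom(nbytes))
```

The point of the xor is that the combined output is no worse than the better of the two inputs, provided the inputs are independent of each other; note that `mixed_stream` is not reproducible, so it would not by itself give a number that everybody can verify later.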
I would strongly counsel you against trying to take e.g. weather data and make something "random" out of it. Unless you really know what you are doing, you will probably make something that isn't at all random and may not even be unpredictable. Even most sources of "quantum" randomness (which is at least possibly "truly random", although I doubt it) aren't flat, so that they carry the signature of their generation process unless/until you manage to transform them into something flat (difficult unless you KNOW the distribution they are producing). Pseudorandom number generators have the serious advantage of being amenable to at least some theoretical analysis (so you can "guarantee" flatness out to some high dimensionality, say) as well as empirical testing with e.g. dieharder.

HTH,

rgb

Robert G. Brown    http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 12 14:40:36 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:40:36 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <4E45460D.9040505@scalableinformatics.com>
References: <4E45460D.9040505@scalableinformatics.com>
Message-ID:

On Fri, 12 Aug 2011, Joe Landman wrote:

> On 08/12/2011 11:21 AM, David Mathog wrote:
>> Robert G. Brown wrote:
>>
>>> Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>> http://www.random.org/
>
> And from SGI days ...
> http://www.lavarnd.org/

Yeah, like that. Notice the work they have to do to make a not-really-random or only partially-random source flat, unpredictable, random. What they do is probably overkill -- nobody on earth could detect a deviation from randomness if they did only half of their folding and retransformation with crypto grade prngs, but it is still a pretty reliable scheme.

rgb

Robert G. Brown    http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 12 14:59:59 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 12 Aug 2011 14:59:59 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On Fri, 12 Aug 2011, Lux, Jim (337C) wrote:

> All nice suggestions, but I wonder if they're truly random.
>
> Scores of games have underlying patterns from the "rules of the game" (e.g. american football games tend to have scores that are tied to the 6,7,8 or 3 points. basketball goals are 2 or 3 points, etc.)
>
> I'm sure someone has analyzed this.
>
> I suppose one could sum a large number of scores, which would give you something with Gaussian distribution, and then you could transform it into something with uniform distribution (sort of a inverse Box-Muller).
>
> What about using random.org and it being backed-up on archive.org? Does that give you the "multiple independent sites" desired?

As I said and repeat, nothing like this is at all random.
Random is stuff like thermal noise, shot noise, quantum noise, and even all of those things are distributed and not flat and require massaging to make into uniform deviates or random bits. Unpredictable is easy, of course -- flip a coin, roll some dice -- until you need to make it >>rigorously<< unpredictable and >>rigorously<< uncorrelated, at which point you need to not screw around with weather, scores, or market closing values; even "randomly sampled" ticks of a nanosecond clock aren't that random without some work to make them so.

I liked the lavarnd site, and I like random.org. Hell, tap into both of their streams, they're both practically perfect as sources of random numbers go, and it gives you your redundancy and you can xor their streams together to get yet another irrelevant and probably unnecessary degree of lack of correlation. Even if one stream is subtly correlated and the other is too, the odds against the correlations "matching" and persisting through an xor process are astronomical. But then, finding correlations in the output of a properly seeded crypto prng is pretty astronomically unlikely BEFORE you xor-fold it stream-wise a few dozen times into a source of real entropy like atmospheric noise or electro-optical noise.

If you want something better, you'll probably have to explain your application in a bit more detail. Do you need rigorously random and flat numbers, or just something unpredictable? The latter is cheap and easy and can be done in the privacy of your own home by reading from /dev/random or /dev/urandom (or perhaps from Intel's new on-CPU rngs). The former requires theory and some work and some heavy duty empirical testing.

Just remember, numbers are not random. Numbers are numbers. The number 7 could be "random" or not, not by its nature but by how the 7 was generated. Processes, in other words, are (approximately, oxymoronically) random.
If you want random numbers, find a (mathematically provably) "random" process, at least to some order and for some purposes...

rgb

Robert G. Brown    http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu

From nixon at nsc.liu.se Fri Aug 12 16:46:21 2011
From: nixon at nsc.liu.se (Leif Nixon)
Date: Fri, 12 Aug 2011 22:46:21 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On 12 August 2011 17:58, David Mathog wrote:

> After posting I thought of one other source of more or less random
> verifiable numbers - the scores of sporting events. These are not
> always generated every day, and are seasonal for the various sports.
> They are however highly verifiable and when multiple events are grouped,
> pretty much impossible to "fix" to preselected digits.

Have you looked at RFC3797?
Not sure if it has any solutions for you, but it at least discusses the same problems.

--
Leif Nixon - Systems expert
------------------------------------------------------------
National Supercomputer Centre - Linkoping University
------------------------------------------------------------

From rgb at phy.duke.edu Sat Aug 13 13:51:46 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 13 Aug 2011 13:51:46 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On Fri, 12 Aug 2011, Leif Nixon wrote:

> On 12 August 2011 17:58, David Mathog wrote:
>
>> After posting I thought of one other source of more or less random
>> verifiable numbers - the scores of sporting events. These are not
>> always generated every day, and are seasonal for the various sports.
>> They are however highly verifiable and when multiple events are grouped,
>> pretty much impossible to "fix" to preselected digits.
>
> Have you looked at RFC3797? Not sure if it has any solutions for you, but it
> at least discusses the same problems.

If people know how you are going to pick the seed of your rng, and know the rng, and know (or measure) the distribution function from which your seed is being drawn, they can easily transform the game into a non-zero sum game with advantage over all of those that don't do all of that. The only way to avoid this sort of thing is to pick your seed from a flat, unpredictable distribution. Unpredictable (in its purest sense) includes flat, but the score distribution of almost any sporting event is, I'm pretty sure, not flat.
That's why I really don't like the idea of running a lottery off of data like this. No state lottery could ever be certified on top of this sort of data. I'll tell you what. Piggy back your lottery to theirs. Powerball games occur every day all over the US. Pick your seed from the last 10 digits of one of those games. They are announced, publicly available on websites (I'm pretty sure), and if they aren't certifiably random, something is seriously wrong. In any event they are usually generated from an easily understandable random physical process that is almost certainly flat as well as unpredictable. Then pop it into your favorite AES-based or threefish based RNG, or cook up something yourself with even more rotors, spin it a while, and out comes your lottery winner -- basically a transmogrification of public state lottery number, but that's an ADVANTAGE, not a disadvantage... rgb > > > -- > Leif Nixon? ? ? ? ? ? ? ? ? ? ?? -? ? ? ? ? ? Systems expert > ------------------------------------------------------------ > National Supercomputer Centre? ? -? ? ? Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
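
A minimal sketch of the "public seed in, winner out" step described above, under stated assumptions: SHA-256 in counter mode stands in for the AES- or Threefish-based generator (only because it is in the Python standard library), and the function name `pick_winner`, the `seed:counter` input format, and the ticket numbering are all hypothetical. Rejection sampling is used so the final modulo does not favour low ticket numbers.

```python
import hashlib


def pick_winner(seed_digits: str, n_tickets: int) -> int:
    """Map publicly announced seed digits to a ticket index in [0, n_tickets)."""
    if n_tickets <= 0:
        raise ValueError("n_tickets must be positive")
    # Largest multiple of n_tickets that fits in 64 bits. Values at or above
    # it are rejected, so the final modulo introduces no bias.
    limit = (2**64 // n_tickets) * n_tickets
    counter = 0
    while True:
        # Hash "seed:counter" (a hypothetical format) and use the first 8 bytes.
        block = hashlib.sha256(f"{seed_digits}:{counter}".encode("ascii")).digest()
        value = int.from_bytes(block[:8], "big")
        if value < limit:
            return value % n_tickets
        counter += 1
```

Because the mapping is deterministic, anyone who knows the published digits and the procedure can recompute the result; the unpredictability has to come entirely from the seed.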
-------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Sat Aug 13 20:06:04 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 13 Aug 2011 20:06:04 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: >>> After posting I thought of one other source of more or less random >>> verifiable numbers - the scores of sporting events. ?These are not I immediately thought of another widely published stream of immutable noise: the congressional record. sorry, no smiley ;) > Then pop it into your favorite AES-based or threefish based RNG, or cook > up something yourself with even more rotors, spin it a while, and out > comes your lottery winner sorry, I don't understand your emphasis on flatness. why does the distribution of the seed (entropy source) matter, as long as it's reasonably large and not predictable before publication date? the crypto hash takes care of whitening, doesn't it? thanks, mark hahn. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Sat Aug 13 22:22:52 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 13 Aug 2011 22:22:52 -0400 (EDT) Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID: > I'm curious if anyone has any experience with ECC uncorrectable errors > (specifically not the identification of), but which specific dimm in > the chassis it's pointing to. 

we've had good luck using EDAC to pin down bad dimms - at least those that cause _correctable_ errors. our uncorrectable errors trigger panics. I suppose that's selectable, though: I guess you could turn it off (/sys/module/edac_mc/panic_on_ue)

> The mcelog in linux doesn't seem to report the dimm slot correctly on
> my supermicro boards.

I prefer the hardware-topology-based naming that edac uses (controller, channel, chipselect). I guess recent versions of edac have a user-space tool that will translate that for you (but of course, you have to verify the topo-to-label mapping yourself anyway.)

regards, mark hahn.

From rgb at phy.duke.edu Sun Aug 14 18:05:31 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 14 Aug 2011 18:05:31 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References:
Message-ID:

On Sat, 13 Aug 2011, Mark Hahn wrote:

>>>> After posting I thought of one other source of more or less random
>>>> verifiable numbers - the scores of sporting events. These are not
>
> I immediately thought of another widely published stream of immutable noise:
> the congressional record. sorry, no smiley ;)
>
>> Then pop it into your favorite AES-based or threefish based RNG, or cook
>> up something yourself with even more rotors, spin it a while, and out
>> comes your lottery winner
>
> sorry, I don't understand your emphasis on flatness. why does the
> distribution of the seed (entropy source) matter, as long as it's reasonably
> large and not predictable before publication date?
> the crypto hash takes care of whitening, doesn't it?

Bayes' theorem.
If one knows that (say) the distribution of digits in sports scores is (say, and not unreasonably) 70% 1s, 2s, 3s and 30% all the other digits -- because e.g. football games rarely get 4-9 in the second digit slot (note that this is an example only) -- one can gain a near 2-1 advantage over everybody else playing by picking seeds with the right frequencies and using only those seeds to select a set of numbers, if (as it sounds) there is an openly published unique map between the seed and the lottery outcome so "anybody can check that it is fair".

In this latter case you aren't trying to guess the white "random" outcome, you are trying to guess the seed, and if the seed is drawn from a non-flat space you'll beat the pants off of anyone playing blind by using that space to generate your seeds/guesses. Basically you take the lottery from being a lottery with all numbers equally represented in the outcome space to being the moral equivalent of predicting the actual point outcome of N football or basketball games. The size of the latter space is MUCH smaller than the size of all possible scores, right? In fact, it is "small" compared to the space of all possible scores.

So, sorry, I think that a lottery (especially one with e.g. a cash payout and deep pocketed people capable of speculatively gambling to win based on expectation value, given an openly published hash and seeding method) needs to use a true random, true white seed, since you might just as well use the seed as the lottery number in this case and in no other case is it fair.

Of course, if the lottery is for cakes at a bake sale, who cares. Just don't underestimate the cleverness of would-be attackers if the lottery has an openly published method of generating the result and/or potentially large payout. Plenty of people would tackle the project of cracking the lottery just for the thrill, even if the payout wasn't that great.
If the payout was large enough, you'd have deep-pocketed smart people covering the entire most-likely-point spread generated by Vegas bookies, week after week, through proxies, and making a bundle from it.

rgb

>
> thanks, mark hahn.

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From james.p.lux at jpl.nasa.gov Sun Aug 14 22:59:25 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sun, 14 Aug 2011 19:59:25 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: ,
Message-ID:

Given the discussion about lotteries, etc. This is the classic thing of "numbers games" as run by the mob. You pick a 3 digit number, and the winning number is determined by some readily available public source (stock market, sports games, racetrack winners, etc.). There's probably a fair amount of literature (aside from the works of M. Puzo) describing it. Payoff was something like 600:1 or 750:1, against a nominal 1000:1, so the numbers bank makes their money on the differential (the vig).

Just looked up wikipedia.. " later led to the use of the last three numbers in the published daily balance of the United States Treasury."

A moderately well known mathematician named Claude Shannon probably analyzed it.. He collaborated with E. Thorpe on some other interesting work on games.

From rgb at phy.duke.edu Mon Aug 15 07:57:26 2011
From: rgb at phy.duke.edu (Robert G.
Brown) Date: Mon, 15 Aug 2011 07:57:26 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: , Message-ID: On Sun, 14 Aug 2011, Lux, Jim (337C) wrote: > A moderately well known mathematician named Claude Shannon probably > analyzed it.. He collaborated with E. Thorpe on some other interesting > work on games. Shannon? Shannon? The name almost rings a Bell. For your information, I think he's a few bits short of a byte, if you know what I mean. The guy practically Bayes at the moon. Sorry... feeling a bit, well, random this morning. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Mon Aug 15 13:08:59 2011 From: mathog at caltech.edu (David Mathog) Date: Mon, 15 Aug 2011 10:08:59 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID: Leif Nixon wrote: > Have you looked at RFC3797? Not sure if it has any solutions for you, but it > at least discusses the same problems. Good reference, I was not aware of that. It gives the same sorts of sources for random numbers as we have come up with here: stock market, sports, lottery. It discusses how stock market data may not be reliable due to market splits and other accounting issues. However, I have determined that the raw data from the exchanges is a terrible choice because it is not available for free, and the values that are freely available, which are posted on web finance sites, are not reliably identical in all digits. 
Lottery results are a good source except for the black box / black helicopter factors. We don't generally know where those numbers are coming from, and even in those cases where they do tell us, there is no way to verify that any particular lottery drawing wasn't rigged. We have not discussed election results (votes per candidate), but those are, ironically, really unsuitable for this, even though statistically the final set of digits should have a lot of entropy. Mostly election numbers are a problem because they may be revised for long periods after the election, and the numbers could almost always be forced to shift by a challenge by one of the candidates. Every recount will come up with a slightly different result. Examples: the Coleman vs. Franken senatorial contest in Minnesota, or Bush vs. Gore in Florida. So I'm leaning towards sports scores, as those are generated in full view of a multitude of witnesses (often numbering in the millions). It would be extremely difficult to rig the absolute final score. It might be possible to rig the winner, or even the point spread, but to rig the absolute score in a high scoring game like basketball, would be exceedingly difficult, and would likely be obvious to even the casual observer. To rig every digit in the final score of every game played on a given day should be pretty close to impossible. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
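
The canonicalisation proposed earlier in the thread ("sort the scores ... into ascending order and concatenate the digits") could be sketched as follows. This is only an illustration: the function names are hypothetical, the scores in the usage note are made up, and hashing the digit string with SHA-256 is an added assumption rather than something proposed in the thread.

```python
import hashlib


def scores_to_seed(scores: list[int]) -> str:
    """Sort final scores ascending and concatenate their decimal digits."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    # Sorting removes any dependence on the order in which games are listed.
    return "".join(str(s) for s in sorted(scores))


def seed_digest(scores: list[int]) -> str:
    """Hex SHA-256 of the canonical digit string (the hashing step is an assumption)."""
    return hashlib.sha256(scores_to_seed(scores).encode("ascii")).hexdigest()
```

With the made-up scores `[101, 98, 110, 95]`, `scores_to_seed` returns `"9598101110"`, and it returns the same string for any ordering of those four values, so two parties reading the scores from different sites should arrive at the same seed as long as the scores themselves agree.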
From cousins at umit.maine.edu Mon Aug 15 16:59:11 2011 From: cousins at umit.maine.edu (Steve Cousins) Date: Mon, 15 Aug 2011 16:59:11 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: Hi David, Can you give us more information about what you are doing? I'm getting curious about what problem you are working with that requires these conditions. Steve > We have not discussed election results (votes per candidate), but those > are, ironically, really unsuitable for this, even though statistically > the final set of digits should have a lot of entropy. Mostly election > numbers are a problem because they may be revised for long periods after > the election, and the numbers could almost always be forced to shift by > a challenge by one of the candidates. Every recount will come up with a > slightly different result. Examples: the Coleman vs. Franken senatorial > contest in Minnesota, or Bush vs. Gore in Florida. > > So I'm leaning towards sports scores, as those are generated in full > view of a multitude of witnesses (often numbering in the millions). It > would be extremely difficult to rig the absolute final score. It might > be possible to rig the winner, or even the point spread, but to rig the > absolute score in a high scoring game like basketball, would be > exceedingly difficult, and would likely be obvious to even the casual > observer. To rig every digit in the final score of every game played on > a given day should be pretty close to impossible. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From lindahl at pbm.com Wed Aug 17 16:59:58 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 17 Aug 2011 13:59:58 -0700 Subject: [Beowulf] Fwd: H8DMR-82 ECC error In-Reply-To: References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk> Message-ID: <20110817205958.GB7650@bx9.net> > Memtest was ok, I done 9 cycles without any problems. You should be using the HPL implementation of the Linpack benchmark for testing memory. It exercises all of the memory and all of the cores, and is what most HPC vendors seem to use for node burnin. There's even a bootable DVD with a kernel with enhanced EDAC that was mentioned here a while back. > Hardware Error > CPU0 Machine Check Exception 4 Bank 2 b200200000000863 > TSC 108dd369444 > Processor 2:40f13 Time 1311847912 Socket 0 APIC 0 > MC2-Status: Uncorredted error, report: yes MisV: invalid > CPU context corrupt: yes UECC Error > Bud Unit Error: prefetch/ECC error in data read from NB: local node originated > (SRC) > Transaction type: prefetch (mem access), no timeout, cache level L3/generic. > Participating Processors: local node originated (SRC) And I take it that the location information given here (socket 0, bank 2) isn't useful? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From david.t.kewley at gmail.com Sat Aug 20 16:16:28 2011 From: david.t.kewley at gmail.com (David Kewley) Date: Sat, 20 Aug 2011 13:16:28 -0700 Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID: A few bits from my corner of the experience space: If you have a BMC, 'ipmitool sel list' will probably show the correctable and uncorrectable errors, generally not naming the DIMM involved. 
But 'ipmitool sel list -v' shows details from various fields in the SEL records. In the ASUS boards I've been playing with lately, the Sensor Number field together with the Event Data field will (usually) tell you the DIMM slot, once you know how to decode those fields for the specific motherboard (and possibly firmware revisions?) that you have. How do you get that motherboard-specific data? By finding a DIMM that reliably produces errors, and moving it from slot to slot, taking notes on those two SEL fields above. I've seen a similar thing work for Dell machines too.

If you have Dell PowerEdge R or M boxes (or previous generation equivalents), there are various nicer ways to get the name of the DIMM involved, including using a version of ipmitool that has the 'delloem' subcommand.

I second Tony's suggestion that RAM testers may not be as good as real systems, for finding bad RAM. My experience on one large system a few years ago was that new DIMMs failed at a rate of around 1% per year, but "refurbished" DIMMs from RMAs failed at 10% per year (or was it even higher? I forget). I was led to believe that these refurbished DIMMs were often customer returns that had been run through a RAM tester and passed. Turns out sometimes the customers were right and the "refurbishment" process was wrong.

One more thing about the ASUS boards I've been playing with lately: If you get a panic on uncorrectable memory error, and power cycle the system (using the power button, or by remote 'ipmitool ... power cycle'), the following POST does not report the bad DIMM. But if you *reset* the system (by pushing the reset button with a paperclip, or by remote 'ipmitool ... power reset'), the next POST will pause and tell you what CPU, Channel, and DIMM were affected by that previous uncorrectable error, which is more info than 'ipmitool sel list' gives you.
It's then up to you to figure out how CPU, Channel, and DIMM map to the silkscreened names on the motherboard -- I couldn't find documentation, but it turned out to be the pattern we suspected. :)

David

From john.hearns at mclaren.com Tue Aug 23 11:46:59 2011
From: john.hearns at mclaren.com (Hearns, John)
Date: Tue, 23 Aug 2011 16:46:59 +0100
Subject: [Beowulf] Flash storage arrays
Message-ID: <207BB2F60743C34496BE41039233A809071F88D8@MRL-PWEXCHMB02.mil.tagmclarengroup.com>

Does anyone have an opinion of these for CFD workloads:

http://www.theregister.co.uk/2011/08/23/pure_storage_fa_300/

The interesting thing is they claim it is cheaper than disk - but that's a hard claim to assess in an HPC context as it SEEMS to be only when their inbuilt deduplication is taken into account. I'm not sure how much dedupe buys you with typical HPC data - ie large files rather than lots of nearly-identical emails or virtual disk images.

John Hearns | CFD Hardware Specialist | McLaren Racing Limited
McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
T: +44 (0) 1483 261000
D: +44 (0) 1483 262352
F: +44 (0) 1483 261010
E: john.hearns at mclaren.com
W: www.mclaren.com

The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
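
One rough way to get a feel for the dedupe question on a given data set is to count duplicate fixed-size blocks. The sketch below is an illustration only: the function name `dedupe_ratio` and the 4096-byte default are hypothetical, and fixed-size block hashing is an assumption that may differ from whatever scheme a particular array uses, so the number it gives is at best a crude indication.

```python
import hashlib


def dedupe_ratio(data: bytes, block_size: int = 4096) -> float:
    """Return (total blocks) / (unique blocks) for fixed-size blocks of data."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Split into fixed-size blocks; the last block may be shorter.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Identify blocks by digest so the set stays small for large inputs.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)
```

A ratio near 1.0 means almost every block is distinct, in which case block-level deduplication would save little on that data.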
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Aug 24 21:30:39 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 25 Aug 2011 03:30:39 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: <8BBC31C3-7F1A-433C-863A-5F0EBB4714AC@xs4all.nl> In a world where you don't trust others, using MD5 is out of the question. It's not safe. It's possible to fake a MD5 sum by modifying the number to whatever you wish (if it is enough random data) and then add something, with just a small correction to the data to again get the md5sum that was posted on the website. Vincent On Aug 12, 2011, at 5:21 PM, David Mathog wrote: > Robert G. Brown wrote: > >> Everybody must be able to obtain it >>> freely from a web connection. >> >> http://www.random.org/ >> > > Nice site. They have something that is very close, the pregenerated > random files, from which a small set of digits may be extracted, > and the > files themselves have MD5 checksums (but are not signed). > They also support https. It comes up a little short on criteria 1 (we > really don't know what is going on behind the scenes) and 6 (it is a > single site.) 
> > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Aug 24 21:58:52 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 25 Aug 2011 03:58:52 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> On Aug 12, 2011, at 8:35 PM, Robert G. Brown wrote: > On Fri, 12 Aug 2011, David Mathog wrote: > >> Robert G. Brown wrote: >> >>> Everybody must be able to obtain it >>>> freely from a web connection. >>> >>> http://www.random.org/ >>> >> >> Nice site. They have something that is very close, the pregenerated >> random files, from which a small set of digits may be extracted, >> and the >> files themselves have MD5 checksums (but are not signed). >> They also support https. It comes up a little short on criteria 1 >> (we >> really don't know what is going on behind the scenes) and 6 (it is a >> single site.) > > Behind the scenes is documented pretty well on the site, and the > guy who > runs it is a human being, you can communicate with him to learn even > more. I already know him a bit, as he and I have collaborated on > applying dieharder to test random.org datasets -- even "the" > random.org > dataset as of some time ago (I have a few hundred MB of random number > from the site in my dieharder directory). 
IIRC, the numbers are > generated continuously and fairly slowly by grabbing and filtering and > transforming atmospheric noise. As a source of entropy, that is > probably excellent if (as noted) slow, but many good sources of > entropy > seem to be fairly slow. He has good reason to think that his numbers > are theoretically "true random numbers"

Well, there is another test I stumbled upon when I did some analysis on casinos (which student who takes himself seriously doesn't try writing some simulations to see whether you can win something in the casino by designing some strategies?).

The simulation revealed it was rather easy to make a fortune at roulette with the doubling system (first bet 1; if you win, bet 1 again, otherwise double and keep doubling until you win). Reports from guys (some of them missing an eye, another one a hand) who study anything that might make a profit in casinos, and who really try it in the casinos too, revealed that they never saw anyone make a big profit with the doubling system.

So there was a discrepancy between the software-generated random data and the true random numbers generated in the casino. Statistical analysis revealed the problem, though not quickly.

I noticed that most semi-random numbers from software generators have the habit of truly addressing a search space of size n, always within O(n log n) draws.

So if you draw a number from most software RNGs and reduce it modulo n, with n not too tiny, say some millions or even billions, then every slot in your 'hashtable' will get hit at least once by the RNG, whereas data in reality simply does not have that habit.

So true random numbers and generated noise are easy to distinguish in this manner. I didn't check the literature to see whether some other chap already invented this long ago. That would be most interesting to know.
In semi-pseudocode (let's take an array of size a billion as an example, though usually a few million is more than OK):

    n = 2^30;  // 2 to the power 30

    Function TestNumbersForRandomness(RNG, n) {
        declare array hashtable[size n];

        // draw budget: 2 * n * log2(n)
        guessednlogn = 2 * (log n / log 2) * n;

        for( i = 0 ; i < n ; i++ )
            hashtable[i] = FALSE;

        ndraws = filledn = 0;
        while( ndraws < guessednlogn ) {
            randomnumber = RNG();
            r = randomnumber % n;  // randomnumber = r (mod n)
            if( hashtable[r] == FALSE ) {  // first hit on this slot
                hashtable[r] = TRUE;
                filledn++;
                if( filledn >= n )
                    break;  // every slot has been hit at least once
            }
            ndraws++;
        }

        if( filledn >= n )
            print("With high degree of certainty data generated by a RNG\n");
        else
            print("Not so sure it's a RNG\n");
    }

Regards,
Vincent

> -- both unpredictable and > flat/decorrelated at all orders, and even though there aren't really > enough of them for my purposes, I've used them as one of the (small) > "gold standard" sources for testing dieharder even as I test them. > For > all practical purposes threefish or aes are truly random as well and > they are a lot faster and easier to use as gold standard generators, > though. > > I don't quite understand why the single site restriction is > important -- > this site has been up for years and I don't expect it to go away soon; > it is quite reliable. I don't think there is anything secret about > how > the numbers are generated, and I'll certify that the numbers it > produces > don't make dieharder unhappy. So 1 is fixable with a bit of effort on > your part; 6 I don't really understand but the guy who runs the > site is > clearly willing to construct a custom feed for cash customers, if > there > is enough value in whatever it is you are trying to do to pay for > access. If it's just a lottery, well, lord, I can think of a dozen > ways > to make numbers so random that they'd be unimpeachable for any sort of > lottery, both unpredictable and uncorrelated, and they don't any of > them > require any significant amount of entropy to get started.
> > I will add one warning -- "randomness" is a rather stringent > mathematical criterion, and is generally tested against the null > hypothesis. Amateurs who want to make random number generators out of > supposedly "random" data streams or fancy algorithms almost invariably > fail, sometimes spectacularly so. There are a half dozen or more > really, really good pseudorandom number generators out there and it is > easy to hotwire them together into an xor-based high entropy stream > that > basically never repeats (feeding it a bit of real entropy now and then > as it operates). I would strongly counsel you against trying to take > e.g. weather data and make something "random" out of it. Unless you > really know what you are doing, you will probably make something that > isn't at all random and may not even be unpredictable. Even most > sources of "quantum" randomness (which is at least possibly "truly > random", although I doubt it) aren't flat, so that they carry the > signature of their generation process unless/until you manage to > transform them into something flat (difficult unless you KNOW the > distribution they are producing). Pseudorandom number generators have > the serious advantage of being amenable to at least some theoretical > analysis (so you can "guarantee" flatness out to some high > dimensionality, say) as well as empirical testing with e.g. dieharder. > > HTH, > > rgb > >> >> Thanks, >> >> David Mathog >> mathog at caltech.edu >> Manager, Sequence Analysis Facility, Biology Division, Caltech >> > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Thu Aug 25 08:11:07 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 25 Aug 2011 08:11:07 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> Message-ID: On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > I noticed that most generated semi-random numbers with software generators, > had the habit to truely adress a search space of n always in O (n log n). > > So if you draw from most software RNG's a number and do it modulo n, > with n being not too tiny, say quite some millions or even billions, then > every > slot in your 'hashtable' will get hit at least once by the RNG, whereas data > in reality simply happens to not have that habit simply. > > So true random numbers versus generated noise is in this manner easy > to distinguish by this. Now i didn't study literature whether some other chap > some long time ago already had invented this. That would be most interesting > to know. Some other chap named George Marsaglia (and to some extent another chap named Donald Knuth) have already invented this. A number of tests of the tails of random number generators are already in dieharder. All "good" modern rngs pass these tests. 
The Martingale betting system you are looking at is even older (at least Marsaglia and Knuth are still alive). It dates back to the 18th century, and is well known to be flawed for a variety of reasons, not the least of which is that gamblers don't have the infinite wealth necessary to make this >>even<< a zero-sum strategy and casinos have betting limits that de facto make it impossible to pursue the requisite number of steps and in roulette in particular have 0 and/or 00 slots and aren't zero-sum to begin with. You can read a decent analysis of outcomes based on the presumed binomial distribution of a zero-sum game here: http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 Your test below is interesting, though. The only real problems I can see with actually using it in dieharder are: a) One would need a theoretical estimate of the distribution of filling given n log n draws on an n-slotted table (for largish n). That is, for a perfect rng, what SHOULD the distribution of success/failure be. b) One would then need the CDF for this distribution, to be able to turn the results of N trials (of n log n pulls each) into a p-value under the null hypothesis -- the probability of obtaining the particular number of successes and failures presuming a perfectly random generator. That way dieharder could apply it rigorously to its 70 or 80 embedded rngs or to any user's outboard generator. There probably is theoretical statistical support for the PD and/or CDF -- you're analyzing the tails of a poissonian process -- but finding it or doing it yourself (or myself), aye, that's the rub. One cannot just say "high degree of certainty that it is an RNG" (by which one means that the rng in question fails the test for randomness) in the test. HOW high? Perfect rngs or perfectly random processes will sometimes fill your table, but how often? How can you differentiate an "accident" when one does from an actual failure? 
All of those questions require a more rigorous theory and quantitative result embedded in a test that can be systematically cranked up to more clearly resolve failures until they are unambiguous, not marginal maybe yes maybe no. I suspect that the failures this test would reveal are already more than covered in dieharder, in particular by the bit distribution tests and the monkey tests, but I'm not terribly happy with the monkey tests and would be perfectly thrilled to have a simpler to compute test that revealed precisely this sort of flaw, systematically. And it doesn't hurt at all to have partially or fully redundant tests as long as the test themselves are rigorously valid. If you can find or compute the CDF for your test below, I'd be happy to wrap it up and add it to dieharder, in other words. One can always SIMULATE a CDF, of course, but that requires a known good generator and sort of begs the question if you don't think that e.g. AES or threefish or KISS are good generators that would actually pass your test. Even hardware/quantum sources of random bits are suspect -- they often are generated by a process that leaves in the traces of an underlying distribution. I'm not convinced that >>any<< process in the real world is >>truly<< random. Physics is ambiguous on the issue -- the quantum description of a closed system is just as deterministic as the classical one, and Master equation unpredictability on open subsets of a large closed system reflects entropy/ignorance, not actual randomness (hence Einstein's famous "doesn't play dice" remark). But lots of this are sufficiently random that one cannot detect any failure of randomness, modern crypto class generators being a prime example. 
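
For what it's worth, the distribution asked for in (a) and (b) looks like the classical coupon collector's problem, assuming an ideal source whose values modulo n are independent and uniform. Under that assumption the expected number of draws needed to hit all n slots is n*H_n (roughly n ln n), and for large n the probability that m draws fill the table is approximately exp(-n * exp(-m/n)). With the draw budget m = 2 n log2(n) used in Vincent's pseudocode, that probability is very close to 1, so a perfectly random source would almost always fill the table as well. The Python sketch below only illustrates this; the function names are made up for the example and are not part of dieharder.

```python
import math
import random


def expected_draws(n: int) -> float:
    """Expected draws to hit all n slots with uniform draws: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def fill_probability(n: int, m: int) -> float:
    """Approximate P(all n slots are hit within m uniform draws).

    Poisson approximation exp(-n * exp(-m / n)); reasonable for large n.
    """
    return math.exp(-n * math.exp(-m / n))


def coverage_test(rng, n: int, budget: int) -> bool:
    """Return True if rng() % n hits every slot within `budget` draws."""
    seen = bytearray(n)
    filled = 0
    for _ in range(budget):
        r = rng() % n
        if not seen[r]:
            seen[r] = 1
            filled += 1
            if filled == n:
                return True
    return False


if __name__ == "__main__":
    n = 1 << 12  # small table so the demo runs quickly
    budget = int(2 * n * math.log2(n))
    print("expected draws:", expected_draws(n))
    print("P(fill within budget):", fill_probability(n, budget))
    gen = random.Random(12345)  # stand-in generator, not a gold standard
    print("filled:", coverage_test(lambda: gen.getrandbits(32), n, budget))
```

With a budget this generous, failures are so rare under the null hypothesis that the test has little resolving power; a budget near n ln n, where the fill probability is about exp(-1), would probably be more informative, and the count of fills over N trials could then be compared against a binomial with that success probability.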
rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From diep at xs4all.nl Thu Aug 25 21:55:04 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 03:55:04 +0200 Subject: [Beowulf] OT: Calculating Extraterrestrial Life - was public random numbers? In-Reply-To: References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> Message-ID: On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote: > On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > >> I noticed that most generated semi-random numbers with software >> generators, >> had the habit to truely adress a search space of n always in O (n >> log n). >> >> So if you draw from most software RNG's a number and do it modulo n, >> with n being not too tiny, say quite some millions or even >> billions, then every >> slot in your 'hashtable' will get hit at least once by the RNG, >> whereas data >> in reality simply happens to not have that habit simply. >> >> So true random numbers versus generated noise is in this manner easy >> to distinguish by this. Now i didn't study literature whether some >> other chap >> some long time ago already had invented this. That would be most >> interesting >> to know.
> > Some other chap named George Marsaglia (and to some extent another > chap > named Donald Knuth) have already invented this. A number of tests of > the tails of random number generators are already in dieharder. All > "good" modern rngs pass these tests. > > The Martingale betting system you are looking at is even older (at > least > Marsaglia and Knuth are still alive). It dates back to the 18th > century, and is well known to be flawed for a variety of reasons, not > the least of which is that gamblers don't have the infinite wealth > necessary to make this >>even<< a zero-sum strategy and casinos have > betting limits that de facto make it impossible to pursue the > requisite > number of steps and in roulette in particular have 0 and/or 00 > slots and > aren't zero-sum to begin with. You can read a decent analysis of > outcomes based on the presumed binomial distribution of a zero-sum > game > here: > > http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 > > Your test below is interesting, though. The only real problems I can > see with actually using it in dieharder are: > > a) One would need a theoretical estimate of the distribution of > filling given n log n draws on an n-slotted table (for largish n). > That > is, for a perfect rng, what SHOULD the distribution of success/failure > be. > > b) One would then need the CDF for this distribution, to be able to > turn the results of N trials (of n log n pulls each) into a p-value > under the null hypothesis -- the probability of obtaining the > particular > number of successes and failures presuming a perfectly random > generator. > > That way dieharder could apply it rigorously to its 70 or 80 embedded > rngs or to any user's outboard generator. There probably is > theoretical > statistical support for the PD and/or CDF -- you're analyzing the > tails > of a poissonian process -- but finding it or doing it yourself (or > myself), aye, that's the rub. 
One cannot just say "high degree of > certainty that it is an RNG" (by which one means that the rng in > question fails the test for randomness) in the test. HOW high? > Perfect > rngs or perfectly random processes will sometimes fill your table, but > how often? How can you differentiate an "accident" when one does from > an actual failure? All of those questions require a more rigorous > theory and quantitative result embedded in a test that can be > systematically cranked up to more clearly resolve failures until they > are unambiguous, not marginal maybe yes maybe no. > > I suspect that the failures this test would reveal are already more > than > covered in dieharder, in particular by the bit distribution tests and

Thanks for your kind words. You'll realize that, given all the theories you quote below, of which I simply have little knowledge and which I definitely don't have the time to investigate (yet), you're talking way above my level of knowledge there.

Instead of going deep into mathematical theories, I would find it more appropriate to ponder the feasibility of calculating the existence of extraterrestrial life.

Now I realize a lot of effort goes into recognizing messages from outer space. Yet we can also speculate on some things.

First of all, I'd like to make a statement on extraterrestrial life and a viewpoint there. One viewpoint I've seen promoted is that some scientist(s) claim we should hide ourselves from extraterrestrial life.

I fully disagree there, to some extent.

If there is extraterrestrial life that is more advanced than our society, obviously they also could have built weapons to totally self-destruct, and they would already have killed themselves if they had been aggressive forms of life.

If they were successful in reproducing themselves, just like humankind is right now, they would have burned up all resources on their own planet and caused massive extinctions.
It is difficult to defend the statement that all mass extinctions were caused by meteorites only. For such a statement one would need proof that every single mass extinction was caused by a specific meteorite; an extinction could just as well have happened because a certain successful species dominated the planet a tad too much and didn't get clever enough to control and regulate itself to the extent needed to keep the planet from dying entirely. After some millions of years, life then restores itself on the planet.

So if there is intelligent life elsewhere that is more advanced than our society, they surely would want to communicate in a manner that lets information be read across different galaxies. However, one would want to hide this information from the most primitive life forms that have managed to dominate a planet, until such a society reaches a specific level.

One would only want such a form of extraterrestrial communication to be deciphered by another extraterrestrial form of life that has reached a sustainable, peaceful level.

I would argue such a lifeform would not form a threat to anyone, as they have already proven not to be a threat to their own planet.

So if the knowledge of this society is high enough to be able to control all that, one could also argue that a specific level of math belongs to that high level of development: a level strong enough to decipher the form of communication used between the different very intelligent lifeforms in existence throughout the galaxies.
From the fact that there is no systematic form of contact with extraterrestrial life we can already deduce that humankind still has to develop itself further from a species that burns up all its resources, especially causing too much output of CO2 (the latest report, which I'll have to check out, is that the increased CO2 level increases the amount of CO2 absorbed by the oceans, making them more acidic, so that plankton, the start of the food chain, does not develop its skeleton enough, which in the long run will surely cause mass extinction).

Now we might not be advanced enough yet to decipher extraterrestrial communication, so I wonder whether we might somehow be able to recognize that information is being communicated using a form of encryption that we simply cannot decipher yet, by comparing it with how our RNGs work. Some of them, for example, run over a prime field; others have a distribution that is too perfect.

If we get radiation measurements back from space and test them for belonging to a specific class or type of randomness versus non-randomness, how does that compare with a comparable source of radiation of our own and its randomness classification?

Obviously the algorithm I gave is just one specific algorithm to measure a perfect distribution; as you already indicated, many other tests have been invented already. To what extent have those been applied to what could be encrypted communication from extraterrestrial life to other extraterrestrial life (like us, if we manage to survive as a species and develop further to a peaceful level that can sustain itself for a longer period of time)?

So, summarized, what I wonder about is how random number theory can contribute to detecting extraterrestrial life (of course with a specific statistical significance to it).
This would of course go in combination with experiments that allow us to first classify how a specific possible communication system would normally behave according to the randomness classification system, and then compare that with the classification of the measured possible communication.

Such a classification system would need to be very sophisticated to have any chance of detecting extraterrestrial life, I'd guess, as we can't just naively assume that all they could come up with is encrypting things over a prime field using smallish primes, which in our world is already only allowed for use up to the 'secret' level.

Regards,
Vincent

> the monkey tests, but I'm not terribly happy with the monkey tests and > would be perfectly thrilled to have a simpler to compute test that > revealed precisely this sort of flaw, systematically. And it doesn't > hurt at all to have partially or fully redundant tests as long as the > test themselves are rigorously valid. If you can find or compute the > CDF for your test below, I'd be happy to wrap it up and add it to > dieharder, in other words. One can always SIMULATE a CDF, of course, > but that requires a known good generator and sort of begs the question > if you don't think that e.g. AES or threefish or KISS are good > generators that would actually pass your test. > > Even hardware/quantum sources of random bits are suspect -- they often > are generated by a process that leaves in the traces of an underlying > distribution. I'm not convinced that >>any<< process in the real > world > is >>truly<< random. Physics is ambiguous on the issue -- the quantum > description of a closed system is just as deterministic as the > classical > one, and Master equation unpredictability on open subsets of a large > closed system reflects entropy/ignorance, not actual randomness (hence > Einstein's famous "doesn't play dice" remark).
But lots of this are > sufficiently random that one cannot detect any failure of randomness, > modern crypto class generators being a prime example. > > rgb

From diep at xs4all.nl Thu Aug 25 20:27:18 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 02:27:18 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> Message-ID: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote: > On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > >> I noticed that most generated semi-random numbers with software >> generators, >> had the habit to truely adress a search space of n always in O (n >> log n).
>> >> So if you draw from most software RNG's a number and do it modulo n, >> with n being not too tiny, say quite some millions or even >> billions, then every >> slot in your 'hashtable' will get hit at least once by the RNG, >> whereas data >> in reality simply happens to not have that habit simply. >> >> So true random numbers versus generated noise is in this manner easy >> to distinguish by this. Now i didn't study literature whether some >> other chap >> some long time ago already had invented this. That would be most >> interesting >> to know. > > Some other chap named George Marsaglia (and to some extent another > chap > named Donald Knuth) have already invented this. A number of tests of > the tails of random number generators are already in dieharder. All > "good" modern rngs pass these tests. > > The Martingale betting system you are looking at is even older (at > least > Marsaglia and Knuth are still alive). It dates back to the 18th > century, and is well known to be flawed for a variety of reasons, not > the least of which is that gamblers don't have the infinite wealth > necessary to make this >>even<< a zero-sum strategy and casinos have

From a mathematical viewpoint it makes perfect cash. The statistical odds are that you have already built up a considerable profit by the time a worst case (hitting the practical limit of 10 doublings) hits you.

The simulations of course use the practical limit.

Note that European casinos have a single zero. In the USA, where an even greedier mafia controls all the casinos, there are 2 zeros: 0 and 00.

The simulations were for European casinos.

> betting limits that de facto make it impossible to pursue the > requisite > number of steps and in roulette in particular have 0 and/or 00 > slots and > aren't zero-sum to begin with.
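
The capped doubling scheme discussed above can be checked exactly rather than by simulation. The sketch below assumes an even-money bet covering 18 pockets, a base stake of 1 unit, and a cycle that stops after a win or after `max_bets` consecutive losses; the function name and the default of 10 bets per cycle are assumptions chosen to mirror the "10 times" limit mentioned in the text. Under those assumptions the expected profit per cycle on a single-zero wheel works out to 1 - (38/37)^10, roughly -0.3 units, so the cycle loses on average even though most cycles end with a 1 unit win.

```python
from fractions import Fraction


def capped_martingale_ev(max_bets: int = 10, pockets: int = 37) -> Fraction:
    """Exact expected profit (in base units) of one capped doubling cycle.

    Assumes an even-money bet covering 18 of `pockets` pockets
    (37 for a single-zero wheel, 38 for a double-zero wheel).
    """
    lose = Fraction(pockets - 18, pockets)  # chance of losing one spin
    bust = lose ** max_bets                 # every bet in the cycle lost
    loss_if_bust = 2 ** max_bets - 1        # 1 + 2 + ... + 2^(max_bets - 1)
    # Any cycle that ends with a win nets exactly 1 unit.
    return (1 - bust) * 1 - bust * loss_if_bust


if __name__ == "__main__":
    print("single zero:", float(capped_martingale_ev(10, 37)))
    print("double zero:", float(capped_martingale_ev(10, 38)))
```

Using `Fraction` keeps the arithmetic exact, which avoids arguing about whether a simulated profit is real or an artifact of the generator.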
> You can read a decent analysis of outcomes based on the presumed
> binomial distribution of a zero-sum game here:
>
> http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>

You're not allowed to use a system in a casino, so we are speaking about
theory. They will probably let you try on the first evening. On the
second day you'll be on the blacklist.

> Your test below is interesting, though. The only real problems I can
> see with actually using it in dieharder are:
>

Yeah, more interesting than the roulette system, which has been
discussed a billion times and analyzed completely flat.

> a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).
> That is, for a perfect rng, what SHOULD the distribution of
> success/failure be.

As we have figured out by now in Artificial Intelligence, the
statistical assumptions made in the past simply do not hold. For
Artificial Intelligence we need a new sort of theory.

As for the distribution problem (generators having a spread that's too
accurate), the way to deliver a proof would be, for example, to build a
simple device.

Build an old-fashioned box from which you can draw balls. Remember what
you could see on TV some 20 years ago or so (not sure it was like that
in the USA): a big basket with balls. In fact it looks like this:

http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg

But now take a much bigger machine like this, with different means of
randomizing the balls inside, and also with the obstacles inside that
shake the balls being modified at random.

After a ball has been drawn, you automatically have it annotated and the
ball immediately goes back into the machine. For a full minute you have
the balls in the machine shaken again, and then you draw a ball again.

It is important to keep randomizing the balls inside the machine for
quite some time. I would propose a minute.

Of course you have to do this with quite some balls. Say a thousand.
Then you draw balls until all numbers have been drawn at least once.

This cool experiment can easily be built. Of course the expected running
time of a single experiment will be a few weeks. You can produce a
number of those drawing machines though and have a look.

Theories that seemingly work for small n, n being the number of balls,
are much harder to maintain at bigger n's, as we also see in prime
number research.

How the machine gets designed is of course totally crucial. I would
propose a design that shakes the balls through each other a lot and very
thoroughly. We nowadays know, after all, how flawed a large number of
the popular card shuffling machines are.

The outcomes of such a lottery with really a lot of balls would be very
interesting to see. In fact I would prefer to have a number of those
machines produced, so that it's possible to really have a lot of
outcomes and then analyze them very well.

>
> b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the
> particular number of successes and failures presuming a perfectly
> random generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator. There probably is
> theoretical statistical support for the PD and/or CDF -- you're
> analyzing the tails of a poissonian process -- but finding it or doing
> it yourself (or myself), aye, that's the rub. One cannot just say
> "high degree of certainty that it is an RNG" (by which one means that
> the rng in question fails the test for randomness) in the test. HOW
> high? Perfect rngs or perfectly random processes will sometimes fill
> your table, but how often?
If we assume that the reality of life represents randomness (how far
that theory is plausible is another rather good question), then using
that assumption I'm very sure that the RNGs I have investigated so far
have a distribution which is too perfect, more perfect than I have seen
in any reality.

In fact most RNGs fill all slots faster than O(n log n), yet it's
O(n log n) that they follow.

These are RNGs that have come through all tests as being good and very
acceptable RNGs to use.

Realize I'm no RNG expert, so all the names of all those tests mean
little to me.

For me it's just push-button technology. I just designed a test and
found it very odd that all RNGs have such perfect distributions that
they don't even miss a single slot.

I'd argue the only test that would be interesting to me, to see how it
might be in reality, is the lottery machine test, yet with really a lot
of balls. I'd prefer 10k balls over 1000 in fact, yet for practical
reasons I would agree to a number above 1000.

Paper fiddling is really not interesting to me there to prove anything,
as what I've seen of randomness in reality is totally different from how
RNGs model it.

Regards,
Vincent

> How can you differentiate an "accident" when one does from
> an actual failure? All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more
> than covered in dieharder, in particular by the bit distribution tests
> and the monkey tests, but I'm not terribly happy with the monkey tests
> and would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically. And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> test themselves are rigorously valid.
> If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words. One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution. I'm not convinced that >>any<< process in the real
> world is >>truly<< random. Physics is ambiguous on the issue -- the
> quantum description of a closed system is just as deterministic as the
> classical one, and Master equation unpredictability on open subsets of
> a large closed system reflects entropy/ignorance, not actual
> randomness (hence Einstein's famous "doesn't play dice" remark). But
> lots of this are sufficiently random that one cannot detect any
> failure of randomness, modern crypto class generators being a prime
> example.
>
> rgb
>
>>
>> In semi pseudo code, let's take an array of size a billion as an
>> example, though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>>   declare array hashtable[size n];
>>
>>   guessednlogn = 2 * (log n / log 2) * n;
>>
>>   for( i = 0 ; i < n ; i++ )
>>     hashtable[i] = FALSE;
>>
>>   ndraws = filledn = 0;
>>   while( ndraws < guessednlogn ) {
>>     randomnumber = RNG();
>>     r = randomnumber % n; // randomnumber = r (mod n)
>>     if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break;
>>
>>     }
>>     ndraws++;
>>   }
>>
>>   if( filledn >= n )
>>     print "With high degree of certainty data generated by a RNG\n";
>>   else
>>     print "Not so sure it's a RNG\n";
>>
>> }
>>
>> Regards,
>> Vincent

From rgb at phy.duke.edu Fri Aug 26 02:07:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 02:07:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
 <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If we assume that reality of life represents randomness, which is another
> rather good question in how far that theory is plausible, then using that
> assumption i'm very sure that the RNG's i investigated so far
> have a distribution which is too perfect, more perfect than i have seen
> in any reality.

That's because you live in a different reality than everybody else,
Vincent.

> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
> O ( n log n ) that they follow.

In fact, they don't.

> This is RNG's that have come through all tests as being a good and
> very acceptabe RNG to be used.

No, it's not.
> Realize i'm no RNG expert, so all the names of all those tests.
>
> For me it's just push button technology. I just designed a test
> and found it very odd that all RNG's have such perfect distributions
> that they don't even miss a single slot.

It's odd because your test is broken.

>
> I'd argue the only test that would be interesting to me to see how it
> might be in reality is the lottery machine test - yet with really a lot
> of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
> reasons i would agree with a number of above a 1000.
>
> Paper fiddling is really not interesting to me there to prove anything,
> as what i've seen in reality in randomness is total different from how
> RNG's model that.

Let's try a bit of "paper fiddling". The expected number of filled slots
is (this is actual code, not pseudocode, for n slots):

  nlogn = log10(n)*n;
  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));

The reasoning is enormously simple. The probability of a slot being
empty after one pull is (1 - 1/n). After nlogn pulls, it is
p_e = (1 - 1/n)^nlogn. The probability of a slot being filled is thus
1 - p_e, and given n slots, n - n*(1-1/n)^nlogn of them "should" be
filled, within random noise, and n*(1-1/n)^nlogn of them "should" be
empty.

Well, I've got a random number generator tester harness, so I hacked
your test into it. One major bug in your code, BTW, is using a modulus
to generate your random numbers -- dunno what that's about, but if your
rng returned numbers between (say) 0 and 7 and you use it to generate
numbers in the range 0 to 4 by means of r%5 then you'll get (for the
sequence of numbers 0 through 7) 0 1 2 3 4 0 1 2. Note well that you get
twice as many 0's, 1's and 2's as 3's and 4's assuming a random draw on
0 to 7. So you aren't even testing a uniformly distributed sequence of
integers.
Fixing this relatively minor bug, removing your breakout and actually
counting up filledn for the full nlogn samples, and applying the test to
mt19937, we get:

rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990811, expected = 9990881

We run it again:

rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990802, expected = 9990881

We run it for R250 -- a well-known not-good generator:

rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000
nlogn = 70000000
table not all filled: filledn = 9990794, expected = 9990881

We run it on the literally infamous randu:

n = 10000000
nlogn = 70000000
table not all filled: filledn = 9999482, expected = 9990881

Note, Vincent, that the last two examples of correctly computed results
from known-terrible generators are much farther from the expected mean
than mt19937, a well-known damn good one. This suggests that your test
(perhaps unsurprisingly) has some sensitivity, not because some slots
are or aren't empty, but because the NUMBER of slots that are or aren't
empty isn't quite correct. Note also that in the "paper fiddling"
analysis above, the use of nlogn is quite unimportant -- we could make
this an independent variable and evaluate the table filling for any
value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.

If I have the energy, I'll see if the distribution of filledn around
expected is e.g. Gaussian -- it seems pretty reasonable that it would be
-- with some expected or empirically computable variance. If it is,
then this can be fairly easily turned into an actual test that returns a
p-value that humans can use to make rational judgements, or rational
humans can use to make judgements or something like that.
I doubt that the test will have MUCH sensitivity -- modern generators
are way too good to have their flaws picked out quite this simply,
although Marsaglia's "monkey tests" do something very similar, although
a lot more sophisticated mathematically (and arguably more sensitive),
and do suffice to nail randu (anything nails randu) and semi-weak
generators like R250.

Now, let's see what we've learned from this fiddling. One is that
without it, you just waste a lot of people's time making egregious and
false claims that belittle the tremendously sophisticated and difficult
work a whole lot of "fiddlers" have put into inventing, writing, and
testing modern RNGs. The truth is that >>all<< RNGs in dieharder "pass"
your test (if the test is "producing at least one zero") once your test
isn't broken. We've learned that in fact, the best of the modern RNGs
are damn good, and that you could work for five years trying to invent a
test that is good enough to fail any of them and still not succeed.
Finally, we've learned that you should not, not, not take your
Martingale to a casino and try the doubling strategy out to make money,
or if you do, put a firm upper bound -- something like 63 Euro -- on
what you're willing to lose with your base stake of 1 Euro. That way
you have maybe a 40% chance of doubling your 63 Euro before you go
broke. Really, you should read the Wikipedia article I linked, in spite
of the fact that it presents more "paper fiddling".

Sincerely,

   rgb

(See P.S. comments below...)

>>> n = 2^30; // 2 to the power 30
>>>
>>> Function TestNumbersForRandomness(RNG,n) {
>>>   declare array hashtable[size n];
>>>
>>>   guessednlogn = 2 * (log n / log 2) * n;

Why guess nlogn? nlogn is n*log10(n). Why nlogn anyway? Call it m and
make it a parameter.
>>>   for( i = 0 ; i < n ; i++ )
>>>     hashtable[i] = FALSE;
>>>
>>>   ndraws = filledn = 0;
>>>   while( ndraws < guessednlogn ) {
>>>     randomnumber = RNG();
>>>     r = randomnumber % n; // randomnumber = r (mod n)

no, r = n*RNG_UNIFORM(); where RNG_UNIFORM() is e.g. RNG/UINT_MAX. Yes
there are roundoff errors, but they are uniform and consistent and as
you can see, don't affect this problem. What you have isn't even close
to uniform -- it is badly nonrandom.

>>>     if( hashtable[r] == FALSE ) {
>>>       hashtable[r] = TRUE;
>>>       filledn++;
>>>       if( filledn >= n )
>>>         break;

Don't break. Just count up filledn. It will never be more than n now
anyway, for any n, or any reasonable m. There probably is some number of
pulls that will raise "expected" to n, but it is pretty big compared to
n, way bigger than nlogn.

>>>
>>>     }
>>>     ndraws++;
>>>   }
>>>
>>>   if( filledn >= n )
>>>     print "With high degree of certainty data generated by a RNG\n";
>>>   else
>>>     print "Not so sure it's a RNG\n";
>>>
>>> }

I'm guessing the correct statistic here is something like
|expected - filledn|/expected, but as I said, I haven't really worked at
it. I haven't decided whether or not it is worth adding this to
dieharder -- without a formal derivation of the expected statistic it
would be yet another empirical test, which means you're really comparing
one RNG to another presumed better one, which I don't like. And do I
have time to do the "fiddling" needed to do a proper derivation? Aye,
that's the rub...;-)

   rgb

From diep at xs4all.nl Fri Aug 26 07:56:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 13:56:15 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
 <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Aug 26, 2011, at 8:07 AM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> If we assume that reality of life represents randomness, which is
>> another rather good question in how far that theory is plausible,
>> then using that assumption i'm very sure that the RNG's i
>> investigated so far have a distribution which is too perfect, more
>> perfect than i have seen in any reality.
>
> That's because you live in a different reality than everybody else,
> Vincent.

Or the reality we live in might not be as random as we all guess...

But it's good that you took a look at the dieharder test now, which you
didn't do before.

>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet
>> it's O ( n log n ) that they follow.
>
> In fact, they don't.
>
>> This is RNG's that have come through all tests as being a good and
>> very acceptabe RNG to be used.
>
> No, it's not.
>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>
> It's odd because your test is broken.
>
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really a
>> lot of balls. I'd prefer 10k balls over a 1000 in fact - yet for
>> practical reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove
>> anything, as what i've seen in reality in randomness is total
>> different from how RNG's model that.
>
> Let's try a bit of "paper fiddling". The expected number of filled
> slots is (this is actual code, not pseudocode, for n slots):
>
>   nlogn = log10(n)*n;
>   expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple. The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is
> p_e = (1 - 1/n)^nlogn. The probability of a slot being filled is thus
> 1 - p_e, and given n slots, n - n*(1-1/n)^nlogn of them "should" be
> filled, within random noise, and n*(1-1/n)^nlogn of them "should" be
> empty.
>
> Well, I've got a random number generator tester harness, so I hacked
> your test into it. One major bug in your code, BTW, is using a modulus
> to generate your random numbers -- dunno what that's about, but if
> your

EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Apologies for the caps. I hope it shows how important this is.

You're claiming all programmers use random numbers in a faulty manner?

This is important enough to discuss further.

You nearly always need random numbers from within a given domain,
say 0 .. n-1.

So projecting an RNG onto that domain is pretty crucial.
How would you want to do that in a correct manner?

In the slot test in fact a simple AND is enough.

> rng returned numbers between (say) 0 and 7 and you use it to generate
> numbers in the range 0 to 4 by means of r%5 then you'll get (for the
> sequence of numbers 0 through 7) 0 1 2 3 4 0 1 2. Note well that you
> get twice as many 0's, 1's and 2's as 3's and 4's assuming a random
> draw on 0 to 7. So you aren't even testing a uniformly distributed
> sequence of integers.
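On the question raised above of how to project a generator onto 0 .. n-1 correctly, one widely used approach is rejection sampling, sketched here in C. This is not a method proposed by either poster: bounded_uniform and modulo_histogram are hypothetical names, and xs32_next is a stand-in 32-bit generator (xorshift32) included only so the block is self-contained. The sketch assumes the underlying generator returns uniformly distributed 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tally x % n for every x in 0..range-1, to make the modulo bias from
   the 0..7 versus r%5 example visible. counts must hold n zeroed cells. */
void modulo_histogram(unsigned range, unsigned n, unsigned *counts)
{
    assert(n > 0 && counts != NULL);
    for (unsigned x = 0; x < range; x++)
        counts[x % n]++;
}

/* Stand-in 32-bit generator (xorshift32); an assumption of this sketch. */
static uint32_t xs32_state = 2463534242u;

uint32_t xs32_next(void)
{
    uint32_t x = xs32_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return xs32_state = x;
}

/* Map uniform 32-bit outputs onto [0, n) without modulo bias. */
uint32_t bounded_uniform(uint32_t (*rng32)(void), uint32_t n)
{
    assert(rng32 != NULL && n > 0);
    if ((n & (n - 1)) == 0)          /* power of two: a mask is exact */
        return rng32() & (n - 1);

    /* threshold = 2^32 mod n. Rejecting the lowest `threshold` values
       leaves a count of accepted values that is a multiple of n. */
    uint32_t threshold = (UINT32_MAX - n + 1u) % n;
    uint32_t x;
    do {
        x = rng32();
    } while (x < threshold);
    return x % n;
}
```

When n is a power of two, as in the n = 2^30 example, the mask branch is taken and no values are rejected, which is consistent with the remark that a simple AND is enough there. For other n, the cost is an occasional extra draw.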
>
> Fixing this relatively minor bug, removing your breakout and actually
> counting up filledn for the full nlogn samples, and applying the test
> to mt19937, we get:
>
> rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990811, expected = 9990881
>
> We run it again:
>
> rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990802, expected = 9990881
>
> We run it for R250 -- a well-known not-good generator:
>
> rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990794, expected = 9990881
>
> We run it on the literally infamous randu:
>
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9999482, expected = 9990881
>
> Note, Vincent, that the last two examples of correctly computed
> results from known-terrible generators are much farther from the
> expected mean than mt19937, a well-known damn good one. This suggests
> that your test (perhaps unsurprisingly) has some sensitivity, not
> because some slots are or aren't empty, but because the NUMBER of
> slots that are or aren't empty isn't quite correct. Note also that in
> the "paper fiddling" analysis above, the use of nlogn is quite
> unimportant -- we could make this an independent variable and evaluate
> the table filling for any value m of pulls, as long as our expected
> value is n - n*(1 - 1/n)^m.
>
> If I have the energy, I'll see if the distribution of filledn around
> expected is e.g. Gaussian -- it seems pretty reasonable that it would
> be -- with some expected or empirically computable variance. If it
> is, then this can be fairly easily turned into an actual test that
> returns a p-value that humans can use to make rational judgements, or
> rational humans can use to make judgements or something like that.
> I doubt that the test will have MUCH sensitivity -- modern generators
> are way too good to have their flaws picked out quite this simply,
> although Marsaglia's "monkey tests" do something very similar,
> although a lot more sophisticated mathematically (and arguably more
> sensitive), and do suffice to nail randu (anything nails randu) and
> semi-weak generators like R250.
>
> Now, let's see what we've learned from this fiddling. One is that
> without it, you just waste a lot of people's time making egregious and
> false claims that belittle the tremendously sophisticated and
> difficult work a whole lot of "fiddlers" have put into inventing,
> writing, and testing modern RNGs. The truth is that >>all<< RNGs in
> dieharder "pass" your test (if the test is "producing at least one
> zero") once your test isn't broken. We've learned that in fact, the
> best of the modern RNGs are damn good, and that you could work for
> five years trying to invent a test that is good enough to fail any of
> them and still not succeed.
> Finally, we've learned that you should not, not, not take your
> Martingale to a casino and try the doubling strategy out to make
> money,

It's not interesting to discuss, but yes, this strategy makes money in
casinos; you just get thrown out of the casino and end up on the
blacklist if you use it.

For good chess players all this is not so tough. The casinos' blacklist
of people who are too strong at blackjack is endless...

...this has been the practice for longer than we have been alive...

So casino reality is much simpler. They kick you out if you're good.

That's why they try to popularize poker now: you don't play against the
casino there.

> or if you do put a firm upper bound -- something like 63 Euro -- on
> what you're willing to lose with your base stake of 1 Euro. That way
> you have maybe a 40% chance of doubling your 63 Euro before you go
> broke.
> Really, you should read the Wikipedia article I linked, in spite of
> the fact that it presents more "paper fiddling".
>
> Sincerely,
>
> rgb
>
> (See P.S. comments below...)
>
>>>> n = 2^30; // 2 to the power 30
>>>> Function TestNumbersForRandomness(RNG,n) {
>>>>   declare array hashtable[size n];
>>>>   guessednlogn = 2 * (log n / log 2) * n;
>
> Why guess nlogn? nlogn is n*log10(n). Why nlogn anyway? Call it m
> and make it a parameter.
>
>>>>   for( i = 0 ; i < n ; i++ )
>>>>     hashtable[i] = FALSE;
>>>>   ndraws = filledn = 0;
>>>>   while( ndraws < guessednlogn ) {
>>>>     randomnumber = RNG();
>>>>     r = randomnumber % n; // randomnumber = r (mod n)
>
> no, r = n*RNG_UNIFORM(); where RNG_UNIFORM() is e.g. RNG/UINT_MAX.
> Yes there are roundoff errors, but they are uniform and consistent and
> as you can see, don't affect this problem. What you have isn't even
> close to uniform -- it is badly nonrandom.
>
>>>>     if( hashtable[r] == FALSE ) {
>>>>       hashtable[r] = TRUE;
>>>>       filledn++;
>
>>>>       if( filledn >= n )
>>>>         break;
>
> Don't break. Just count up filledn. It will never be more than n now
> anyway, for any n, or any reasonable m. There probably is some number
> of pulls that will raise "expected" to n, but it is pretty big
> compared to n, way bigger than nlogn.
>
>>>>
>>>>     }
>>>>     ndraws++;
>>>>   }
>>>>   if( filledn >= n )
>>>>     print "With high degree of certainty data generated by a RNG\n";
>>>>   else
>>>>     print "Not so sure it's a RNG\n";
>>>> }
>
> I'm guessing the correct statistic here is something like
> |expected - filledn|/expected, but as I said, I haven't really worked
> at it. I haven't decided whether or not it is worth adding this to
> dieharder -- without a formal derivation of the expected statistic it
> would be yet another empirical test, which means you're really
> comparing one RNG to another presumed better one, which I don't like.
> And do I have time to do the "fiddling" needed to do a proper
> derivation?
Aye, that's the > rub...;-) > > rgb > >>>> Regards, >>>> Vincent >>>>> -- both unpredictable and >>>>> flat/decorrelated at all orders, and even though there aren't >>>>> really >>>>> enough of them for my purposes, I've used them as one of the >>>>> (small) >>>>> "gold standard" sources for testing dieharder even as I test >>>>> them. For >>>>> all practical purposes threefish or aes are truly random as >>>>> well and >>>>> they are a lot faster and easier to use as gold standard >>>>> generators, >>>>> though. >>>>> I don't quite understand why the single site restriction is >>>>> important -- >>>>> this site has been up for years and I don't expect it to go >>>>> away soon; >>>>> it is quite reliable. I don't think there is anything secret >>>>> about how >>>>> the numbers are generated, and I'll certify that the numbers it >>>>> produces >>>>> don't make dieharder unhappy. So 1 is fixable with a bit of >>>>> effort on >>>>> your part; 6 I don't really understand but the guy who runs the >>>>> site is >>>>> clearly willing to construct a custom feed for cash customers, >>>>> if there >>>>> is enough value in whatever it is you are trying to do to pay for >>>>> access. If it's just a lottery, well, lord, I can think of a >>>>> dozen ways >>>>> to make numbers so random that they'd be unimpeachable for any >>>>> sort of >>>>> lottery, both unpredictable and uncorrelated, and they don't >>>>> any of them >>>>> require any significant amount of entropy to get started. >>>>> I will add one warning -- "randomness" is a rather stringent >>>>> mathematical criterion, and is generally tested against the null >>>>> hypothesis. Amateurs who want to make random number generators >>>>> out of >>>>> supposedly "random" data streams or fancy algorithms almost >>>>> invariably >>>>> fail, sometimes spectacularly so. 
There are a half dozen or more >>>>> really, really good pseudorandom number generators out there >>>>> and it is >>>>> easy to hotwire them together into an xor-based high entropy >>>>> stream that >>>>> basically never repeats (feeding it a bit of real entropy now >>>>> and then >>>>> as it operates). I would strongly counsel you against trying >>>>> to take >>>>> e.g. weather data and make something "random" out of it. >>>>> Unless you >>>>> really know what you are doing, you will probably make >>>>> something that >>>>> isn't at all random and may not even be unpredictable. Even most >>>>> sources of "quantum" randomness (which is at least possibly "truly >>>>> random", although I doubt it) aren't flat, so that they carry the >>>>> signature of their generation process unless/until you manage to >>>>> transform them into something flat (difficult unless you KNOW the >>>>> distribution they are producing). Pseudorandom number >>>>> generators have >>>>> the serious advantage of being amenable to at least some >>>>> theoretical >>>>> analysis (so you can "guarantee" flatness out to some high >>>>> dimensionality, say) as well as empirical testing with e.g. >>>>> dieharder. >>>>> HTH, >>>>> >>>>> rgb >>>>>> Thanks, >>>>>> David Mathog >>>>>> mathog at caltech.edu >>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>>>> Robert G. Brown http://www.phy.duke.edu/ >>>>> ~rgb/ >>>>> Duke University Dept. of Physics, Box 90305 >>>>> Durham, N.C. 27708-0305 >>>>> Phone: 1-919-660-2567 Fax: 919-660-2525 >>>>> email:rgb at phy.duke.edu >>>>> _______________________________________________ >>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>>>> Computing >>>>> To change your subscription (digest mode or unsubscribe) visit >>>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> Robert G. Brown http://www.phy.duke.edu/~rgb/ >>> Duke University Dept. of Physics, Box 90305 >>> Durham, N.C. 
27708-0305
>>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>>
>
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From rgb at phy.duke.edu Fri Aug 26 08:29:06 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:29:06 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Robert G. Brown wrote:

> Let's try a bit of "paper fiddling". The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>   nlogn = log10(n)*n;
>   expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple. The probability of a slot being
> empty after one pull is (1 - 1/n). After nlogn pulls, it is
> p_e = (1 - 1/n)^nlogn. The probability of a slot being filled is thus
> 1 - p_e, and given n slots n - n*(1-1/n)^nlogn of them "should" be
> filled, within random noise, n*(1-1/n)^nlogn of them "should" be empty.

Silly me. All of the anonymous slots are at least asymptotically independent (not necessarily obvious, but true from symmetry, I think, subject to the weak constraint that the total population of all of the slots has to add up to the number of trials so there are probably n-1 degrees of freedom in the Pearson test). We have p and q. The distribution is binomial and of course I know the binomial distribution and its sigma.
I can easily build any one of several tests on top of this (simple binomial or even multinomial, since I effectively have the hit frequency for n slots and it should BE the binomial distribution), and in fact have two or three already that are very similar to this on a smaller scale. It's what comes of hacking out -- sorry, "fiddling" out -- quick solutions and tests late at night when you're tired and ought to be sleeping. A bit of coffee makes a world of difference...:-) I'll have to think a bit about it and make sure that this isn't already done, better, in e.g. STS, but it might yet see the light of day as an actual dieharder test. BTW, I'm not replying to your space alien ET post (to the Beowulf list in reply to an already OT discussion of martingales that arose out of a discussion of good RNGs and seeding strategies sorry y'all but hey, at least it is entertaining?) simply because my jaw is sore from hitting the ground so many times while reading it. Those are some top-quality hallucinogens, yes they are... We will now return to your regularly scheduled discussion of boring things like bandwidth, memory reliability, parallel algorithms and the like, you know, on-topic stuff. But if any of y'all ever need to test rngs or flame schemes to "win" non-zero-sum games by means of "strategy", you know who to call...;-) Somewhere upstairs I have this nifty book on game theory and in a pinch I can even trot out an actual game matrix and analyze outcomes algefiddlingbraically! rgb P.S. -- Vincent, all of these simple problems were solved by mathematicians and statisticians so very, very, long ago, beginning with the work of Pascal and Fermat (there are names to conjure with, eh?) 
solving the problem posed by the Chevalier de Méré regarding an even bet on double sixes happening at least once in 24 throws: the actual probability of double sixes per throw is (of course) 1/36, the probability of no double six in 24 throws is (35/36)^24, and the probability of at least one is therefore 1 - (35/36)^24 = 0.4914038761 -- all paper fiddling, mind you -- a result that is eerily reminiscent of the solution to your problem, but with fewer slots. So at even odds it is -- barely -- a sucker's bet. But a margin of 0.86% is enough to empty even the deepest pockets, over time.

Now all you have to do is advance your actual knowledge of statistics beyond that realized by an idly rich French nobleman in 1654 (who still was wise enough to recognize that it wasn't an even bet and consulted the best of the best of the minds of his day to prove it). You have a mere 357 years to go...:-)

P.P.S -- If "all rngs" were really as bad as you assert, does it not stand to reason that "all Monte Carlo computations" that use them would all get egregiously incorrect results? And yet they don't. In fact, in problems (like the Ising model in 2D) where known solutions exist, they agree basically perfectly with the theoretical solution, and of course it is easy to compare a wide range of integrals and Markov process outcomes with theory. So if you used your simple common sense you would construct a mental argument like:

"Either I, in my brilliance, have discovered an egregious flaw in all random number generators used by all of those STUPID computer scientists, mathematicians, and physicists for decades to do their long and complex computations that no doubt all got equally egregiously wrong answers;

Or

Those computer scientists, mathematicians, and physicists are actually pretty smart and aggressively check their work (and each other's work) with a strong incentive to discover problems.
It is rather probable that any such egregious error would have been long ago discovered; therefore there is almost certainly a serious error in my own reasoning."

Seriously, dude. Ask yourself "Am I really smarter and better informed than Pascal, Fermat, Laplace, Bayes, not to mention all of those contemporary humans who have been devoting entire well-educated careers to random numbers as if all of modern e-commerce depended on them (it does) or is it just barely possible that I've made a mistake?"

Come on, you can do it. I know it is difficult for you, but try..

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 26 08:57:55 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:57:55 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Bullshit. "Every programmer" isn't dumb as a post. Or wasn't my argument clear enough? Do you need me to actually post the code for how the GSL -- written by at least some of these programmers -- does this?

Here, I'll try again. This time I'll use smaller numbers and make an actual table of the outcomes: Imagine only two lousy random bits, enough to make 00, 01, 10, 11 (or 0,1,2,3).
Here is the probability table:

  r  =    0       1       2       3
  ----------------------------------
  p  =   0.25    0.25    0.25    0.25

Let us generate N samples from this distribution. Our expected frequency of the occurrence of all of these numbers is:

  r  =    0        1        2        3
  --------------------------------------
  Np =  0.25*N   0.25*N   0.25*N   0.25*N

Is this all clear? If I generate 100 random numbers, the expected number of 3's is 0.25*100 = 25. Now apply mod 3. The outcomes are now:

  r   =    0        1        2        3
  r%3 =    0        1        2        0
  ---------------------------------------
  Np  =  0.25*N   0.25*N   0.25*N   0.25*N

You now sum the number of outcomes for each element in the mod 3 table, since we have two values of r that make one value of r%3 and frequency clearly aggregates as the outcomes are independent.

  r%3 =    0        1        2
  ------------------------------
  Np  =  0.50*N   0.25*N   0.25*N

It is therefore twice as likely that two random bits, modulus 3, will produce a zero as a one or a two.

> > Apologies for the caps. I hope how important this is. You're claiming all
> programmers
> use random numbers in a faulty manner?

They don't. Only you do. Everybody else takes a uniform deviate and scales it by the number of desired integer outcomes, checking to make sure that they don't go out of bounds and thereby e.g. get an incorrect endpoint frequency. The gsl code is open source and it takes two minutes to download it and check (I just timed it). Go on, look. The file is rng/rng.c in the gsl distro directory, the function name is gsl_rng_uniform_int. No modulus.

The exception is (obviously) when the range is a power of 2. In that case ONLY, r%n where r is a binary uint and n is a power of 2 will (obviously) equally balance the table above. Personally I'd use >> and shift the bits because it is faster than mod, but suit yourself, after you've learned what you are doing.

> > This is important enough to further discuss about it.
> >
> As nearly always you need random numbers from within a given domain say 0..
> n-1
> So projecting a RNG onto that domain is pretty crucial.
How would you want to
> do that in a correct manner?
>
> In the slot test in fact a simple AND is enough.

No, as I've just proven algebraically. The correct manner for general n is the gsl code, but in rough terms it is n*r/r_max (with care used to avoid roundoff errors at the ends as noted). If you've been using modulus, all your results are crap.

Look, the reason God invented the GSL and made it open source is so numb-nuts and smart people alike wouldn't have to constantly reinvent the wheel, badly. Use it. Don't question it -- you obviously aren't competent to. Just use it. If you want a random integer from 0 to n-1, use gsl_rng_uniform_int. If you want this for e.g. mt19937 don't write the latter, set up the gsl to use it to generate your ints. Learn to use it carefully, use it correctly, but use it.

> It's not interesting to discuss - but yes this strategy makes money in
> casino's,
> you just get thrown out of the casino and end up at the blacklist if you do.

You are clearly too stupid to be allowed out of the house without a caretaker. I'm not going to walk you through the proof that this isn't so as it is openly published and I've already referenced a step by step analysis that you can't be bothered, apparently, to actually read. I'll just reiterate the previous offer -- I, too, am happy to buy a roulette wheel and you can come over and bet Martingale against me all day. Just one 0, no limits and no quitting, infinite credit on both sides, we play until it is obvious to you that you are losing, have lost, will always lose, and the longer you play the more that you will lose. Loser buys the winner a case of truly excellent beer.

Look, why don't you fix your random number code and try again, since your simulations are obviously trash. It isn't difficult to show this with simulations, once you actually code them correctly, but I have to go and don't have time to do it for you.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept.
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Aug 26 12:53:14 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 18:53:14 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> Message-ID: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl> On Aug 26, 2011, at 2:57 PM, Robert G. Brown wrote: > On Fri, 26 Aug 2011, Vincent Diepeveen wrote: > >> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR >> PROGRAM. > > Bullshit. "Every programmer" isn't dumb as a post. Or wasn't my > argument clear enough? Do you need me to actually post the code > for how > the GSL -- written by at least some of these programmers -- do this? > Here, I'll try again. This time I'll use smaller numbers and make an > actual table of the outcomes: Imagine only two lousy random bits, > enough to make 00, 01, 10, 11 (or 0,1,2,3). Here is the probability > table: > > r = 0 1 2 3 > ------------------------ > p = 0.25 0.25 0.25 0.25 > > Let us generate N samples from this distribution. Our expected > frequency of the occurrence of all of these numbers is: > > r = 0 1 2 3 > --------------------------------- > Np = 0.25*N 0.25*N 0.25*N 0.25*N > > Is this all clear? If I generate 100 random numbers, the expected > number of 3's is 0.25*100 = 25. 
Now apply mod 3. The outcomes are now:
>
>   r   =    0        1        2        3
>   r%3 =    0        1        2        0
>   ---------------------------------------
>   Np  =  0.25*N   0.25*N   0.25*N   0.25*N
>
> You now sum the number of outcomes for each element in the mod 3 table,
> since we have two values of r that make one value of r%3 and frequency
> clearly aggregates as the outcomes are independent.
>
>   r%3 =    0        1        2
>   ------------------------------
>   Np  =  0.50*N   0.25*N   0.25*N
>
> It is therefore twice as likely that two random bits, modulus 3, will
> produce a zero.
>

If you have a domain of 0..3 where a generator generates and your modulo n is just n-1, obviously that means it'll map a tad more to 0.

Basically the deviation one would be able to measure in such a case is this: if we have a generator that runs over a field of say size m and we want to map that onto n entries, then we have the next formula:

  m = x * n + y;

Now your theory, if I summarize it, is basically that in such a case the entries 0..y-1 will have a tad higher hit rate than y..n-1.

However if x is large enough that shouldn't be a big problem.

If we map now, in the test I'm doing, onto say a few million to a billion entries, the size of that x is a number of 40+ bits for most RNG's.

So that means that the deviation from the effect you show above is of the order of magnitude of 1 / 2^40 in such a case, which is rather small.

Especially because the 'test', if you want to call it that, is operating at the granularity O ( log n ), we can then fully ignore an expected deviation of granularity O ( 1 / 2^40 ).

>>
>> Apologies for the caps. I hope how important this is. You're
>> claiming all programmers
>> use random numbers in a faulty manner?
>
> They don't. Only you do. Everybody else takes a uniform deviate and
> scales it by the number of desired integer outcomes, checking to make
> sure that they don't go out of bounds and thereby e.g. get an incorrect
> endpoint frequency.
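A minimal C sketch of the "scale, then reject out-of-range values" approach described in the quoted text, as opposed to taking r % n. This is not the GSL source: `demo_rng()` is a hypothetical stand-in (a simple 32-bit linear congruential generator) and `uniform_int()` is a name chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in generator, NOT a recommended RNG: a 32-bit
 * linear congruential generator returning values in 0..UINT32_MAX. */
static uint32_t demo_state = 12345u;

static uint32_t demo_rng(void)
{
    demo_state = 1664525u * demo_state + 1013904223u;
    return demo_state;
}

/* Return an integer in 0..n-1 without modulo bias.
 * The raw range is cut into n buckets of equal width `scale`; raw
 * values past the last full bucket are thrown away and redrawn. */
uint32_t uniform_int(uint32_t n)
{
    uint32_t scale;
    uint32_t k;

    assert(n > 0);
    scale = UINT32_MAX / n;
    do {
        k = demo_rng() / scale;
    } while (k >= n);
    return k;
}
```

For the two-bit example in the tables, this corresponds to discarding r = 3 and drawing again rather than folding it onto 0.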
The gsl code is open source and it takes two > minutes to download it and check (I just timed it). Go on, look. the > file is rng/rng.c in the gsl distro directory, the function name is > gsl_rng_uniform_int. No modulus. > > The exception is (obviously) when the range is a power of 2. In that > case ONLY, r%n where r is a binary uint and n is a power of 2 will > (obviously) equally balance the table above. Personally I'd use >> > and > shift the bits because it is faster than mod, but suit yourself, after > you've learned what you are doing. > >> >> This is important enough to further discuss about it. >> >> As nearly always you need random numbers from within a given >> domain say 0.. n-1 >> So projecting a RNG onto that domain is pretty crucial. How would >> you want to do that in a correct manner? >> >> In the slot test in fact a simple AND is enough. > > No, as I've just proven algebraically. The correct manner for > general n > is the gsl code, but in rough terms it is n*r/r_max (with care used to > avoid roundoff errors at the ends as noted). If you've been using > modulus, all your results are crap. > > Look, the reason God invented the GSL and made it open source is so > numb-nuts and smart people alike wouldn't have to constantly reinvent > the wheel, badly. Use it. Don't question it -- you obviously aren't > competent to. Just use it. If you want a random integer from 0 to n, > use gsl_rn_uniform_int. If you want this for e.g. mt19937 don't write > the latter, set up the gsl to use it to generate your ints. Learn to > use it carefully, use it correctly, but use it. > >> It's not interesting to discuss - but yes this strategy makes >> money in casino's, >> you just get thrown out of the casino and end up at the blacklist >> if you do. > > You are clearly too stupid to be allowed out of the house without a > caretaker. 
I'm not going to walk you through the proof that this > isn't > so as it is openly published and I've already referenced a step my > step > analysis that you can't be bothered, apparently, to actually read. > I'll > just reiterate the previous offer -- I, too, am happy to buy a > roulette > wheel and you can come over and bet Martingale against me all day. > Just > one 0, no limits and no quitting, infinite credit on both sides, we > play > until it is obvious to you that you are losing, have lost, will always > lose, and the longer you play the more that you will lose. Loser buys > the winner a case of truly excellent beer. > > Look, why don't you fix your random number code and try again, since > your simulations are obviously trash. It isn't difficult to show this > with simulations, once you actually code them correctly, but I have to > go and don't have time to do it for you. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Aug 26 14:17:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 20:17:46 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com> References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com> Message-ID: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl> On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote: > I hate to troll, but... 
> > On Aug 25, 2011, at 8:27 PM, Vincent Diepeveen wrote: > >> >> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote: >> >>> On Thu, 25 Aug 2011, Vincent Diepeveen wrote: >>> >>>> I noticed that most generated semi-random numbers with software >>>> generators, >>>> had the habit to truely adress a search space of n always in O (n >>>> log n). >>>> >>>> So if you draw from most software RNG's a number and do it >>>> modulo n, >>>> with n being not too tiny, say quite some millions or even >>>> billions, then every >>>> slot in your 'hashtable' will get hit at least once by the RNG, >>>> whereas data >>>> in reality simply happens to not have that habit simply. >>>> >>>> So true random numbers versus generated noise is in this manner >>>> easy >>>> to distinguish by this. Now i didn't study literature whether some >>>> other chap >>>> some long time ago already had invented this. That would be most >>>> interesting >>>> to know. >>> >>> Some other chap named George Marsaglia (and to some extent another >>> chap >>> named Donald Knuth) have already invented this. A number of >>> tests of >>> the tails of random number generators are already in dieharder. All >>> "good" modern rngs pass these tests. >>> >>> The Martingale betting system you are looking at is even older (at >>> least >>> Marsaglia and Knuth are still alive). It dates back to the 18th >>> century, and is well known to be flawed for a variety of reasons, >>> not >>> the least of which is that gamblers don't have the infinite wealth >>> necessary to make this >>even<< a zero-sum strategy and casinos have >> >> From mathematical viewpoint it makes perfect cash. >> As statistica odds is you already have build up considerable profit >> when a worst case (that you hit the 10 times practical double limit) >> hits you. > > A betting system will not improve the negative mathematical > expectation of a casino game. the doubling system doesn't have a negative expectation. 
You are allowed to double 10 times practical if you start with 1. Of all systems in roulette this is the only system that will produce a profit, just theoretical spoken, practice we all agree. they kick you out. > If your mathematical expectation is -1 for each trial, it's -10 > for ten trials. You will not win in the long-run using Martingale. > Except that this system doesn't have a negative expectation. it has a positive expectation. There is no other system in roulette that has a positive expectation, other than the doubling system. Please use European Casino model. I don't live in the USA. >> >> The simulations are of course using the practical limit. >> >> Note that the European casino's have a single zero. >> In USA there is even more greedy mafia controlling all the casino's, >> there are 2 zero's there. 0 and 00. >> >> The simulations were for European casino's. >> >>> betting limits that de facto make it impossible to pursue the >>> requisite >>> number of steps and in roulette in particular have 0 and/or 00 >>> slots and >>> aren't zero-sum to begin with. You can read a decent analysis of >>> outcomes based on the presumed binomial distribution of a zero-sum >>> game >>> here: >>> >>> http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 >>> >> >> You're not allowed to use a system in a casino, so we speak about >> theory. Probably first evening they let you try. Second day you'll >> get on the blacklist. > > Nonsense. Have you ever been to a casino? > You are welcome to Martingale all day long at any of them. > Hell, I'll buy a roulette wheel and you can come over to my place > if you play this strategy or any if its variants. > The casino wants you to Martingale -- it's favorable to them. > Why would they stop a loser? The doubling system in all casino's if you'd apply to it in an objective manner and would be allowed to - it makes a profit. Same for some slot machines over there. 
After some others played on it and it swallowed money - then majority of slot machines are not negative sum games anymore. If you play on them then, it's a positive sum game. If it would be always negative sum games then no lady would keep playing slot machines. > > The casino is not concerned with betting strategies. It is > concerned with folks gaining an edge. A betting system alone will > not give the player an edge. > No very wrong, a casino is interested in maximizing its profit. Kicking out folks that do well is part of that game. Oh by the way - I worked for a casino. Did you? >> >>> Your test below is interesting, though. The only real problems I >>> can >>> see with actually using it in dieharder are: >>> >> >> Yeah more interesting than the billion times discussed roulette >> system which >> has been analyzed completely flat. >> >>> a) One would need a theoretical estimate of the distribution of >>> filling given n log n draws on an n-slotted table (for largish n). >>> That >>> is, for a perfect rng, what SHOULD the distribution of success/ >>> failure >>> be. >> >> As we figured out by now in Artificial Intelligence the statistical >> assumptions made in the past they simply do not hold. >> >> For Artificial Intelligence we need a new sort of theoretical theory. >> >> As for the distribution problem, generatiors having a spread that's >> too accurate, >> the way to deliver a proof would be for example build a simple >> device. >> >> Build an old fashioned box where you can draw balls. Remember what >> you coud >> see on TV some 20 years ago or so (not sure it was like that in USA). >> >> A big basked with balls. The basket, in fact it's looking like this: >> >> http://www.rateyours.com/blog/uploaded_images/ >> lottery_machine-727064.jpg >> >> But now a much bigger machine like this with inside different means >> of randomizing the balls, >> actually also randomly modifying the inside obstacles of shaking of >> the balls. 
>> >> After a ball has been drawn you automatically have it annotated and >> the ball immediately goes back >> into the machine. For a full minute you have the balls in the machine >> shaken again and you draw >> again a ball. It is important to do this randomizing of the balls >> inside the machine for quite some time. >> I would propose a minute. >> >> Of course you have to do this with quite some balls. Say a thousand. >> >> Then you draw balls until all numbers have been drawn at least once. >> >> This cool experiment can be easily build. Of course the expected >> running time of a single experiment >> will be a few weeks. >> >> You can produce a number of those drawing machines though and have a >> look. >> >> Theories that seemingly work for small n, n being the number of >> balls, >> are much harder to maintain at bigger n's, as we also see in prime >> number research. >> >> The way how the machine gets designed of course is total crucial. I >> would propose a design that >> really shakes the balls really a lot through each other and really >> very thoroughly. >> >> Just like we nowadays know how flawed a big number of card shaking >> machines are that are popular to use. >> >> Such a lottery with realy a lot of balls would be very interesting to >> see the outcomes from. >> >> In fact i would prefer having produced number of those machines, so >> that it's possible to really have a lot of outcomes >> and then analyze them very well. >> >>> >>> b) One would then need the CDF for this distribution, to be able to >>> turn the results of N trials (of n log n pulls each) into a p-value >>> under the null hypothesis -- the probability of obtaining the >>> particular >>> number of successes and failures presuming a perfectly random >>> generator. >>> >>> That way dieharder could apply it rigorously to its 70 or 80 >>> embedded >>> rngs or to any user's outboard generator. 
There probably is >>> theoretical >>> statistical support for the PD and/or CDF -- you're analyzing the >>> tails >>> of a poissonian process -- but finding it or doing it yourself (or >>> myself), aye, that's the rub. One cannot just say "high degree of >>> certainty that it is an RNG" (by which one means that the rng in >>> question fails the test for randomness) in the test. HOW high? >>> Perfect >>> rngs or perfectly random processes will sometimes fill your >>> table, but >>> how often? >> >> If we assume that reality of life represents randomness, which is >> another >> rather good question in how far that theory is plausible, then using >> that >> assumption i'm very sure that the RNG's i investigated so far >> have a distribution which is too perfect, more perfect than i have >> seen >> in any reality. >> >> In fact most RNG's fill all slots faster than O ( n log n ), yet it's >> O ( n log n ) >> that they follow. >> >> This is RNG's that have come through all tests as being a good and >> very acceptabe RNG to be used. >> >> Realize i'm no RNG expert, so all the names of all those tests. >> >> For me it's just push button technology. I just designed a test >> and found it very odd that all RNG's have such perfect distributions >> that they don't even miss a single slot. >> >> I'd argue the only test that would be interesting to me to see how it >> might be in reality is the lottery machine test - yet with really >> a lot >> of balls. I'd prefer 10k balls over a 1000 in fact - yet for >> practical >> reasons i would agree with a number of above a 1000. >> >> Paper fiddling is really not interesting to me there to prove >> anything, >> as what i've seen in reality in randomness is total different from >> how >> RNG's model that. >> >> Regards, >> Vincent >> >> >>> How can you differentiate an "accident" when one does from >>> an actual failure? 
All of those questions require a more rigorous >>> theory and quantitative result embedded in a test that can be >>> systematically cranked up to more clearly resolve failures until >>> they >>> are unambiguous, not marginal maybe yes maybe no. >>> >>> I suspect that the failures this test would reveal are already more >>> than >>> covered in dieharder, in particular by the bit distribution tests >>> and >>> the monkey tests, but I'm not terribly happy with the monkey >>> tests and >>> would be perfectly thrilled to have a simpler to compute test that >>> revealed precisely this sort of flaw, systematically. And it >>> doesn't >>> hurt at all to have partially or fully redundant tests as long as >>> the >>> test themselves are rigorously valid. If you can find or compute >>> the >>> CDF for your test below, I'd be happy to wrap it up and add it to >>> dieharder, in other words. One can always SIMULATE a CDF, of >>> course, >>> but that requires a known good generator and sort of begs the >>> question >>> if you don't think that e.g. AES or threefish or KISS are good >>> generators that would actually pass your test. >>> >>> Even hardware/quantum sources of random bits are suspect -- they >>> often >>> are generated by a process that leaves in the traces of an >>> underlying >>> distribution. I'm not convinced that >>any<< process in the real >>> world >>> is >>truly<< random. Physics is ambiguous on the issue -- the >>> quantum >>> description of a closed system is just as deterministic as the >>> classical >>> one, and Master equation unpredictability on open subsets of a large >>> closed system reflects entropy/ignorance, not actual randomness >>> (hence >>> Einstein's famous "doesn't play dice" remark). But lots of this are >>> sufficiently random that one cannot detect any failure of >>> randomness, >>> modern crypto class generators being a prime example. 
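The CDF asked for above is a standard result for the coupon-collector problem, so it can at least be sketched. What follows is an illustration, not dieharder code: the function names are made up here, the draws are assumed independent and exactly uniform over n slots, and the large-n form is the usual Gumbel-type limit.

```python
import math
from fractions import Fraction

def fill_cdf_exact(n, t):
    """P(all n slots have been hit within t uniform draws), computed by
    inclusion-exclusion over the set of slots that are still empty.
    Exact, but only practical for small n."""
    total = Fraction(0)
    for j in range(n + 1):
        total += (-1) ** j * math.comb(n, j) * Fraction(n - j, n) ** t
    return total

def fill_cdf_gumbel(n, t):
    """Large-n approximation: exp(-exp(-c)) with c = (t - n ln n) / n."""
    c = (t - n * math.log(n)) / n
    return math.exp(-math.exp(-c))

if __name__ == "__main__":
    # Small case where both forms can be compared directly.
    n, t = 200, 1260
    print("exact :", float(fill_cdf_exact(n, t)))
    print("gumbel:", fill_cdf_gumbel(n, t))
    # The budget used in the pseudocode, 2 * n * log2(n), for n = 2^30.
    n = 2 ** 30
    print("fill probability at 2*n*log2(n):", fill_cdf_gumbel(n, 2 * 30 * n))
```

Under these assumptions the 2 * n * log2(n) budget corresponds to c of about 1.885 * ln(n), which is far out in the upper tail, so an ideal source would be expected to fill the table almost every time. A p-value for a run of N trials would then come from comparing the observed fill times against this CDF.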
>>>
>>> rgb
>>>
>>>>
>>>> In semi pseudo code, let's take an array of size a billion as an example, though usually a few million is more than ok:
>>>>
>>>> n = 2^30; // 2 to the power 30
>>>>
>>>> Function TestNumbersForRandomness(RNG,n) {
>>>>   declare array hashtable[size n];
>>>>
>>>>   guessednlogn = 2 * (log n / log 2) * n;
>>>>
>>>>   for( i = 0 ; i < n ; i++ )
>>>>     hashtable[i] = FALSE;
>>>>
>>>>   ndraws = filledn = 0;
>>>>   while( ndraws < guessednlogn ) {
>>>>     randomnumber = RNG();
>>>>     r = randomnumber % n; // randomnumber = r (mod n)
>>>>     if( hashtable[r] == FALSE ) {
>>>>       hashtable[r] = TRUE;
>>>>       filledn++;
>>>>       if( filledn >= n )
>>>>         break;
>>>>     }
>>>>     ndraws++;
>>>>   }
>>>>
>>>>   if( filledn >= n )
>>>>     print "With high degree of certainty data generated by a RNG\n";
>>>>   else
>>>>     print "Not so sure it's a RNG\n";
>>>> }
>>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>> -- both unpredictable and flat/decorrelated at all orders, and even though there aren't really enough of them for my purposes, I've used them as one of the (small) "gold standard" sources for testing dieharder even as I test them. For all practical purposes threefish or aes are truly random as well and they are a lot faster and easier to use as gold standard generators, though.
>>>>> I don't quite understand why the single site restriction is important -- this site has been up for years and I don't expect it to go away soon; it is quite reliable. I don't think there is anything secret about how the numbers are generated, and I'll certify that the numbers it produces don't make dieharder unhappy.
So 1 is fixable with a bit of >>>>> effort on >>>>> your part; 6 I don't really understand but the guy who runs the >>>>> site is >>>>> clearly willing to construct a custom feed for cash customers, if >>>>> there >>>>> is enough value in whatever it is you are trying to do to pay for >>>>> access. If it's just a lottery, well, lord, I can think of a >>>>> dozen ways >>>>> to make numbers so random that they'd be unimpeachable for any >>>>> sort of >>>>> lottery, both unpredictable and uncorrelated, and they don't any >>>>> of them >>>>> require any significant amount of entropy to get started. >>>>> I will add one warning -- "randomness" is a rather stringent >>>>> mathematical criterion, and is generally tested against the null >>>>> hypothesis. Amateurs who want to make random number generators >>>>> out of >>>>> supposedly "random" data streams or fancy algorithms almost >>>>> invariably >>>>> fail, sometimes spectacularly so. There are a half dozen or more >>>>> really, really good pseudorandom number generators out there and >>>>> it is >>>>> easy to hotwire them together into an xor-based high entropy >>>>> stream that >>>>> basically never repeats (feeding it a bit of real entropy now and >>>>> then >>>>> as it operates). I would strongly counsel you against trying to >>>>> take >>>>> e.g. weather data and make something "random" out of it. >>>>> Unless you >>>>> really know what you are doing, you will probably make something >>>>> that >>>>> isn't at all random and may not even be unpredictable. Even most >>>>> sources of "quantum" randomness (which is at least possibly "truly >>>>> random", although I doubt it) aren't flat, so that they carry the >>>>> signature of their generation process unless/until you manage to >>>>> transform them into something flat (difficult unless you KNOW the >>>>> distribution they are producing). 
Pseudorandom number generators >>>>> have >>>>> the serious advantage of being amenable to at least some >>>>> theoretical >>>>> analysis (so you can "guarantee" flatness out to some high >>>>> dimensionality, say) as well as empirical testing with e.g. >>>>> dieharder. >>>>> HTH, >>>>> >>>>> rgb >>>>>> Thanks, >>>>>> David Mathog >>>>>> mathog at caltech.edu >>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>>>> Robert G. Brown http:// >>>>> www.phy.duke.edu/~rgb/ >>>>> Duke University Dept. of Physics, Box 90305 >>>>> Durham, N.C. 27708-0305 >>>>> Phone: 1-919-660-2567 Fax: 919-660-2525 >>>>> email:rgb at phy.duke.edu >>>>> _______________________________________________ >>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>>>> Computing >>>>> To change your subscription (digest mode or unsubscribe) visit >>>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> Robert G. Brown http://www.phy.duke.edu/ >>> ~rgb/ >>> Duke University Dept. of Physics, Box 90305 >>> Durham, N.C. 27708-0305 >>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu >>> >>> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Fri Aug 26 17:46:30 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 26 Aug 2011 17:46:30 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? 
In-Reply-To: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If you have a domain of 0..3 where a generator generates and your modulo n is just n-1, obviously that means it'll map a tad more to 0.
>
> Basically the deviation one would be able to measure in such a case is that if we have a generator that runs over a field of say size m and we want to map that onto n entries, then we have the following formula:
>
> m = x * n + y;
>
> Now your theory, if I summarize it, is that in such a case the entries 0..y-1 will have a tad higher hit rate than y..n-1.

What's a "tad" when you're measuring the quality of an RNG? I'm just curious. Could you be more specific? Just what are the limits, specifically, if your random number r is a 30 bit uint that makes numbers in the range of 0-1 billion (nearly all generators make uints, with a few exceptions that usually make fewer than 32 bits -- 64 bit generators are still a rarity, although of course two 32 bit rands make one 64 bit rand) and you mod with m, especially a nice large m that doesn't integer divide the range of r, like 1.5 million?

That means that each integer in the range 0 to 1.5 million gets 666 repetitions as the entire range of r gets sampled, except the first million, which get 667. That means that the odds of pulling a number from 1 to a million are 1e6*667/1.e9 = .667. The odds of pulling a number from the remaining half million are 0.5e6*666/1.e9 = .333 = 1 - .667. An old lady with terrible eyes could detect such an imbalance in probability from across the street -- you wouldn't even need a "random number generator tester".

Weren't you advocating using this for nice large m like a million? I think that you were. No, wait! You were advocating something like one BILLION, right?
Wrong direction to make it better, dude, this makes it worse.

Note that this scales pretty well. For m in the range of thousands, the imbalance will be something like 0.666667 and 0.333330 -- still pretty easy to detect with any halfway decent RNG tester. Basically, you don't get (immeasurably) close to a uniform distribution in the weighting of any integer until you get down to (unsurprisingly) m of order unity compared to a uint, at which point it basically becomes as accurate as m*(r/r_max) was in the first place.

Note also that you've created an imbalance in the weighting of the integers you are sampling that is far, far greater and more serious than any other failure of randomness that your RNG might have. So much so that you couldn't possibly see any other -- it would be a signal swamped in the noise of your error, even for m merely in the thousands -- one part per million errors in randomness are easy to detect in simulations that draw 10^9 or so random numbers (which is still a SMALL TEST simulation -- real simulations draw 10^16 or 10^18, and your error would put answers on another PLANET compared to where they should be).

Most coders probably can actually work this out with a pencil, and so I repeat: no, nobody competent uses a modulus to generate integers in a fixed range in circumstances where the quality of the result matters, e.g. numerical simulation or cryptography as opposed to gaming, unless the modulus is a power of two.

> However if x is large enough that shouldn't be a big problem.
>
> If we map now in the test I'm doing onto say a few million to a billion entries, the size of that x is a number of 40+ bits for most RNG's.

x=32, a uint for most RNGs.
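The counting argument above can be sketched in a few lines. This is a minimal illustration and not anything posted in the thread: the function names are hypothetical, Python is an arbitrary choice, and the generator is idealized as exactly uniform on 0..r_range-1.

```python
from fractions import Fraction

def residue_counts(r_range, m):
    """For an ideal generator uniform on 0..r_range-1 reduced with `% m`,
    return (q, rem): residues 0..rem-1 are hit q+1 times each over the
    full range, residues rem..m-1 are hit q times each."""
    q, rem = divmod(r_range, m)
    return q, rem

def block_probabilities(r_range, m):
    """Total probability of the over-weighted block 0..rem-1 and of the
    remaining block rem..m-1, as exact fractions."""
    q, rem = residue_counts(r_range, m)
    p_low = Fraction(rem * (q + 1), r_range)
    p_high = Fraction((m - rem) * q, r_range)
    return p_low, p_high

if __name__ == "__main__":
    r_range, m = 10**9, 1_500_000      # the figures used in the message
    q, rem = residue_counts(r_range, m)
    p_low, p_high = block_probabilities(r_range, m)
    print(f"residues 0..{rem - 1}: {q + 1} hits each, total p = {float(p_low):.6f}")
    print(f"residues {rem}..{m - 1}: {q} hits each, total p = {float(p_high):.6f}")
    # Relative excess weight of an over-weighted residue versus the others.
    print(f"per-value excess: {1 / q:.4%}")
    # A power-of-two modulus divides a power-of-two range: no remainder.
    print(residue_counts(2**32, 2**20))
```

With these inputs the two block totals are 0.667 and 0.333, against 2/3 and 1/3 for a perfectly flat mapping, and each over-weighted residue is 1/666 (about 0.15%) more likely than the others. The last line shows why a power-of-two modulus on a power-of-two range is the exception: the remainder is zero, so every residue gets the same count.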
Or to put it another way, RNGs generate a bit stream, which they might do with an algorithm that generates 30, 31, 32, or more bits at a time, but the prevalence of 32 bit architectures and the fact that it is trivial to concatenate 32s to get 64+ bits when desired has slowed the development of true 64 bit RNGs. Eventually there will be some, of course, and it will STILL be a mistake to use a modulus to create random integers in some general range. A bad algorithm is a bad algorithm, and this makes sense only if speed is more important than randomness (in which case one has to wonder, why use a 64 bit RNG in the first place, why use a good RNG in the first place).

> So that means that the deviation of the effect you show above is of the order of magnitude of 1 / 2^40 in such a case, which is rather small.

Except that it isn't, as I showed in a fair bit of detail. It might be if x were as large as you claim, which it isn't (in general or even "commonly"), and if one confined m to be order unity. For m of order 2^20 (a million) the error for 2^40 is order 2^-20 (a millionth), which shows up even in single precision floating point. Why bother testing such a stream for randomness? It fails. You've made it fail. It fails spectacularly even if the generator is perfect, if the goddess Ifni herself produces the string of digits. It cannot succeed.

> Especially because the 'test', if you want to call it that, is operating at the granularity O ( log n ), we can fully ignore the expected deviation granularity O ( 2 ^ 40 ).

Well, except that basically 100% of the rngs in the GSL pass your "test" when it is written correctly. They also produce precisely the correct/expected result (within easily understandable and expected random scatter) on top of that if they are "good" rngs.
So the "test" isn't much of an actual test, and your assertion that "all rngs fail it" is false and based on a methodology that introduces error many orders of magnitude greater than the upper bounds the generators are known to have. Given this fact, which I have personally verified, do you imagine that there might be other errors in your actual (not your pseudo) code? You gotta wonder.

If you've tested a Mersenne Twister with your "test" and it fails to pass, either an MT is crap and all of the theoretical papers and experienced testers who have tested with sophisticated and peer-reviewed tools are stupid poo-poo heads, or, well, could it be that your test or implementation of the MT is crap and the MT itself in general is what everyone else seems to think that it is, based on extensive "paper fiddling" and enormous piles of empirical testing evidence written by actual statisticians and rng experts? Which is to say, a damn good pseudo-RNG decorrelated in some 600+ dimensions that passes nearly all known tests with flying colors.

Hmmm, let's put on our Bayesian thinking caps, consider the priors, and try very hard to guess which one is much much more likely on Jaynes' "decibel" scale of probabilities. Would you say that it is 20 decibels more likely that the MT is good and the test is broken? 50? 200? I like 2000 or thereabouts myself, or as we in the business might say, "it is a fact" that your test is broken, since 10^200 is a really big number, comparatively speaking.

Now, it would be nice if you apologized to "all RNGs" and "all programmers" and the various other groups you indicted in your little fallacious rant, but I'll consider myself enormously fortunate at this point if you simply acknowledge that maybe, just maybe, your original pronouncement -- that all rngs produce an egregiously, trivially verifiable excessive degree of first order sequential uniformity -- is categorically and undeniably false.
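For anyone who would rather check this than take either side's word, here is a minimal sketch of the table-filling test with the range reduction done by rejection instead of a bare modulus. It is not the dieharder code and not the GSL routine; `uniform_below`, `draws_to_fill`, the small table size and the use of Python's `random.Random` (a Mersenne Twister) are all assumptions made for illustration.

```python
import math
import random

def uniform_below(rng_bits, m, bits=32):
    """Return an integer uniform on 0..m-1 from a source of `bits`-bit
    integers, discarding raw values from the incomplete top bucket so
    that every residue is equally likely (rejection sampling)."""
    span = 1 << bits
    limit = span - (span % m)          # largest multiple of m that fits
    while True:
        raw = rng_bits()
        if raw < limit:
            return raw % m

def draws_to_fill(n, rng_bits, max_draws):
    """Draw until all n slots have been hit at least once.
    Returns the number of draws used, or None if max_draws ran out."""
    seen = bytearray(n)
    filled = 0
    for draws in range(1, max_draws + 1):
        r = uniform_below(rng_bits, n)
        if not seen[r]:
            seen[r] = 1
            filled += 1
            if filled == n:
                return draws
    return None

if __name__ == "__main__":
    n = 1 << 12                         # small table so the sketch runs quickly
    budget = int(2 * math.log2(n) * n)  # the 2 * n * log2(n) bound from the pseudocode
    expected = n * sum(1.0 / k for k in range(1, n + 1))  # coupon-collector mean n * H_n
    gen = random.Random(12345)          # Mersenne Twister, fixed seed
    results = [draws_to_fill(n, lambda: gen.getrandbits(32), budget) for _ in range(20)]
    print("budget:", budget)
    print("expected draws (n * H_n):", round(expected))
    print("observed:", results)
```

For an ideal source the mean number of draws is n * H_n, roughly n * (ln n + 0.5772), while the budget of 2 * n * log2(n) is about 2.885 * n * ln n. A good generator is therefore expected to fill the table well inside the budget, with the observed counts scattered around the mean.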
Of course, if you think I'm lying just to make you look bad, I can post a modified version of dieharder with your test embedded so absolutely anybody can see for themselves that all of the embedded generators pass your test and that not one single thing you asserted in grandiosely producing it was correct. The code is quite short and anybody can understand it.

Or you can take my moderately expert word for it -- the results I posted are honest results produced using real RNGs from a real numerical library in the real test written by block copying your pseudocode, converting/realizing it in C, and fixing your obvious error in the generation of random ints in the range 0-m by using a tested algorithm written by people who actually know what they are doing that is IN the aforementioned real numerical library.

Seriously, it is done. Finished. You're wrong. Say "I'm sorry, Mr. Mersenne Twister, if my test passes randu then how could it possibly fail you?"

And don't forget to apologize to AES, RSA, DES, and all of the other encryption schema too. They all feel real bad that you called them stupid poo-poo heads unable to pass the simplest first order frequency test one can imagine, since they all had to pass MUCH more rigorous and often government mandated testing to ever get adopted as the basis for encryption.

I don't expect an apology to me for being indicted along with ALL the OTHER programmers in the world for being stupid enough to use mod to make a supposedly uniformly distributed range of m rands. Not even Numerical Recipes was that boneheaded. But it's OK, we all know that we didn't really ever do that, and if you did (and continue to do, apparently, learning nothing from my patient and thorough exposition of how it produces errors that are vastly greater than the ones that you think you are detecting) that's a problem to who?

That's right, mister. To you. You'll just keep getting wroooooong answers, and then announcing them as fact and making yourself look silly.
Or even sillier, if that is possible.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From atp at piskorski.com Sat Aug 27 10:26:23 2011
From: atp at piskorski.com (Andrew Piskorski)
Date: Sat, 27 Aug 2011 10:26:23 -0400
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
References: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl>
Message-ID: <20110827142623.GA29931@piskorski.com>

On Fri, Aug 26, 2011 at 08:17:46PM +0200, Vincent Diepeveen wrote:
> On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:

>> A betting system will not improve the negative mathematical expectation of a casino game.

Right.

> Except that this system doesn't have a negative expectation. It has a positive expectation.
>
> There is no other system in roulette that has a positive expectation, other than the doubling system.

Vincent, are you shitting us? Or am I misremembering the tortured history of this thread, and by "doubling system" you do NOT mean the trivial martingale betting system that's been used (disastrously) and analyzed for over 200 years?

Actually it doesn't matter; as Shawn Hood pointed out above, your assertion is still wrong even if you actually meant some other non-martingale betting system. You insisting that *martingale* betting gives you a positive expectation at roulette just makes it much funnier!

There are ways to gain positive expectation in roulette (other than the obvious fraud and collusion).
They involve finding a poorly installed roulette table and using a wearable computer and physics to predict where the ball will land. Look up Thorp and Shannon's research on the subject; they actually used it in casinos c. 1961.

None of those ways are due to some special method of betting. The point of betting systems is to optimize your small edge, but you have to HAVE that edge in the first place. Money management is important because it tells you how to properly size your risk, but it can't give you alpha.

Now yes, if you have a very volatile "roulette" game and a 0% edge (no advantage to either you or the house), with some luck you could get rich by playing it for a limited period of time and quitting while you're ahead. But you still have a 0% expectation game; look up the mathematical definition of "expectation".

Also, I don't remember for sure, but I believe martingale betting is (always) more aggressive than Kelly. If so, then it is inherently stupid. Kelly defines the MAXIMUM size bet that it is rational to make, assuming your goal is maximum compounded wealth AND you have a quantifiable edge (however small) in the game. It can make sense to bet less than Kelly, and if you believe you have no edge the rational bet is zero. It is never rational to bet more than Kelly.

In practice, even when you are sure you have a real edge, you want to bet less than Kelly, often much less. There are several reasons for that; one is that calculating Kelly depends on your estimate of how big your edge is, and it is easy to overestimate your edge such that in truth you are massively overbetting (taking way too much risk) at 2x Kelly or even more.

But optimizing the way you bet doesn't turn an inherently losing game into a winner. If the edge is with the house - as it certainly is with a fair roulette table - the rational bet is not to make one.
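The expectation argument can be made concrete with a small sketch. The numbers and names here are assumptions for illustration only: an even-money bet on a single-zero wheel (win probability 18/37), a hypothetical zero-edge game for comparison, and the textbook Kelly formula for a simple win/lose bet.

```python
from fractions import Fraction

def kelly_fraction(p, b=1):
    """Kelly bet as a fraction of bankroll for win probability p and net
    odds b (b=1 is an even-money bet). Clamped at zero: with no edge the
    rational bet is nothing."""
    q = 1 - p
    f = (b * p - q) / b
    return max(f, 0)

def martingale_expectation(p, max_doublings):
    """Expected profit, in units of the first stake, of one martingale
    round: double after every loss, stop at the first win or after
    max_doublings consecutive losses."""
    q = 1 - p
    # Win 1 unit unless every bet loses; then the loss is 2**k - 1 units.
    lose_all = q ** max_doublings
    return (1 - lose_all) * 1 - lose_all * (2 ** max_doublings - 1)

if __name__ == "__main__":
    p_roulette = Fraction(18, 37)   # assumption: even-money bet, single-zero wheel
    p_fair = Fraction(1, 2)         # hypothetical zero-edge game
    for k in (1, 5, 10):
        print(k, float(martingale_expectation(p_roulette, k)),
              float(martingale_expectation(p_fair, k)))
    print("Kelly fraction at roulette:", kelly_fraction(p_roulette))
    print("Kelly fraction with a 51% even-money edge:", kelly_fraction(Fraction(51, 100)))
```

The round expectation simplifies to 1 - (2q)^k, where q is the probability of losing a single bet. It is exactly zero for a fair game whatever the number of doublings, and it becomes more negative with every extra doubling once q exceeds 1/2, which is the case at any roulette table with a zero.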
This news article is probably more interesting:

http://www.theonion.com/articles/casino-has-great-night,1506/
Casino Has Great Night; May 28, 2003

--
Andrew Piskorski

From james.p.lux at jpl.nasa.gov Sat Aug 27 11:27:37 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sat, 27 Aug 2011 08:27:37 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <20110827142623.GA29931@piskorski.com>
Message-ID:

>There are ways to gain positive expectation in roulette (other than the obvious fraud and collusion). They involve finding a poorly installed roulette table and using a wearable computer and physics to predict where the ball will land. Look up Thorp and Shannon's research on the subject; they actually used it in casinos c. 1961.

I think Shannon and Thorp just analyzed it, without actually using it. See "The Eudaemonic Pie" about some physics guys at UC Santa Cruz who built wearable hardware. Early 70s, I should think, based on my recollections of the kind of ICs they were using. (I also note, based on the book, that while they were good at the physics, they weren't very good at electronics design and construction.)

They never made the system work very well (concept sound, execution not so hot), but it did encourage the gaming industry to get new laws prohibiting the use of assistive devices.
Just you and the casino, mano a mano (or, more accurately, cerebro a leyes de la probabilidad: brain against the laws of probability).

From mathog at caltech.edu Wed Aug 31 13:29:18 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 10:29:18 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID:

Anybody know of a nice cheap, high melting point, easy to work with sheet material, for making a custom air shroud?

We have one box with stuff in it that looks similar to HDPE, the material the white flexible cutting boards are made of, but it is a bit thinner and more rigid than that. Unfortunately there are no markings on it, so HDPE is just a guess. Whatever it is, it cut easily with scissors (I had to trim it slightly at one point.)

Background. We have an older Supermicro SC-823 server with dual processors. The air shroud it came with only covers the first processor. That didn't matter much when it had two low power processors in it, but after upgrading it to dual Opteron 280s, the uncovered second one runs considerably hotter than the covered front one. (Swapping the processors around didn't help - the heat stayed where it was, so a ventilation issue, not a processor issue.) Supermicro does make a newer shroud which extends to the back of the case, but the manual (google for "SC-823 air shroud user's guide") indicates that it is designed for Intel CPUs. So it may or may not fit around the Opterons.

The redesigned air shroud will probably work, but I'm about 90% confident that taping a sheet of plastic onto the back of the existing shroud would work as well - if I can find a plastic that won't flap around or melt.
Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Wed Aug 31 14:20:03 2011 From: deadline at eadline.org (Douglas Eadline) Date: Wed, 31 Aug 2011 14:20:03 -0400 (EDT) Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: <44280.192.168.93.213.1314814803.squirrel@mail.eadline.org> David, I have experimented with some simple ducting for my Limulus system. I found a Vinyl Flashing from Union Corrugating Company (purchased at Lowes home center) that has some nice features, it is bendable, holds its shape, easy to cut, and has a low carbon content (harder to burn than most plastics), and it is fairly stiff. My needs are "low temp" air ducting. I have not tested it with constant warm/hot air. -- Doug > Anybody know of a nice cheap, high melting point, easy to work with > sheet material, for making a custom air shroud? > > We have one box with stuff in it that looks similar to HDPE, the > material the white flexible cutting boards are made of, but it is a bit > thinner and more rigid that that. Unfortunately there are no markings > on it, so HDPE is just a guess. Whatever it is, it cut easily with > scissors (I had to trim it slightly at one point.) > > Background. We have an older Supermicro SC-823 server with dual > processors. The air shroud it came with only covers the first > processor. That didn't matter much when it had two low power processors > in it, but after upgrading it to dual Opteron 280s, the uncovered second > one runs considerably hotter than the covered front one. 
(Swapping the > processors around didn't help - the heat stayed where it was, so a > ventilation issue, not a processor issue.) Supermicro does make a newer > shroud which extends to the back of the case, but the manual (google for > "SC-823 air shroud user's guide") indicates that it is designed for > Intel CPUs. So it may or may not fit around the Opterons. > > The redesigned air shroud will probably work, but I'm about 90% > confident that taping a sheet of plastic onto the back of the existing > shroud would work as well - if I can find a plastic that won't flap > around or melt. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Aug 31 14:43:39 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Wed, 31 Aug 2011 11:43:39 -0700 Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..) It's no more flammable than plastic, and it doesn't melt and get soft. Papier Mache, works too. On the other hand, if you want to mold a smooth curve, then plastic is the way to go. 
Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. ) Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar. Polystyrene and acrylic tend to be more brittle. ABS is very nice to work with. PVC is also easy to work with. Nylon is another possibility. Do you want to be able to glue it? What I would do is call up profesionalplastics.com formerly Cadillac Plastics (many outlets nationwide) and see what they have. It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go. Plastic Depot in Burbank has a huge selection. Drive over there, and ask the counter folks what would work for you. $10-20 will get you more plastic than you know what to do with. Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications. Jim Lux +1(818)354-2075 > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog > Sent: Wednesday, August 31, 2011 10:29 AM > To: beowulf at beowulf.org > Subject: [Beowulf] materials for air shroud? > > Anybody know of a nice cheap, high melting point, easy to work with > sheet material, for making a custom air shroud? > > We have one box with stuff in it that looks similar to HDPE, the > material the white flexible cutting boards are made of, but it is a bit > thinner and more rigid that that. Unfortunately there are no markings > on it, so HDPE is just a guess. 
Whatever it is, it cut easily with > scissors (I had to trim it slightly at one point.) > > Background. We have an older Supermicro SC-823 server with dual > processors. The air shroud it came with only covers the first > processor. That didn't matter much when it had two low power processors > in it, but after upgrading it to dual Opteron 280s, the uncovered second > one runs considerably hotter than the covered front one. (Swapping the > processors around didn't help - the heat stayed where it was, so a > ventilation issue, not a processor issue.) Supermicro does make a newer > shroud which extends to the back of the case, but the manual (google for > "SC-823 air shroud user's guide") indicates that it is designed for > Intel CPUs. So it may or may not fit around the Opterons. > > The redesigned air shroud will probably work, but I'm about 90% > confident that taping a sheet of plastic onto the back of the existing > shroud would work as well - if I can find a plastic that won't flap > around or melt. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Wed Aug 31 15:15:22 2011 From: mathog at caltech.edu (David Mathog) Date: Wed, 31 Aug 2011 12:15:22 -0700 Subject: [Beowulf] materials for air shroud? Message-ID: > Cardboard? Card stock? Masking tape? White glue? 
(that's what I > usually use for cooling ducts.. easy to cut, glue, tape..) It's no > more flammable than plastic, and it doesn't melt and get soft. That never crossed my mind. You sure about the flammability? I believe it for the ignition due to temperature (Fahrenheit 451 and all that). However, I have a gut feeling (but no data) that sparks are fairly likely to ignite cardboard, and less likely to ignite a solid plastic sheet (polyethylene or polypropylene, for instance). Not that I'm expecting sparks, but that is a real possibility when a power supply fails. Maybe even a brief flame. Of course paper won't hold up well compared to plastic if it gets wet. Moisture resistance is not important here though - if the insides of the computer are dripping, air shroud failure is the least of my worries. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Wed Aug 31 15:18:36 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Wed, 31 Aug 2011 12:18:36 -0700 Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: Paper doesn't catch fire at 451F.. it does start to turn brown. (Sorry Ray..) (I cook bacon on a rack over paper in a 450 degree oven.. and I doubt the temperature control is that tight) Flammability is an issue.. paper is rougher than most plastics, so a spark can lodge or a small fiber could catch. You could fireproof the paper pretty easily with a variety of treatments. 
Jim Lux +1(818)354-2075 > -----Original Message----- > From: David Mathog [mailto:mathog at caltech.edu] > Sent: Wednesday, August 31, 2011 12:15 PM > To: Lux, Jim (337C); beowulf at beowulf.org > Subject: RE: [Beowulf] materials for air shroud? > > > Cardboard? Card stock? Masking tape? White glue? (that's what I > > usually use for cooling ducts.. easy to cut, glue, tape..) It's no > > more flammable than plastic, and it doesn't melt and get soft. > > That never crossed my mind. > > You sure about the flammability? I believe it for the ignition due to > temperature (Fahrenheit 451 and all that). However, I have a gut > feeling (but no data) that sparks are fairly likely to ignite cardboard, > and less likely to ignite a solid plastic sheet (polyethylene or > polypropylene, for instance). Not that I'm expecting sparks, but that > is a real possibility when a power supply fails. Maybe even a brief > flame. Of course paper won't hold up well compared to plastic if it > gets wet. Moisture resistance is not important here though - if the > insides of the computer are dripping, air shroud failure is the least of > my worries. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From bill at cse.ucdavis.edu Wed Aug 31 17:04:44 2011 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 31 Aug 2011 14:04:44 -0700 Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: <4E5EA1EC.7080804@cse.ucdavis.edu> On 08/31/2011 12:15 PM, David Mathog wrote: > That never crossed my mind. > > You sure about the flammability? 
I believe it for the ignition due to > temperature (Fahrenheit 451 and all that). However, I have a gut > feeling (but no data) that sparks are fairly likely to ignite cardboard, > and less likely to ignite a solid plastic sheet (polyethylene or > polypropylene, for instance). Not that I'm expecting sparks, but that > is a real possibility when a power supply fails. Maybe even a brief > flame. Of course paper won't hold up well compared to plastic if it > gets wet. Moisture resistance is not important here though - if the > insides of the computer are dripping, air shroud failure is the least of > my worries. I'm aware of a machine room fire that was attributed to cardboard dust and the storage of flammable material (paper and cardboard). I wouldn't recommend cardboard or anything else that might generate flammable dust in a high 50-90C airflow environment with low humidity. Supermicro does seem to play pretty fast and loose with a shroud and cooling in general. We had nodes bouncing off the thermal max (and throttling) despite air intake temperatures 30F below the specifications while having very low power load in the node (read that as no expansion cards, one low rpm disk, and the lowest clocked CPU). We did however get them to ship us free shrouds once we complained. Is it really worth wasting even an hour to not get the real shroud? Not sure if this is the one, but they aren't particularly expensive ($13): http://www.provantage.com/supermicro-mcp-310-18003-0n~7SUP91KW.htm _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Aug 31 17:05:34 2011 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed, 31 Aug 2011 17:05:34 -0400 (EDT) Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: On Wed, 31 Aug 2011, Lux, Jim (337C) wrote: Also thin aluminum. You can get aluminum sheeting that you can cut with scissors and that is easy to bend into shapes if you have a bending jig (or can make one with two pieces of board stock and a vise). Cheap, fireproof, meltproof at any temperatures you're likely to reach, no toxic fumes in a fire, can be glued or screwed. The one drawback is that it is a PITA to weld or solder if that's important to you, but for an air shroud you can probably make compression joints (interlocking U rims, squeezed down) that are adequate. Most hardware stores (roof flashing), some auto parts or hobby stores. Copper too, but more expensive. Don't know about thin "enough" sheet steel, but probably -- copper or steel would both weld or solder easily. rgb > Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..) It's no more flammable than plastic, and it doesn't melt and get soft. Papier Mache, works too. > > On the other hand, if you want to mold a smooth curve, then plastic is the way to go. Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. ) > > Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar. Polystyrene and acrylic tend to be more brittle. ABS is very nice to work with. PVC is also easy to work with. Nylon is another possibility. > > Do you want to be able to glue it? > > What I would do is call up profesionalplastics.com formerly Cadillac Plastics (many outlets nationwide) and see what they have. 
It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go. Plastic Depot in Burbank has a huge selection. > > Drive over there, and ask the counter folks what would work for you. $10-20 will get you more plastic than you know what to do with. > > Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications. > > > Jim Lux > +1(818)354-2075 > >> -----Original Message----- >> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog >> Sent: Wednesday, August 31, 2011 10:29 AM >> To: beowulf at beowulf.org >> Subject: [Beowulf] materials for air shroud? >> >> Anybody know of a nice cheap, high melting point, easy to work with >> sheet material, for making a custom air shroud? >> >> We have one box with stuff in it that looks similar to HDPE, the >> material the white flexible cutting boards are made of, but it is a bit >> thinner and more rigid that that. Unfortunately there are no markings >> on it, so HDPE is just a guess. Whatever it is, it cut easily with >> scissors (I had to trim it slightly at one point.) >> >> Background. We have an older Supermicro SC-823 server with dual >> processors. The air shroud it came with only covers the first >> processor. That didn't matter much when it had two low power processors >> in it, but after upgrading it to dual Opteron 280s, the uncovered second >> one runs considerably hotter than the covered front one. (Swapping the >> processors around didn't help - the heat stayed where it was, so a >> ventilation issue, not a processor issue.) 
Supermicro does make a newer >> shroud which extends to the back of the case, but the manual (google for >> "SC-823 air shroud user's guide") indicates that it is designed for >> Intel CPUs. So it may or may not fit around the Opterons. >> >> The redesigned air shroud will probably work, but I'm about 90% >> confident that taping a sheet of plastic onto the back of the existing >> shroud would work as well - if I can find a plastic that won't flap >> around or melt. >> >> Thanks, >> >> David Mathog >> mathog at caltech.edu >> Manager, Sequence Analysis Facility, Biology Division, Caltech >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Wed Aug 31 17:24:48 2011 From: mathog at caltech.edu (David Mathog) Date: Wed, 31 Aug 2011 14:24:48 -0700 Subject: [Beowulf] materials for air shroud? Message-ID: Robert G. Brown wrote > Also thin aluminum. No way, at least not anywhere near the motherboard. 
There isn't going to be a way to fasten it very tightly into position, just tape probably, possibly a zip tie at the back end. So it would be best if the shroud cannot short things out or scratch components off the motherboard if it falls out of position. I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for this, and it is similar to the shroud material we have in another server. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Glen.Beane at jax.org Wed Aug 31 17:42:23 2011 From: Glen.Beane at jax.org (Glen Beane) Date: Wed, 31 Aug 2011 21:42:23 +0000 Subject: [Beowulf] materials for air shroud? In-Reply-To: <4E5EA1EC.7080804@cse.ucdavis.edu> References: , <4E5EA1EC.7080804@cse.ucdavis.edu> Message-ID: On Aug 31, 2011, at 5:05 PM, "Bill Broadley" wrote: > On 08/31/2011 12:15 PM, David Mathog wrote: >> That never crossed my mind. >> >> You sure about the flammability? I believe it for the ignition due to >> temperature (Fahrenheit 451 and all that). However, I have a gut >> feeling (but no data) that sparks are fairly likely to ignite cardboard, >> and less likely to ignite a solid plastic sheet (polyethylene or >> polypropylene, for instance). Not that I'm expecting sparks, but that >> is a real possibility when a power supply fails. Maybe even a brief >> flame. Of course paper won't hold up well compared to plastic if it >> gets wet. Moisture resistance is not important here though - if the >> insides of the computer are dripping, air shroud failure is the least of >> my worries. 
> > I'm aware of a machine room fire that was attributed to cardboard dust > and the storage of flammable material (paper and cardboard). > I've seen servers shipped with paperboard shrouds directing air over the processors... I won't mention the vendor by name _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Aug 31 17:44:45 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 31 Aug 2011 17:44:45 -0400 (EDT) Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: On Wed, 31 Aug 2011, David Mathog wrote: > Robert G. Brown wrote > >> Also thin aluminum. > > No way, at least not anywhere near the motherboard. There isn't going > to be a way to fasten it very tightly into position, just tape probably, > possibly a zip tie at the back end. So it would be best if the shroud > cannot short things out or scratch components off the motherboard if it > falls out of position. Don't forget the virtue of coat hangers. Even rubber coated ones. If you made the shroud out of aluminum, you could basically paint the bottom with liquid electrical tape (or better, dip it four or five times, drying it in between). It would basically rubber-coat it. No shorting, no scratching, still moderately fireproof. But as you wish. > I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for > this, and it is similar to the shroud material we have in another server. The biggest problem with stuff like this (IIRC a discussion from long ago) is you have to worry about what and how toxic it is in a fire, at least if you want fire-persons to be able to enter the room in a fire. Many plastics burn into really toxic materials. 
You also have to worry about how it will cope with high heat. The good thing about aluminum is that by the time it melts you won't care. I think some of the liquid tape compounds are fire retardant/melt resistant, and the aluminum itself is such a good conductor of heat that it will act as a heat sink for the rubber coating (in a good way). rgb > > Regards, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Wed Aug 31 17:56:08 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Wed, 31 Aug 2011 14:56:08 -0700 Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: Plastic tape covering the aluminum.. 20 mil "pipe wrap" is useful stuff. 3M VHB double stick foam tape to hold it in place. But, enough of this feeble lash-up idea: I think the real solution is to have a second cluster doing a complete finite element model of the instantaneous temperature distribution within the processor in question, driving a set of actuators to form a dynamically optimized shroud. Or, perhaps the shroud could be made from millimachines implementing very simple control logic, but with an appropriate emergent behavior based on, say, their temperature sensing capability. The millimachines should, of course, be self replicating. Perhaps a suitably genetically engineered extremophile could be created? 
A second cluster does the model, a third cluster determines the optimum genetic sequence, a fourth cluster is responsible for iteratively doing the bioengineering to create the organisms, etc. (or for a less biologically inspired system, the third and fourth clusters are doing some form of adaptive evolving micro manufacturing)

I'd provide more details, but really, that's just engineering, and is obvious to a skilled practitioner.

(for those not at CalTech (who is my employer, as well as David's), you can contact their patent counsel for rights to the invention disclosed above, which I'm sure they'll be happy to license to you on reasonable and non-discriminatory terms.)

> -----Original Message-----
> From: Robert G. Brown [mailto:rgb at phy.duke.edu]
> If you made the shroud out of aluminum, you could basically paint the
> bottom with liquid electrical tape (or better, dip it four or five
> times, drying it in between). It would basically rubber-coat it. No
> shorting, no scratching, still moderately fireproof. But as you wish.

From reuti at staff.uni-marburg.de Mon Aug 1 13:38:30 2011
From: reuti at staff.uni-marburg.de (Reuti)
Date: Mon, 1 Aug 2011 19:38:30 +0200
Subject: [Beowulf] Fwd: H8DMR-82 ECC error
References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk>
Message-ID:

Hi all,

on behalf of Jörg I forward this to the list, as his account seems to be blocked from posting to this list.

-- Reuti

> #############
> Dear all,
>
> as I cannot post directly to the list although I am subscribing to it, I have
> asked a friend of mine to post that for me.
> I am currently having severe problems with one of the clusters I am
> maintaining. Around 50% of these nodes are crashing when we are running cp2k
> on it. Although they are IB nodes, even without the IB card installed the test
> jobs crash the node as well. So I can rule out an IB related problem. Memtest
> was ok, I did 9 cycles without any problems. Unfortunately I cannot swap the
> memory as I don't have any of them at all and hence I have to rely on Memtest
> here. The nodes which are causing the problems show other symptoms as well: I
> had problems getting 3 of them to boot again after a normal shutdown procedure
> (the fans come on, and die after a short period and I don't even get to the
> POST stage at all). So they are offline as well. Two of the remaining nodes were
> exceedingly hot after a reboot. When I took them out the fans were spinning
> and now they appear to be ok. These are AMD Opteron 2220 dual core processors
> with 2 CPUs per node. The motherboard is a H8DMR-82 with the BIOS version
> 080014 (release date 07/13/2007). It appears that almost always the same nodes
> are crashing with this error message:
>
> Hardware Error
> CPU0 Machine Check Exception 4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorrected error, report: yes MiscV: invalid
> CPU context corrupt: yes UECC Error
> Bus Unit Error: prefetch/ECC error in data read from NB: local node originated
> (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic.
> Participating Processors: local node originated (SRC)
>
> Judging from this I would guess there is a memory related problem.
> Given there are a number of people on the list here and they probably have
> seen similar hardware before, do I simply have a bad batch of hardware which
> is known to cause problems or do I have a different issue here? What I am after
> is some kind of idea of where to look next.
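The status word in the machine check report above (b200200000000863) can be unpacked by hand, which is a handy cross-check on what the kernel printed. What follows is only a minimal sketch, not the kernel's or mcelog's decoder. It assumes the architectural x86 MCi_STATUS flag layout (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57), the AMD K8-era ECC flags (CECC=46, UECC=45), and the "bus error" form of the low 16-bit error code (0000 1PPT RRRR IILL). The function and field names are made up for illustration.

```python
"""Decode a subset of an x86 MCi_STATUS value (illustrative sketch only)."""

# Architectural flag bits in MCi_STATUS.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "reporting_enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
    # Assumption: AMD K8-family ECC flags.
    "corrected_ecc": 46,
    "uncorrected_ecc": 45,
}

# Field encodings for the bus error form 0000 1PPT RRRR IILL.
PARTICIPATION = ["SRC", "RES", "OBS", "generic"]
REQUEST = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
MEM_OR_IO = ["mem", "reserved", "io", "generic"]
CACHE_LEVEL = ["L0", "L1", "L2", "generic"]


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and, if applicable, the decoded bus error fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF  # low 16 bits hold the MCA error code
    result = {"flags": flags, "error_code": code}
    if code & 0xF800 == 0x0800:  # top five bits are 00001: bus error form
        rrrr = (code >> 4) & 0xF
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
            "mem_or_io": MEM_OR_IO[(code >> 2) & 0x3],
            "cache_level": CACHE_LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_mci_status(0xB200200000000863)
    for name, value in decoded["flags"].items():
        print(f"{name:18s} {value}")
    print(f"error code         {decoded['error_code']:#06x}")
    print(decoded.get("bus_error"))
```

Under those assumptions the value decodes to an uncorrected, context-corrupting error with the UECC flag set and a memory prefetch request originated by the local node, which agrees with the text of the report. It does not say which DIMM is at fault.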
> It is not the compiled program as
> taking out the disc and placing it in a different node (same motherboard, same
> Opteron but slightly different flags) does not cause any problems at all.
> Given the large number of nodes which are causing problems, before I propose
> to write off these nodes I would like to make sure it is not a subtle issue
> like a BIOS upgrade which could cure the problem.
>
> Many thanks for your help and all the best from London
>
> Jörg
>
> ##############
>
>
>
> --
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>

From samuel at unimelb.edu.au Wed Aug 3 00:28:10 2011
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Wed, 03 Aug 2011 14:28:10 +1000
Subject: [Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release
In-Reply-To: <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
References: <26FE8986BC6B4E56B7E9EABB63E10A38@pccarlosf2><4DA5E85D.4010801@ats.ucla.edu> <207BB2F60743C34496BE41039233A8090656ACFF@MRL-PWEXCHMB02.mil.tagmclarengroup.com>
Message-ID: <4E38CE5A.5080506@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/07/11 18:47, Hearns, John wrote:

>> I think a lot of this will apply to non-SGE batch schedulers -- in
>> fact Torque will support hwloc in a future release.
> > That sounds good to me! > > (Hint - if anyone from Altair is listening in it would be useful...) There's already been Carl Smith from pbspro.com on the hwloc mailing list finding configure problems with AIX (which have been fixed)... cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk44zloACgkQO2KABBYQAh8KUACfd5r45HcKBQdxRdRm3rb42fO1 VbgAoINM9lQ2rCIsa6G9Yv0b2qWii2aC =F/Jm -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From samuel at unimelb.edu.au Mon Aug 8 21:45:38 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 09 Aug 2011 11:45:38 +1000 Subject: [Beowulf] IBM terminates Blue Waters contract Message-ID: <4E409142.8060900@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NCSA is now looking for a new hardware supplier.. http://www.ncsa.illinois.edu/BlueWaters/system.html # Effective August 6, 2011, IBM terminated its contract # with the University of Illinois to provide the supercomputer # for the National Center for Supercomputing Applications' # Blue Waters project. More info at El Reg: http://www.theregister.co.uk/2011/08/08/ibm_kills_blue_waters_super/ # To date, IBM had shipped three racks of the Blue Waters # supers to NCSA, and these will be returned. IBM has to # give back $30m to NCSA. 
- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5AkUIACgkQO2KABBYQAh8icQCeL9PM2FW6ZAMLKz9Wg55oePGY
/FcAoJQGuHMOTNZ0bNddHIAy40ZCe5oB
=fID2
-----END PGP SIGNATURE-----

From mdidomenico4 at gmail.com Tue Aug 9 08:46:13 2011
From: mdidomenico4 at gmail.com (Michael Di Domenico)
Date: Tue, 9 Aug 2011 08:46:13 -0400
Subject: [Beowulf] Memory Testing?
Message-ID:

The last discussion on the list about faulty memory centered on using software like memtest or HPL to trigger single-bit errors (SBE). I'm curious whether anyone has experience with uncorrectable ECC errors: not with detecting them, but with working out which specific DIMM in the chassis an error points to.

mcelog on Linux doesn't seem to report the DIMM slot correctly on my Supermicro boards. The only way I know to narrow it down is to pull all the DIMMs and then test them one at a time in the system.
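One thing that may be worth trying before pulling DIMMs, on kernels where an EDAC driver is loaded for the memory controller, is to read the per-row error counters that EDAC exposes in sysfs. The following is a minimal sketch under stated assumptions: it assumes the legacy csrow layout under /sys/devices/system/edac/mc (mcN/csrowM/ce_count, ue_count and chK_dimm_label), and the helper names read_edac_counts and suspects are hypothetical. Whether a csrow or label maps onto the silkscreened slot name depends on the board and on how the labels were set, so treat the result as a hint rather than proof.

```python
"""List EDAC csrows with non-zero error counts (illustrative sketch only)."""
from pathlib import Path

# Assumption: legacy csrow-style EDAC sysfs layout.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def _read_int(path: Path) -> int:
    """Return the integer stored in a sysfs attribute, or 0 if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def read_edac_counts(root: str = EDAC_ROOT) -> list:
    """Collect corrected/uncorrected error counts for every mc*/csrow*."""
    rows = []
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        labels = [
            label.read_text().strip()
            for label in sorted(csrow.glob("ch*_dimm_label"))
        ]
        rows.append(
            {
                "controller": csrow.parent.name,
                "csrow": csrow.name,
                "ce_count": _read_int(csrow / "ce_count"),
                "ue_count": _read_int(csrow / "ue_count"),
                "labels": labels,
            }
        )
    return rows


def suspects(rows: list) -> list:
    """Keep only the rows that have logged at least one error."""
    return [r for r in rows if r["ce_count"] or r["ue_count"]]


if __name__ == "__main__":
    flagged = suspects(read_edac_counts())
    if not flagged:
        print("no EDAC errors recorded (or no EDAC driver loaded)")
    for row in flagged:
        print(
            f"{row['controller']}/{row['csrow']}: "
            f"CE={row['ce_count']} UE={row['ue_count']} labels={row['labels']}"
        )
```

Run periodically across the nodes, something like this at least shows whether the errors keep landing on the same controller and row, which narrows down which DIMMs to pull first.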
I'm curious if there is a better way, or if anyone has any opinions on the below (or another similar) piece of hardware that might do the same http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From a.travis at abdn.ac.uk Tue Aug 9 08:54:14 2011 From: a.travis at abdn.ac.uk (Tony Travis) Date: Tue, 09 Aug 2011 13:54:14 +0100 Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID: <4E412DF6.1080204@abdn.ac.uk> On 09/08/11 13:46, Michael Di Domenico wrote: > [...] > I'm curious if there is a better way, or if anyone has any opinions on > the below (or another similar) piece of hardware that might do the > same > > http://www.memorytesters.com/ramcheck_lx/ramcheck_lx_ddr3_tester.htm Hi, Michael. We had a RAM tester back in the day, but memory that it passed still gave errors in the real systems we were using. I screen memory in the system it is installed in using Memtest86+ then run Charles Cazabon's user-mode "Memtester" on the running system to assess its reliability: http://pyropus.ca/software/memtester/ HTH, Tony. -- Dr. 
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Thu Aug 11 08:04:58 2011 From: deadline at eadline.org (Douglas Eadline) Date: Thu, 11 Aug 2011 08:04:58 -0400 (EDT) Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E412DF6.1080204@abdn.ac.uk> References: <4E412DF6.1080204@abdn.ac.uk> Message-ID: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Most of you are probably not aware of this story about trade secrets and Bash scripts on HPC clusters (I was not until a few months ago) http://www.clustermonkey.net//content/view/308/33/ -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Aug 11 10:05:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 07:05:00 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Message-ID: Interesting.. 
You wrote:

There is a general understanding that unless explicitly marked in the contents of the script (the text file that is the Bash program), a Bash script is freely available for use and modification by anyone. In some cases there is a copyright notice or a license that allows (or disallows) sharing or modification. These are always explicitly stated at the beginning of the script and obvious to anyone who reads or modifies the script.

This is, of course, not correct under current law: marking is not required for copyright protection. Pretty much everything is born copyrighted. Putting markings on it helps you claim for willful infringement (i.e. the recipient can't claim "I didn't know") which helps on the damages situation. And, under the Berne convention, marking is required to assert your rights in some countries (All Rights Reserved is also required in some places).

Likewise, under current law, registration of copyright isn't required. Registration allows you to collect statutory damages for infringement, though.

For trade secrets, it's a bit trickier. The recipient has to know that it's trade secret, but that can be done by marking on the delivery media, by a separate document, or even by verbal communication (here, this is proprietary, don't disclose it). And you have to take some means to protect it: claiming trade secret status for something that is printed on bus stop benches won't fly.

In any case, just because scripts aren't obfuscated doesn't mean they're not subject to trade secret protection, provided the owner of the secret takes some precautions to prevent wide disclosure (e.g. warning the recipient of its proprietary nature).

This is the aspect that will surely be the core of litigation: would a "reasonable person" have known that the material was subject to trade secret protection. As we all know, reasonable people differ, and the attorneys on both sides will trot out examples of marking and disclosure practices: good, bad, and indifferent.
As Doug noted, "special measures" need to be taken, but there's no bright line standard for those measures, and, in practice, they can be pretty lax (and would be expected to be proportionate to the value of the secret.. the secret formula for Coke is probably more protected than the schedule for sweeping the floor in the manufacturing plant... both provide competitive advantage to Coke, but one is probably more important).

Something that a lot of tech people in industry (particularly those coming from academia and working with open source) probably don't really fully understand is that pretty much everything you do for your employer is probably proprietary in some sense, and there is probably a written policy to that effect, which you, as an employee, are expected to be aware of. Or your supervisor told you, or the nice personnel person told you when you hired in 20 years ago, etc. Mundane operational details of the business might be claimed to provide competitive advantage, especially if they're not "industry standard" (humorously, if the employer has some really lame practice that's horrible, that might make it protectable.. then you could argue in court about whether it had any value).

This is why there are "document review" departments and periodic training: it helps reduce the problem of "inadvertent disclosure" and "I didn't know".

This is the really tricky thing about trade secret: inadvertent disclosure can ruin the protection. There have been cases of deliberately (and nefariously) "losing" trade secret info to spoil the protection. And then, there is a somewhat notorious case of documents from Intel(?) that were in an envelope at a hotel desk or convention(?) with a person's name on it. Turns out there was a competitor (AMD?)
with an employee of the same name, who accidentally got the documents handed to them ("Hi, I'm John Smith, I think you have something for me."), opened the envelope, realized the problem, and handed them right back, but in later action, it was alleged that this was sufficient to break the protection. I don't recall all the details, and it probably settled out of court. It's really complex.. "the bell, having been rung, cannot be unrung" (the phrase shows up in tons of legal writings), but in reality, if the inadvertent disclosure wasn't too big, etc.

Important things:
1) The language it's written in, or whether it's obfuscated or not, makes no difference.
2) The size of the work makes no difference. "Candy/Is dandy/But liquor/Is quicker" is/was copyrighted by Ogden Nash (used here as fair use, and anyway, the copyright may have expired).
3) The intellectual effort in the work makes no difference (unlike patents, there's no requirement of novelty) (unless you're trying to claim trade secret protection on something that's already public knowledge.. the thing might be public, but the fact that you selected that particular one might be trade secret).

Jim

I am not a lawyer, but I spent all too many (hundreds) of hours in depositions and meetings and court where one of the main issues was "was there adequate notice of the trade secret status of the information" as well as "did they steal it", not to mention the always popular "can you describe the secret with specificity and particularity". If the bad guy steals the trade secret and then keeps it secret, it's fairly hard to show that they actually have it. There are also folks who have developed techniques to evade the restrictions of an NDA ("Sure, I signed it, but that exceeded the scope of my corporate authority, so it's invalid." "Technically, I wasn't an employee that afternoon, even though I was in the morning, and I was the next week, but hey, for that afternoon, I wasn't an employee, so I'm not bound by the NDA signed by corporate. Sorry about giving you that business card with the company name on it, but it was what I happened to have in my wallet")

________________________________________
From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of Douglas Eadline [deadline at eadline.org]
Sent: Thursday, August 11, 2011 05:04
To: beowulf at beowulf.org
Subject: [Beowulf] All Your BASH Are Belong To Us

Most of you are probably not aware of this story about trade secrets and Bash scripts on HPC clusters (I was not until a few months ago)

http://www.clustermonkey.net//content/view/308/33/

--
Doug

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From cbergstrom at pathscale.com Thu Aug 11 10:35:01 2011
From: cbergstrom at pathscale.com ("C. Bergström")
Date: Thu, 11 Aug 2011 21:35:01 +0700
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
References: <4E412DF6.1080204@abdn.ac.uk> <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <4E43E895.6070803@pathscale.com>

On 08/11/11 07:04 PM, Douglas Eadline wrote:
>
> Most of you are probably not aware of this story
> about trade secrets and Bash scripts on HPC clusters
> (I was not until a few months ago)
>
> http://www.clustermonkey.net//content/view/308/33/

IANAL and this shouldn't be taken as legal advice - Bret Stouder, if you haven't done so already, contact SFLC immediately. They provide legal services to open source projects and may be able to help. (I can help put you in touch with them or other very good open source legal counsel.)

./C

/* Armchair lawyers are generally not helpful and in many cases it's counterproductive for them to express their own personal views. I hope this discussion dies immediately without further comment */
From deadline at eadline.org Thu Aug 11 12:58:47 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 11 Aug 2011 12:58:47 -0400 (EDT)
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To:
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org>
Message-ID: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>

I had a chance to read some of the depositions, really interesting and even embarrassing stuff. My guess is Atipa got angry when Bret and the other employees left to form a new company. They may have searched for ways to stop them and decided to go after them for what Atipa considered "trade secrets." That is a more or less traditional method to prevent ex-employees from stealing your secret sauce (as you explain).

The only problem was that many of the "secrets" were developed and shared in an open environment. This may have been a surprise to those in charge and makes their claims a bit harder to swallow (i.e. a fundamental misunderstanding of how trade secrets can be protected in an open source ecosystem). And, what I try to point out in the article, is that this open source ecosystem is what allowed hardware vendors to sell clusters in the first place.

There is of course more to this case than I describe in the article. I'll post more as it progresses.

--
Doug
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Aug 11 13:40:35 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 10:40:35 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> Message-ID: > -----Original Message----- > From: Douglas Eadline [mailto:deadline at eadline.org] > Sent: Thursday, August 11, 2011 9:59 AM > To: Lux, Jim (337C) > Cc: beowulf at beowulf.org > Subject: RE: [Beowulf] All Your BASH Are Belong To Us > > > I had a chance to read some of the depositions, really interesting > and even embarrassing stuff. My guess is Atipa got angry when > Bret and the other employees left to form a new company. They > may have searched for ways to stop them and decided > to go after them for what Atipa considered "trade secrets." > A more or less traditional method to prevent ex-employees from > stealing your secret sauce (as you explain below). > > The only problem was much of the "secrets" were developed > and shared in an open environment. This may have been a > surprise to those in charge and makes their claims > a bit harder to swallow. (i.e. a fundamental misunderstanding > of how trade secrets can be protected in an open source ecosystem). > And, what I try to point out in the article, is that this > open source ecosystem is what allowed hardware vendors to > sell clusters in the first place. > > There is of course more to this case than I describe in the article. > I'll post more as it progresses. > > -- Yes.. 
and a standard way to attempt to do a "non-compete" (which are typically illegal in California) is for the former employer to threaten the new employer (or customers of the spin-off) with the "theft of trade secrets" allegation. Even if the allegation is unfounded, you have to spend time and money dealing with it (if you're the ex-employee) or it creates sufficient fear, uncertainty, and doubt (on the part of the customers of the ex-employee spin off).

I'm also not so naïve as to think that employees don't actually take trade secrets with them and use them, so it's not entirely improbable.

But, in a perfect world, there would be substantial sanctions for doing this kind of thing as a competitive maneuver.

Legal niceties aside, Doug brings up an interesting point about "trade secrets" or intellectual property in general...

You work at a job and become experienced and knowledgeable in a particular line of business. How much of that is "general knowledge" (not protectable) and how much is "peculiar to the employer" (protectable)? This is a pretty fuzzy thing.

A for instance.. say you leaned over to the next cube and asked someone for help formulating a particularly complex command line to grep a file. The exact, character for character version of that command line probably belongs to the employer, but what about the knowledge you now have of how to do those kinds of searches? What if your coworker had actually done the command line (in its exact form) at some other place and brought it with them?

Then, there are the practical details of getting approval from a (conservative) power-that-is. Sure, you might have gotten it from open source, but will your corporate reviewer agree? Or, will they use the default "it's all proprietary unless proven otherwise, and we don't have time to look at your proof, and you don't have time to be gathering the proof".
It really depends on a corporate/organizational commitment to open source to institute processes to keep all this stuff straight. (and we won't even get into "open source" vs "able to redistribute")

From landman at scalableinformatics.com Thu Aug 11 13:53:55 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 13:53:55 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To:
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
Message-ID: <4E441733.80701@scalableinformatics.com>

On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote:

> It really depends on a corporate/organizational commitment to open
> source to institute processes to keep all this stuff straight. (and
> we won't even get into "open source" vs "able to redistribute")

There are profoundly incorrect views running around out there, as to what "open source" means. I had someone tell me that GPLv2 prevented distribution of binaries (it doesn't). I've watched people slap additional legal bits in conflict with GPL onto GPL source.

I don't want to say "it's a mess" but I do want to say that "there is a profound need for a very simple statement of what is and isn't allowed by each license", including what is involved in altering licensing.

While these are more or less amusing and some won't really result in court cases and precedents, there is at least one effort that has some nice potential to test GPL. See the zfs on linux systems. c.f.
http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue I can't imagine this will end well for any company shipping this, in source, build script, or binary form. CDDL aside, Oracle's got some IP claims they could file, as well as other things. I can't believe that shipping NetBSD binaries with Oracle IP inside would end well either. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From james.p.lux at jpl.nasa.gov Thu Aug 11 14:19:00 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 11:19:00 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E441733.80701@scalableinformatics.com> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman > Sent: Thursday, August 11, 2011 10:54 AM > To: beowulf at beowulf.org > Subject: Re: [Beowulf] All Your BASH Are Belong To Us > > On 08/11/2011 01:40 PM, Lux, Jim (337C) wrote: > > > It's really depends on a corporate/organizational commitment to open > > source to institute processes to keep all this stuff straight. 
(and
> > we won't even get into "open source" vs "able to redistribute")
>
> There are profoundly incorrect views running around out there, as to
> what "open source" means. I had someone tell me that GPLv2 prevented
> distribution of binaries (it doesn't). I've watched people slap
> additional legal bits in conflict with GPL onto GPL source.
>
> I don't want to say "its a mess" but I do want to say that "there is a
> profound need for a very simple statement of what is and isn't allowed
> by each license." Including what is involved in altering licensing.
>

Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses. They had a "How do we encourage Open Source use at NASA" symposium a few months back, hosted at Ames with lots of remote participants, and licensing issues and complexities are, in my opinion, probably among the bigger problems.

It's been a royal pain for me trying to release stuff to the general public in a useful form. It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run. But no, that .iso will be a derived work composed of a multitude of components with all sorts of different license agreements. What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same ones we used, and have at it.

The complication is that, in general, work funded by NASA and performed by government employees is a "government work not subject to copyright", although work funded by NASA and performed by an educational institution (e.g. JPL, which is part of Caltech) is subject to Bayh-Dole, and is presumed to be owned by the educational institution, with a fully paid, non-exclusive license granted to the government for government purposes.
(there is, of course, litigation about what those "government purposes" might happen to be).

The incompatibility arises because NASA is legally obligated to distribute their products with no downstream restrictions on use, which is not the same as, for instance, GPL, which imposes restrictions on downstream use. NASA (and the government in general) doesn't care if someone takes their product and uses it to make a subsequent closed source product which is totally proprietary. (and in fact, NASTRAN would be a fine example of this)

From gus at ldeo.columbia.edu Thu Aug 11 15:28:28 2011
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Thu, 11 Aug 2011 15:28:28 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To:
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org>
Message-ID: <4E442D5C.1030902@ldeo.columbia.edu>

Lux, Jim (337C) wrote:
>> -----Original Message-----
>> From: Douglas Eadline [mailto:deadline at eadline.org]
>> Sent: Thursday, August 11, 2011 9:59 AM
>> To: Lux, Jim (337C)
>> Cc: beowulf at beowulf.org
>> Subject: RE: [Beowulf] All Your BASH Are Belong To Us
>>
>>
>> I had a chance to read some of the depositions, really interesting
>> and even embarrassing stuff. My guess is Atipa got angry when
>> Bret and the other employees left to form a new company. They
>> may have searched for ways to stop them and decided
>> to go after them for what Atipa considered "trade secrets."
>> A more or less traditional method to prevent ex-employees from
>> stealing your secret sauce (as you explain below).
>>
>> The only problem was much of the "secrets" were developed
>> and shared in an open environment. This may have been a
>> surprise to those in charge and makes their claims
>> a bit harder to swallow. (i.e. a fundamental misunderstanding
>> of how trade secrets can be protected in an open source ecosystem).
>> And, what I try to point out in the article, is that this
>> open source ecosystem is what allowed hardware vendors to
>> sell clusters in the first place.
>>
>> There is of course more to this case than I describe in the article.
>> I'll post more as it progresses.
>>
>> --
>
>
> Yes.. and a standard way to attempt to do a "non-compete"
> (which are typically illegal in California) is for the former employer
> to threaten the new employer (or customers of the spin-off) with the
> "theft of trade secrets" allegation. Even if the allegation is unfounded,
> you have to spend time and money dealing with it
> (if you're the ex-employee) or it creates sufficient fear,
> uncertainty, and doubt (on the part of the customers
> of the ex-employee spin off).
>

Very true, and in the arena of intimidating former employees and their current employers/competitors, there is nothing special about the privatization of shell scripts or of nifty regular expressions to grep files.

Recent examples include fields perhaps more lucrative than HPC, such as English muffins (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):

http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

and high frequency trading (isn't it HPC also?) (Goldman Sachs vs. Sergey Aleynikov):

http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html

What is interesting is that, across the board, the thing that free entrepreneurs seem to hate the most is their competitors' free entrepreneurship.

From landman at scalableinformatics.com Thu Aug 11 15:57:43 2011
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Aug 2011 15:57:43 -0400
Subject: [Beowulf] All Your BASH Are Belong To Us
In-Reply-To: <4E442D5C.1030902@ldeo.columbia.edu>
References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E442D5C.1030902@ldeo.columbia.edu>
Message-ID: <4E443437.9070705@scalableinformatics.com>

On 08/11/2011 03:28 PM, Gus Correa wrote:

> Recent examples include fields perhaps more lucrative than HPC,
> such as English muffins
> (Bimbo Bakeries/Thomas English Muffins vs. Chris Botticella):
>
> http://www.usatoday.com/money/industries/food/2010-07-29-english-muffin-lawsuit_N.htm

That muffin just got real ...

>
> and high frequency trading (isn't it HPC also?) (Goldman Sachs vs.
> Sergey Aleynikov):
>
> http://www.huffingtonpost.com/2010/02/11/sergey-aleynikov-goldman_n_458931.html
>
> What is interesting is that across the board
> the thing that free entrepreneurs seem
> to hate the most is their competitors free entrepreneurship.

I am running into an internal parser error in attempting to understand this last sentence.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

From mathog at caltech.edu Thu Aug 11 19:56:02 2011
From: mathog at caltech.edu (David Mathog)
Date: Thu, 11 Aug 2011 16:56:02 -0700
Subject: [Beowulf] OT: public random numbers?
Message-ID:

Since this is very OT, I'll try to keep it short.

Here is the problem - imagine a group of people who neither know nor trust each other, yet must agree on the fairness of a single random number. Basically they are going to have a lottery. They aren't organized enough to generate such a number themselves - it must be found from some process already active on the web, and be so obviously "fair" that they won't argue about that. Everybody must be able to obtain it freely from a web connection.

Can any of you think of a source on the web for a set of small files with these properties:

1. from a trusted source (here this mostly means the data is generated for some other innocuous purpose)
2. represents a largely random process (temperature readings, stock market values, etc.) with a set generated at known intervals, preferably daily (at least M-F)
3. are never, ever, revised
4. are distributed reliably (for instance, signed files)
5. are publicly and freely available
6. can be obtained reliably (is available from many sites)

So far I have looked at stock market values and weather data - without much luck.

You would think the S&P 500 is the S&P 500 and one could look it up on any site and get the same data. Not so!
Check the Yahoo and Google financial sites for the first few weeks of Jan. 2011 and you will find digits that differ between the two sites in every single column. Not every day mind you, but often enough that it isn't reliable. Heck, the volume numbers differ by large factors between the two sites. So just choose one site and go with that? Not so fast - if the single source goes down the data is unavailable, and there is no guarantee that the site (which is not party to this particular use of their data) might not revise the page or choose to block it entirely. Or weather data, right? Lots of random bits there and we trust NOAA. But good luck with criteria 3-6. In particular, they don't give data out for free. In theory no US Government site should, since they are supposed to charge to recover distribution costs. Criteria 4-6 are typical of software distributed on mirror sites, but so far I have not found any physical measurements which are distributed in a similar manner. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mdidomenico4 at gmail.com Thu Aug 11 20:28:11 2011 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Thu, 11 Aug 2011 20:28:11 -0400 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: How many random numbers per day are you expecting? If everyone checks at exactly 1pm, should they all see the same "random" number or should they each get their own "random" number? What kind of entropy are you expecting on "random"? 
On Thu, Aug 11, 2011 at 7:56 PM, David Mathog wrote: > Since this is very OT, I'll try to keep it short. > > Here is the problem - imagine a group of people who neither know nor > trust each other, yet must agree on the fairness of a single random > number. Basically they are going to have a lottery. They aren't > organized enough to generate such a number themselves - it must be found > from some process already active on the web, and be so obviously "fair" > that they won't argue about that. Everybody must be able to obtain it > freely from a web connection. > > Can any of you think of a source on the web for a set of small files > with these properties: > > 1. from a trusted source (here this mostly means the data is generated > for some other innocuous purpose) > 2. represents a largely random process (temperature readings, > stock market values, etc.) with a set generated at known intervals, > preferably daily (at least M-F) > 3. are never, ever, revised > 4. are distributed reliably (for instance, signed files) > 5. are publicly and freely available > 6. can be obtained reliably (is available from many sites) > > So far I have looked at stock market values and weather data - without > much luck. > > You would think the S&P 500 is the S&P 500 and one could look it up on > any site and get the same data. Not so! Check the Yahoo and Google > financial sites for the first few weeks of Jan. 2011 and you will find > digits that differ between the two sites in every single column. Not > every day mind you, but often enough that it isn't reliable. Heck, the > volume numbers differ by large factors between the two sites. So just > choose one site and go with that? Not so fast - if the single source > goes down the data is unavailable, and there is no guarantee that the > site (which is not party to this particular use of their data) might not > revise the page or choose to block it entirely. > > Or weather data, right?
Lots of random bits there and we trust NOAA. > But good luck with criteria 3-6. In particular, they don't give data > out for free. In theory no US Government site should, since they are > supposed to charge to recover distribution costs. > > Criteria 4-6 are typical of software distributed on mirror sites, but so > far I have not found any physical measurements which are distributed in > a similar manner. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech
From peter.st.john at gmail.com Thu Aug 11 20:44:15 2011 From: peter.st.john at gmail.com (Peter St. John) Date: Thu, 11 Aug 2011 20:44:15 -0400 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: David, I was thinking of the National Weather Service instead of NOAA; it's a vital public service that such information is recorded and disseminated for airfields and the like, e.g.: http://www.weather.gov/climate/getclimate.php?wfo=bou So I would write a script to scrape the least significant digits from that, for agreed times, dates, and locations. Whoever writes the script and wherever it is run, anyone can check its results manually.
However, that item has a disclaimer that the data is subject to review :) So it may matter how far back in time you need to be able to go, and how long into the future you need the data to be available at the same place. But nobody promises their website will stay unchanged indefinitely, they can't. But at any given time, a group can agree on (say) the lowest significant digits of the temperatures at time T in cities X, Y, and Z as reported at time T2 by the NWS. Peter On Thu, Aug 11, 2011 at 7:56 PM, David Mathog wrote: > Since this is very OT, I'll try to keep it short. > > Here is the problem - imagine a group of people who neither know nor > trust each other, yet must agree on the fairness of a single random > number. Basically they are going to have a lottery. They aren't > organized enough to generate such a number themselves - it must be found > from some process already active on the web, and be so obviously "fair" > that they won't argue about that. Everybody must be able to obtain it > freely from a web connection. > > Can any of you think of a source on the web for a set of small files > with these properties: > > 1. from a trusted source (here this mostly means the data is generated > for some other innocuous purpose) > 2. represents a largely random process (temperature readings, > stock market values, etc.) with a set generated at known intervals, > preferably daily (at least M-F) > 3. are never, ever, revised > 4. are distributed reliably (for instance, signed files) > 5. are publicly and freely available > 6. can be obtained reliably (is available from many sites) > > So far I have looked at stock market values and weather data - without > much luck. > > You would think the S&P 500 is the S&P 500 and one could look it up on > any site and get the same data. Not so! Check the Yahoo and Google > financial sites for the first few weeks of Jan. 2011 and you will find > digits that differ between the two sites in every single column. 
Not > every day mind you, but often enough that it isn't reliable. Heck, the > volume numbers differ by large factors between the two sites. So just > choose one site and go with that? Not so fast - if the single source > goes down the data is unavailable, and there is no guarantee that the > site (which is not party to this particular use of their data) might not > revise the page or choose to block it entirely. > > Or weather data, right? Lots of random bits there and we trust NOAA. > But good luck with criteria 3-6. In particular, they don't give data > out for free. In theory no US Government site should, since they are > supposed to charge to recover distribution costs. > > Criteria 4-6 are typical of software distributed on mirror sites, but so > far I have not found any physical measurements which are distributed in > a similar manner. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Aug 11 20:55:30 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 17:55:30 -0700 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: Low order digits from weather stations are not likely to be random. 
They're almost certainly derived from the output of some quantizing converter, and may even have gone through a double conversion (Celsius/Fahrenheit). NWS and NOAA are actually part of the same organization, aren't they? (The NWS web page at weather.gov is titled "NOAA's National Weather Service".) Jim Lux +1(818)354-2075 From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Peter St. John Sent: Thursday, August 11, 2011 5:44 PM To: David Mathog Cc: beowulf at beowulf.org Subject: Re: [Beowulf] OT: public random numbers? David, I was thinking the National Weather Service, instead of NOAA; it's a vital public service that such information is recorded and disseminated for airfields and the like, e.g.: http://www.weather.gov/climate/getclimate.php?wfo=bou So I would write a script to scrape least significant digits from that, for agreed times, dates, and locations. Whoever writes the script and wherever it is run, anyone can check its results manually. However, that item has a disclaimer that the data is subject to review :) So it may matter how far back in time you need to be able to go, and how long into the future you need the data to be available at the same place. But nobody promises their website will stay unchanged indefinitely, they can't. But at any given time, a group can agree on (say) the lowest significant digits of the temperatures at time T in cities X, Y, and Z as reported at time T2 by the NWS. Peter On Thu, Aug 11, 2011 at 7:56 PM, David Mathog wrote: Since this is very OT, I'll try to keep it short. Here is the problem - imagine a group of people who neither know nor trust each other, yet must agree on the fairness of a single random number. Basically they are going to have a lottery. They aren't organized enough to generate such a number themselves - it must be found from some process already active on the web, and be so obviously "fair" that they won't argue about that.
Everybody must be able to obtain it freely from a web connection. Can any of you think of a source on the web for a set of small files with these properties: 1. from a trusted source (here this mostly means the data is generated for some other innocuous purpose) 2. represents a largely random process (temperature readings, stock market values, etc.) with a set generated at known intervals, preferably daily (at least M-F) 3. are never, ever, revised 4. are distributed reliably (for instance, signed files) 5. are publicly and freely available 6. can be obtained reliably (is available from many sites) So far I have looked at stock market values and weather data - without much luck. You would think the S&P 500 is the S&P 500 and one could look it up on any site and get the same data. Not so! Check the Yahoo and Google financial sites for the first few weeks of Jan. 2011 and you will find digits that differ between the two sites in every single column. Not every day mind you, but often enough that it isn't reliable. Heck, the volume numbers differ by large factors between the two sites. So just choose one site and go with that? Not so fast - if the single source goes down the data is unavailable, and there is no guarantee that the site (which is not party to this particular use of their data) might not revise the page or choose to block it entirely. Or weather data, right? Lots of random bits there and we trust NOAA. But good luck with criteria 3-6. In particular, they don't give data out for free. In theory no US Government site should, since they are supposed to charge to recover distribution costs. Criteria 4-6 are typical of software distributed on mirror sites, but so far I have not found any physical measurements which are distributed in a similar manner. 
Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From samuel at unimelb.edu.au Thu Aug 11 21:54:11 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Fri, 12 Aug 2011 11:54:11 +1000 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> References: <4E412DF6.1080204@abdn.ac.uk> <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> Message-ID: <4E4487C3.60605@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/08/11 22:04, Douglas Eadline wrote: > Most of you are probably not aware of this story > about trade secrets and Bash scripts on HPC clusters On the copyright side of things (not the trade secret stuff), my understanding (IANAL, etc) is that anything you create you[0] hold copyright on[1], and for someone else to copy it they must have some agreement (license) to be able to do so. Thus a shell script with no license attached or embedded is copyrighted and you should get explicit permission to use it.. cheers, Chris [0] - where "you" is the entity that is the copyright holder, not necessarily the creator. [1] - yes, I know there are some entities that aren't allowed to hold copyright.. 
:-) - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5Eh8MACgkQO2KABBYQAh8sOgCePl6n4UTNZGMAePc8Kb+kmK4a DHwAoJeVgYKUMDpJe78/2mQqbL2ryJ4M =UAan -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From cbergstrom at pathscale.com Thu Aug 11 23:22:34 2011 From: cbergstrom at pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) Date: Fri, 12 Aug 2011 10:22:34 +0700 Subject: [Beowulf] Open source @NASA - WAS: OT In-Reply-To: References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> Message-ID: <4E449C7A.1030102@pathscale.com> On 08/12/11 01:19 AM, Lux, Jim (337C) wrote: > > Closer to home for me, the NASA Open Source License (which was conjured up a decade or so ago) is apparently incompatible with just about everyone else's licenses. They had a "How do we encourage Open Source use at NASA" symposium a few months back hosted at Ames with lots of remote participants and licensing issues and complexities is, in my opinion, probably one of the bigger problems. It's been a royal pain for me trying to release stuff to the general public in a useful form. It sure would be nice to be able to give someone an .iso and say, here, load this, run make clean; make all, and you'll have your stuff ready to run. 
But no, that .iso will be a derived work comprised of a multitude of components with all sorts of different license agreements. What we have to do is the (to me) accursed approach of: here's a list of eleventy-seven URLs and FTP sites, go get these files, check their MD5 to make sure they're the same one we used, and have at it. Hi Jim, For this exact problem you've described an ebuild could be a very good solution. (I've personally abandoned gentoo a long time ago) By solution I mean bash script that explicitly checks the hashes, resolves the deps and pulls the source to build everything from the eleventy-seven URLs and FTP sites. The people working with gentoo-science would likely appreciate it a lot. (The learning curve is fairly low if you know bash already) -------- With regards to open source license proliferation and incompatibilities. I think most people in the community are working towards streamlining, but changes after-the-fact can be difficult/impossible. I'm empathetic to your situation and I'd say work towards getting your projects merged with something like gentoo to start and then maybe something like OpenSuSE build service. This would cover a very large % of the packaging/distribution problem and get it in the hands of users easily. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
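The fetch-and-verify step described in this message (pull each upstream source, then refuse to build unless its hash matches the recorded one) can be sketched roughly as below. This is not an ebuild, only a minimal Python illustration of the checking step. The manifest layout (file name mapped to expected hex digest), the function names, and the choice of SHA-256 instead of the MD5 mentioned in the thread are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest, directory, algorithm="sha256"):
    """Check every file named in `manifest` against its expected digest.

    `manifest` is a dict of file name -> expected hex digest (a
    hypothetical layout chosen for this sketch). Returns a list of
    (name, reason) pairs for files that are missing or do not match;
    an empty list means every file checked out.
    """
    problems = []
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif file_digest(path, algorithm) != expected.lower():
            problems.append((name, "digest mismatch"))
    return problems
```

A build wrapper would call `verify_manifest` after downloading and stop if the returned list is non-empty, which automates the "check that it is the same file we used" step for each of the eleventy-seven sources.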
From samuel at unimelb.edu.au Thu Aug 11 23:51:12 2011 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Fri, 12 Aug 2011 13:51:12 +1000 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> Message-ID: <4E44A330.7090503@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/08/11 04:19, Lux, Jim (337C) wrote: > The incompatibility arises because NASA is legally > obligated to distribute their products with no > downstream restrictions on use, Actually no - the NASA license is incompatible with the GPL (at least) because: http://www.gnu.org/licenses/license-list.html # The NASA Open Source Agreement, version 1.3, is not # a free software license because it includes a provision # requiring changes to be your "original creation". Free # software development depends on combining code from # third parties, and the NASA license doesn't permit this. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5EozAACgkQO2KABBYQAh8ESQCfa3VfRt5Y1FxllDapHpqTrev9 +iAAn3TWi9YHq6yaAc6BMWCbeJZaQBFT =GL6d -----END PGP SIGNATURE-----
From james.p.lux at jpl.nasa.gov Thu Aug 11 23:57:03 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Aug 2011 20:57:03 -0700 Subject: [Beowulf] All Your BASH Are Belong To Us In-Reply-To: <4E44A330.7090503@unimelb.edu.au> References: <4E412DF6.1080204@abdn.ac.uk>, <39761.192.168.93.213.1313064298.squirrel@mail.eadline.org> <38613.192.168.93.213.1313081927.squirrel@mail.eadline.org> <4E441733.80701@scalableinformatics.com> , <4E44A330.7090503@unimelb.edu.au> Message-ID: Yes, that too... ________________________________________ From: beowulf-bounces at beowulf.org [beowulf-bounces at beowulf.org] On Behalf Of Christopher Samuel [samuel at unimelb.edu.au] Sent: Thursday, August 11, 2011 20:51 To: beowulf at beowulf.org Subject: Re: [Beowulf] All Your BASH Are Belong To Us -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/08/11 04:19, Lux, Jim (337C) wrote: > The incompatibility arises because NASA is legally > obligated to distribute their products with no > downstream restrictions on use, Actually no - the NASA license is incompatible with the GPL (at least) because: http://www.gnu.org/licenses/license-list.html # The NASA Open Source Agreement, version 1.3, is not # a free software license because it includes a provision # requiring changes to be your "original creation". Free # software development depends on combining code from # third parties, and the NASA license doesn't permit this.
cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5EozAACgkQO2KABBYQAh8ESQCfa3VfRt5Y1FxllDapHpqTrev9 +iAAn3TWi9YHq6yaAc6BMWCbeJZaQBFT =GL6d -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Fri Aug 12 00:31:30 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 12 Aug 2011 00:31:30 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: On Thu, 11 Aug 2011, David Mathog wrote: > Since this is very OT, I'll try to keep it short. > > Here is the problem - imagine a group of people who neither know nor > trust each other, yet must agree on the fairness of a single random > number. Basically they are going to have a lottery. They aren't > organized enough to generate such a number themselves - it must be found > from some process already active on the web, and be so obviously "fair" > that they won't argue about that. Everybody must be able to obtain it > freely from a web connection. http://www.random.org/ sincerely, rgb > > Can any of you think of a source on the web for a set of small files > with these properties: > > 1. 
from a trusted source (here this mostly means the data is generated > for some other innocuous purpose) > 2. represents a largely random process (temperature readings, > stock market values, etc.) with a set generated at known intervals, > preferably daily (at least M-F) > 3. are never, ever, revised > 4. are distributed reliably (for instance, signed files) > 5. are publicly and freely available > 6. can be obtained reliably (is available from many sites) > > So far I have looked at stock market values and weather data - without > much luck. > > You would think the S&P 500 is the S&P 500 and one could look it up on > any site and get the same data. Not so! Check the Yahoo and Google > financial sites for the first few weeks of Jan. 2011 and you will find > digits that differ between the two sites in every single column. Not > every day mind you, but often enough that it isn't reliable. Heck, the > volume numbers differ by large factors between the two sites. So just > choose one site and go with that? Not so fast - if the single source > goes down the data is unavailable, and there is no guarantee that the > site (which is not party to this particular use of their data) might not > revise the page or choose to block it entirely. > > Or weather data, right? Lots of random bits there and we trust NOAA. > But good luck with criteria 3-6. In particular, they don't give data > out for free. In theory no US Government site should, since they are > supposed to charge to recover distribution costs. > > Criteria 4-6 are typical of software distributed on mirror sites, but so > far I have not found any physical measurements which are distributed in > a similar manner. 
> > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Fri Aug 12 11:21:37 2011 From: mathog at caltech.edu (David Mathog) Date: Fri, 12 Aug 2011 08:21:37 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID: Robert G. Brown wrote: > Everybody must be able to obtain it > > freely from a web connection. > > http://www.random.org/ > Nice site. They have something that is very close, the pregenerated random files, from which a small set of digits may be extracted, and the files themselves have MD5 checksums (but are not signed). They also support https. It comes up a little short on criteria 1 (we really don't know what is going on behind the scenes) and 6 (it is a single site.) 
Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Fri Aug 12 11:26:05 2011 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 12 Aug 2011 11:26:05 -0400 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: <4E45460D.9040505@scalableinformatics.com> On 08/12/2011 11:21 AM, David Mathog wrote: > Robert G. Brown wrote: > >> Everybody must be able to obtain it >>> freely from a web connection. >> >> http://www.random.org/ And from SGI days ... http://www.lavarnd.org/ > Nice site. They have something that is very close, the pregenerated > random files, from which a small set of digits may be extracted, and the > files themselves have MD5 checksums (but are not signed). > They also support https. It comes up a little short on criteria 1 (we > really don't know what is going on behind the scenes) and 6 (it is a > single site.) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From mathog at caltech.edu Fri Aug 12 11:58:28 2011 From: mathog at caltech.edu (David Mathog) Date: Fri, 12 Aug 2011 08:58:28 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID: Peter St. John wrote: > But at any given time, a group can agree on (say) the lowest significant > digits of the temperatures at time T in cities X, Y, and Z as reported at > time T2 by the NWS. Actually we don't know that, at least not reliably enough for this purpose. It may be that the one web address is actually multiple servers, and if the NWS pushes out data revisions these could return different results for T:X,Y,Z at T2 if the servers were not strictly synchronized. Never mind the caching problems that revisions like this would create on browsers. I have no idea if the NWS revises their data files, but it would not be surprising if they did. After posting I thought of one other source of more or less random verifiable numbers - the scores of sporting events. These are not always generated every day, and are seasonal for the various sports. They are however highly verifiable and when multiple events are grouped, pretty much impossible to "fix" to preselected digits. For instance: http://www.nfl.com/scores http://mlb.mlb.com/mlb/scoreboard http://scores.espn.go.com/nba/scoreboard?date=20110304 These sites maintain historical records. Even if they didn't the scores are widely published, and there are tens of thousands of witnesses to the original event, so it would be pretty much impossible to intentionally change a final score. There could still be copying/typo errors from site to site though, but if such an error was discovered it would be easy enough to resolve. There is no intrinsic order to the scores, and some scheduled games might be canceled, so it would have to be something like "sort the scores from all NBA teams who played on 4/4/11 into ascending order and concatenate the digits". 
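The recipe in the last sentence of this message (collect the final scores for an agreed date, sort them into ascending order, concatenate the digits) can be sketched in a few lines. The scores below are made-up placeholders rather than real results, and the function name and lottery size are hypothetical. The point is only that anyone starting from the same published scores arrives at the same seed.

```python
import random


def seed_from_scores(scores):
    """Sort the scores ascending and concatenate their decimal digits."""
    return "".join(str(s) for s in sorted(scores))


# Made-up final scores from three games (six teams), for illustration only.
scores = [101, 95, 103, 87, 98, 76]
seed = seed_from_scores(scores)  # "76879598101103"

# Seed a PRNG with the agreed value and draw one winner among the entrants.
rng = random.Random(int(seed))
entrants = 250  # hypothetical lottery size
winner = rng.randrange(entrants)
```

Sorting first makes the result independent of the order in which a given site happens to list the games, which matters because the scores have no intrinsic order. All parties would also need to agree in advance on the exact generator and how the seed is fed to it, since different random number generators will turn the same seed into different draws.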
Regards,

David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From james.p.lux at jpl.nasa.gov Fri Aug 12 12:09:46 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 12 Aug 2011 09:09:46 -0700 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

All nice suggestions, but I wonder if they're truly random.

Scores of games have underlying patterns from the "rules of the game" (e.g. American football games tend to have scores that are tied to 6, 7, 8 or 3 points; basketball goals are 2 or 3 points, etc.)

I'm sure someone has analyzed this.

I suppose one could sum a large number of scores, which would give you something with Gaussian distribution, and then you could transform it into something with uniform distribution (sort of an inverse Box-Muller).

What about using random.org and it being backed-up on archive.org? Does that give you the "multiple independent sites" desired?

From mathog at caltech.edu Fri Aug 12 13:04:43 2011 From: mathog at caltech.edu (David Mathog) Date: Fri, 12 Aug 2011 10:04:43 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID:

> All nice suggestions, but I wonder if they're truly random.

Random enough in this case, as they are only used to form a seed for a random number generator, and a seed is only needed "rarely". So even though pro basketball scores have definite trends and often look like (101,95),(103,87),(98,76), these can still create a decent seed value once sorted (here in descending order) and concatenated: 10310198958776 (Let's assume the seed need not be odd.)

> What about using random.org and it being backed-up on archive.org? Does that give you the "multiple independent sites" desired?

To some degree, but not as much as the large number of sites that distribute game scores and stock values.

I originally favored using stock values until it turned out that those numbers are squishier than one might have expected, particularly so for indices like the S&P 500 and Dow Jones. A fellow who works at S&P told me that the opening prices are prone to timing problems, since at T=0+delta some of the issues in the index will have traded, and some will not, with the untraded stock values being filled in with stale values.
I think similar timing issues affect all the other index values too (high/low/close). In these cases, since the index is derived from formulas, some sites may be independently calculating the values, and tiny differences in the times the stock values are measured result in different numbers. All it takes is one trade difference between the sample points to change some digits.

When I get some time I still need to look and see if the high/low/close values for individual stocks are also variable from web site to web site. These numbers might be more reliable for single stocks since they might all trace back to the data feed from the exchange where the issue trades.

Thanks,

David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech

From mathog at caltech.edu Fri Aug 12 13:22:46 2011 From: mathog at caltech.edu (David Mathog) Date: Fri, 12 Aug 2011 10:22:46 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID:

Michael Di Domenico wrote:

> How many random numbers per day are you expecting?

One would be sufficient.

> If everyone checks at exactly 1pm, should they all see the same
> "random" number or should they each get their own "random" number?

They should all see the same number. Example: a random number based on physical events which occurred on 8/10/11 would become available on or shortly after that day. Starting from the time it first becomes available, and going forward ideally forever, everybody who wants to should be able to retrieve that same random number. That is, nobody should be able to predict the number beforehand, and everybody should be able to verify it later.
So the number must be both random and etched in stone.

> What kind of entropy are you expecting on "random"?

In practice relatively little is needed; 16 bits should be plenty. (More wouldn't hurt, of course.)

Regards,

David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech

From rgb at phy.duke.edu Fri Aug 12 14:35:17 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 12 Aug 2011 14:35:17 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

On Fri, 12 Aug 2011, David Mathog wrote:

> Robert G. Brown wrote:
>
>> Everybody must be able to obtain it
>>> freely from a web connection.
>>
>> http://www.random.org/
>>
>
> Nice site. They have something that is very close, the pregenerated
> random files, from which a small set of digits may be extracted, and the
> files themselves have MD5 checksums (but are not signed).
> They also support https. It comes up a little short on criteria 1 (we
> really don't know what is going on behind the scenes) and 6 (it is a
> single site.)

Behind the scenes is documented pretty well on the site, and the guy who runs it is a human being, you can communicate with him to learn even more. I already know him a bit, as he and I have collaborated on applying dieharder to test random.org datasets -- even "the" random.org dataset as of some time ago (I have a few hundred MB of random numbers from the site in my dieharder directory).

IIRC, the numbers are generated continuously and fairly slowly by grabbing and filtering and transforming atmospheric noise.
As a source of entropy, that is probably excellent if (as noted) slow, but many good sources of entropy seem to be fairly slow. He has good reason to think that his numbers are theoretically "true random numbers" -- both unpredictable and flat/decorrelated at all orders, and even though there aren't really enough of them for my purposes, I've used them as one of the (small) "gold standard" sources for testing dieharder even as I test them. For all practical purposes threefish or aes are truly random as well and they are a lot faster and easier to use as gold standard generators, though. I don't quite understand why the single site restriction is important -- this site has been up for years and I don't expect it to go away soon; it is quite reliable. I don't think there is anything secret about how the numbers are generated, and I'll certify that the numbers it produces don't make dieharder unhappy. So 1 is fixable with a bit of effort on your part; 6 I don't really understand but the guy who runs the site is clearly willing to construct a custom feed for cash customers, if there is enough value in whatever it is you are trying to do to pay for access. If it's just a lottery, well, lord, I can think of a dozen ways to make numbers so random that they'd be unimpeachable for any sort of lottery, both unpredictable and uncorrelated, and they don't any of them require any significant amount of entropy to get started. I will add one warning -- "randomness" is a rather stringent mathematical criterion, and is generally tested against the null hypothesis. Amateurs who want to make random number generators out of supposedly "random" data streams or fancy algorithms almost invariably fail, sometimes spectacularly so. There are a half dozen or more really, really good pseudorandom number generators out there and it is easy to hotwire them together into an xor-based high entropy stream that basically never repeats (feeding it a bit of real entropy now and then as it operates). 
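The xor combination mentioned above can be sketched as follows. This only illustrates the mechanics and is not code from dieharder or from any site named in this thread: `xor_streams` is a hypothetical helper, and the two inputs (Python's `os.urandom` and a seeded `random.Random`) are stand-ins chosen for convenience, not crypto-grade generators picked by the author.

```python
import os
import random


def xor_streams(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    if len(a) != len(b):
        raise ValueError("streams must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


n = 16
entropy = os.urandom(n)       # bytes from the operating system's entropy pool
prng = random.Random(12345)   # stand-in PRNG (Mersenne Twister), assumed for the demo
pseudo = bytes(prng.getrandbits(8) for _ in range(n))

mixed = xor_streams(entropy, pseudo)
print(mixed.hex())
```

The design point is that if either input is uniform and independent of the other, the xor is uniform too, so a weak stream cannot spoil a good one.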
I would strongly counsel you against trying to take e.g. weather data and make something "random" out of it. Unless you really know what you are doing, you will probably make something that isn't at all random and may not even be unpredictable. Even most sources of "quantum" randomness (which is at least possibly "truly random", although I doubt it) aren't flat, so that they carry the signature of their generation process unless/until you manage to transform them into something flat (difficult unless you KNOW the distribution they are producing). Pseudorandom number generators have the serious advantage of being amenable to at least some theoretical analysis (so you can "guarantee" flatness out to some high dimensionality, say) as well as empirical testing with e.g. dieharder.

HTH,

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 12 14:40:36 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 12 Aug 2011 14:40:36 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: <4E45460D.9040505@scalableinformatics.com> References: <4E45460D.9040505@scalableinformatics.com> Message-ID:

On Fri, 12 Aug 2011, Joe Landman wrote:

> On 08/12/2011 11:21 AM, David Mathog wrote:
>> Robert G. Brown wrote:
>>
>>> Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>> http://www.random.org/
>
> And from SGI days ...
http://www.lavarnd.org/

Yeah, like that. Notice the work they have to do to make a not-really-random or only partially-random source flat, unpredictable, random. What they do is probably overkill -- nobody on earth could detect a deviation from randomness if they did only half of their folding and retransformation with crypto grade prngs, but it is still a pretty reliable scheme.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 12 14:59:59 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 12 Aug 2011 14:59:59 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

On Fri, 12 Aug 2011, Lux, Jim (337C) wrote:

> All nice suggestions, but I wonder if they're truly random.
>
> Scores of games have underlying patterns from the "rules of the game" (e.g. American football games tend to have scores that are tied to 6, 7, 8 or 3 points; basketball goals are 2 or 3 points, etc.)
>
> I'm sure someone has analyzed this.
>
> I suppose one could sum a large number of scores, which would give you something with Gaussian distribution, and then you could transform it into something with uniform distribution (sort of an inverse Box-Muller).
>
> What about using random.org and it being backed-up on archive.org? Does that give you the "multiple independent sites" desired?

As I said and repeat, nothing like this is at all random.
Random is stuff like thermal noise, shot noise, quantum noise, and even all of those things are distributed and not flat and require massaging to make into uniform deviates or random bits. Unpredictable is easy, of course -- flip a coin, roll some dice -- until you need to make it >>rigorously<< unpredictable and >>rigorously<< uncorrelated, at which point you need to not screw around with weather, scores, or market closing values; even "randomly sampled" ticks of a nanosecond clock aren't that random without some work to make them so.

I liked the lavarnd site, and I like random.org. Hell, tap into both of their streams, they're both practically perfect as sources of random numbers go, and it gives you your redundancy and you can xor their streams together to get yet another irrelevant and probably unnecessary degree of lack of correlation. Even if one stream is subtly correlated and the other is too, the chances of the correlations "matching" and persisting through an xor process are astronomical. But then, finding correlations in the output of a properly seeded crypto prng is pretty astronomically unlikely BEFORE you xor-fold it stream-wise a few dozen times into a source of real entropy like atmospheric noise or electro-optical noise.

If you want something better, you'll probably have to explain your application in a bit more detail. Do you need rigorously random and flat numbers, or just something unpredictable? The latter is cheap and easy and can be done in the privacy of your own home by reading from /dev/random or /dev/urandom (or perhaps from Intel's new on-CPU rngs). The former requires theory and some work and some heavy duty empirical testing.

Just remember, numbers are not random. Numbers are numbers. The number 7 could be "random" or not, not by its nature but by how the 7 was generated. Processes, in other words, are (approximately, oxymoronically) random.
If you want random numbers, find a (mathematically provably) "random" process, at least to some order and for some purposes...

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From nixon at nsc.liu.se Fri Aug 12 16:46:21 2011 From: nixon at nsc.liu.se (Leif Nixon) Date: Fri, 12 Aug 2011 22:46:21 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

On 12 August 2011 17:58, David Mathog wrote:

> After posting I thought of one other source of more or less random
> verifiable numbers - the scores of sporting events. These are not
> always generated every day, and are seasonal for the various sports.
> They are however highly verifiable and when multiple events are grouped,
> pretty much impossible to "fix" to preselected digits.

Have you looked at RFC3797?
Not sure if it has any solutions for you, but it at least discusses the same problems.

--
Leif Nixon - Systems expert
------------------------------------------------------------
National Supercomputer Centre - Linkoping University
------------------------------------------------------------

From rgb at phy.duke.edu Sat Aug 13 13:51:46 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 13 Aug 2011 13:51:46 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

On Fri, 12 Aug 2011, Leif Nixon wrote:

> On 12 August 2011 17:58, David Mathog wrote:
>
>> After posting I thought of one other source of more or less random
>> verifiable numbers - the scores of sporting events. These are not
>> always generated every day, and are seasonal for the various sports.
>> They are however highly verifiable and when multiple events are grouped,
>> pretty much impossible to "fix" to preselected digits.
>
> Have you looked at RFC3797? Not sure if it has any solutions for you, but it
> at least discusses the same problems.

If people know how you are going to pick the seed of your rng, and know the rng, and know (or measure) the distribution function from which your seed is being drawn, they can easily transform the game into a non-zero sum game with advantage over all of those that don't do all of that. The only way to avoid this sort of thing is to pick your seed from a flat, unpredictable distribution. Unpredictable (in its purest sense) includes flat, but the score distribution of almost any sporting event is, I'm pretty sure, not flat.
That's why I really don't like the idea of running a lottery off of data like this. No state lottery could ever be certified on top of this sort of data.

I'll tell you what. Piggyback your lottery on theirs. Powerball games occur every day all over the US. Pick your seed from the last 10 digits of one of those games. They are announced, publicly available on websites (I'm pretty sure), and if they aren't certifiably random, something is seriously wrong. In any event they are usually generated from an easily understandable random physical process that is almost certainly flat as well as unpredictable.

Then pop it into your favorite AES-based or threefish based RNG, or cook up something yourself with even more rotors, spin it a while, and out comes your lottery winner -- basically a transmogrification of a public state lottery number, but that's an ADVANTAGE, not a disadvantage...

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
From hahn at mcmaster.ca Sat Aug 13 20:06:04 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 13 Aug 2011 20:06:04 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

>>> After posting I thought of one other source of more or less random
>>> verifiable numbers - the scores of sporting events. These are not

I immediately thought of another widely published stream of immutable noise: the congressional record. sorry, no smiley ;)

> Then pop it into your favorite AES-based or threefish based RNG, or cook
> up something yourself with even more rotors, spin it a while, and out
> comes your lottery winner

sorry, I don't understand your emphasis on flatness. why does the distribution of the seed (entropy source) matter, as long as it's reasonably large and not predictable before publication date? the crypto hash takes care of whitening, doesn't it?

thanks, mark hahn.

From hahn at mcmaster.ca Sat Aug 13 22:22:52 2011 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 13 Aug 2011 22:22:52 -0400 (EDT) Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID:

> I'm curious if anyone has any experience with ECC uncorrectable errors
> (specifically not the identification of), but which specific dimm in
> the chassis it's pointing to.
we've had good luck using EDAC to pin down bad dimms - at least those that cause _correctable_ errors. our uncorrectable errors trigger panics, though I guess you could turn that off (/sys/module/edac_core/parameters/edac_mc_panic_on_ue).

> The mcelog in linux doesn't seem to report the dimm slot correctly on
> my supermicro boards.

I prefer the hardware-topology-based naming that edac uses (controller, channel, chipselect). I guess recent versions of edac have a user-space tool that will translate that for you (but of course, you have to verify the topo-to-label mapping yourself anyway.)

regards, mark hahn.

From rgb at phy.duke.edu Sun Aug 14 18:05:31 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 14 Aug 2011 18:05:31 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID:

On Sat, 13 Aug 2011, Mark Hahn wrote:

>>>> After posting I thought of one other source of more or less random
>>>> verifiable numbers - the scores of sporting events. These are not
>
> I immediately thought of another widely published stream of immutable noise:
> the congressional record. sorry, no smiley ;)
>
>> Then pop it into your favorite AES-based or threefish based RNG, or cook
>> up something yourself with even more rotors, spin it a while, and out
>> comes your lottery winner
>
> sorry, I don't understand your emphasis on flatness. why does the
> distribution of the seed (entropy source) matter, as long as it's reasonably
> large and not predictable before publication date?
> the crypto hash takes care of whitening, doesn't it?

Bayes' theorem.
If one knows that (say) the distribution of digits in sports scores is (say, and not unreasonably) 70% 1s, 2s, 3s and 30% all the other digits -- because e.g. football games rarely get 4-9 in the second digit slot (note that this is an example only) -- one can gain a near 2-1 advantage over everybody else playing by picking seeds with the right frequencies and using only those seeds to select a set of numbers, if (as it sounds) there is an openly published unique map between the seed and the lottery outcome so "anybody can check that it is fair". In this latter case you aren't trying to guess the white "random" outcome, you are trying to guess the seed, and if the seed is drawn from a non-flat space you'll beat the pants off of anyone playing blind by using that space to generate your seeds/guesses.

Basically you take the lottery from being a lottery with all numbers equally represented in the outcome space to being the moral equivalent of predicting the actual point outcome of N football or basketball games. The size of the latter space is MUCH smaller than the size of all possible scores, right? In fact, it is "small" compared to the former space.

So, sorry, I think that a lottery (especially one with e.g. a cash payout and deep pocketed people capable of speculatively gambling to win based on expectation value based on an openly published hash and seeding method) needs to use a true random, true white seed, since you might just as well use the seed as the lottery number in this case and in no other case is it fair.

Of course, if the lottery is for cakes at a bake sale, who cares. Just don't underestimate the cleverness of would-be attackers if the lottery has an openly published method of generating the result and/or potentially large payout. Plenty of people would tackle the project of cracking the lottery just for the thrill, even if the payout wasn't that great.
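The size of the edge in that example can be worked out directly. The sketch below is an illustration under stated assumptions, not part of the original argument: it assumes the 70% is shared evenly among digits 1, 2 and 3 and the 30% evenly among the other seven digits, and `digit_stats` is a hypothetical helper name.

```python
import math


def digit_stats(probs):
    """Return (Shannon entropy, min-entropy, best-guess advantage) for one digit.

    probs: probabilities of the digits 0-9, summing to 1.
    The advantage is the probability of the most likely digit divided by
    the chance of a blind guess on a flat distribution.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    shannon = -sum(p * math.log2(p) for p in probs if p > 0)
    p_max = max(probs)
    min_entropy = -math.log2(p_max)
    advantage = p_max * len(probs)
    return shannon, min_entropy, advantage


# Assumed split: digits 1, 2, 3 share 70%; the other seven digits share 30%.
skewed = [0.7 / 3 if d in (1, 2, 3) else 0.3 / 7 for d in range(10)]
flat = [0.1] * 10

for name, dist in (("skewed", skewed), ("flat", flat)):
    shannon, min_entropy, advantage = digit_stats(dist)
    print(f"{name}: {shannon:.3f} bits Shannon, "
          f"{min_entropy:.3f} bits min-entropy, advantage x{advantage:.2f}")
```

Under these assumptions a guesser who sticks to the favoured digits is right about 2.3 times as often per digit as a blind guesser, which is in line with the "near 2-1" figure, and the gap compounds over every digit of the seed.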
If the payout was large enough, you'd have deep-pocketed smart people covering the entire most-likely-point spread generated by Vegas bookies, week after week, through proxies, and making a bundle from it.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From james.p.lux at jpl.nasa.gov Sun Aug 14 22:59:25 2011 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Sun, 14 Aug 2011 19:59:25 -0700 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: , Message-ID:

Given the discussion about lotteries, etc.: this is the classic thing of "numbers games" as run by the mob. You pick a 3 digit number, and the winning number is determined by some readily available public source (stock market, sports games, racetrack winners, etc.). There's probably a fair amount of literature (aside from the works of M. Puzo) describing it. Payoff was something like 600:1 or 750:1, against a nominal 1000:1, so the numbers bank makes their money on the differential (the vig).

Just looked up wikipedia.. " later led to the use of the last three numbers in the published daily balance of the United States Treasury."

A moderately well known mathematician named Claude Shannon probably analyzed it.. He collaborated with E. Thorp on some other interesting work on games.

From rgb at phy.duke.edu Mon Aug 15 07:57:26 2011 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Mon, 15 Aug 2011 07:57:26 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: , Message-ID: On Sun, 14 Aug 2011, Lux, Jim (337C) wrote: > A moderately well known mathematician named Claude Shannon probably > analyzed it.. He collaborated with E. Thorpe on some other interesting > work on games. Shannon? Shannon? The name almost rings a Bell. For your information, I think he's a few bits short of a byte, if you know what I mean. The guy practically Bayes at the moon. Sorry... feeling a bit, well, random this morning. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From mathog at caltech.edu Mon Aug 15 13:08:59 2011 From: mathog at caltech.edu (David Mathog) Date: Mon, 15 Aug 2011 10:08:59 -0700 Subject: [Beowulf] OT: public random numbers? Message-ID: Leif Nixon wrote: > Have you looked at RFC3797? Not sure if it has any solutions for you, but it > at least discusses the same problems. Good reference, I was not aware of that. It gives the same sorts of sources for random numbers as we have come up with here: stock market, sports, lottery. It discusses how stock market data may not be reliable due to market splits and other accounting issues. However, I have determined that the raw data from the exchanges is a terrible choice because it is not available for free, and the values that are freely available, which are posted on web finance sites, are not reliably identical in all digits. 
Lottery results are a good source except for the black box / black helicopter factors. We don't generally know where those numbers are coming from, and even in those cases where they do tell us, there is no way to verify that any particular lottery drawing wasn't rigged. We have not discussed election results (votes per candidate), but those are, ironically, really unsuitable for this, even though statistically the final set of digits should have a lot of entropy. Mostly election numbers are a problem because they may be revised for long periods after the election, and the numbers could almost always be forced to shift by a challenge by one of the candidates. Every recount will come up with a slightly different result. Examples: the Coleman vs. Franken senatorial contest in Minnesota, or Bush vs. Gore in Florida. So I'm leaning towards sports scores, as those are generated in full view of a multitude of witnesses (often numbering in the millions). It would be extremely difficult to rig the absolute final score. It might be possible to rig the winner, or even the point spread, but to rig the absolute score in a high scoring game like basketball, would be exceedingly difficult, and would likely be obvious to even the casual observer. To rig every digit in the final score of every game played on a given day should be pretty close to impossible. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
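The idea discussed above (publicly witnessed final scores as a seed, with a hash doing the whitening) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `seed_from_scores` and `pick_winner`, the `home:away:points:points` canonical format, and the choice of SHA-256 are hypothetical and do not come from the thread or from RFC 3797. Sorting the lines makes the seed independent of the order in which people collect the scores. As pointed out earlier in the thread, hashing does not add entropy: if the set of plausible scores is small, the set of plausible seeds is just as small.

```python
import hashlib


def seed_from_scores(scores):
    """Hash a list of (home, away, home_pts, away_pts) tuples into a 32-byte seed.

    The canonical text format is an assumption made for this sketch; any
    format works as long as it is published before the games are played.
    """
    lines = sorted(f"{home}:{away}:{hp}:{ap}" for home, away, hp, ap in scores)
    canonical = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(canonical).digest()


def pick_winner(seed, n_entries, draw=0):
    """Map the seed plus a draw counter to an index in range(n_entries)."""
    digest = hashlib.sha256(seed + draw.to_bytes(4, "big")).digest()
    # A 256-bit value reduced modulo a small n has negligible modulo bias.
    return int.from_bytes(digest, "big") % n_entries


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    games = [("TeamA", "TeamB", 101, 99), ("TeamC", "TeamD", 88, 92)]
    seed = seed_from_scores(games)
    print(seed.hex())
    print(pick_winner(seed, 1000))
```

Anyone holding the same published scores and the same canonical format can recompute the seed and check the drawing, which is the verifiability property being asked for.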
From cousins at umit.maine.edu Mon Aug 15 16:59:11 2011 From: cousins at umit.maine.edu (Steve Cousins) Date: Mon, 15 Aug 2011 16:59:11 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: Hi David, Can you give us more information about what you are doing? I'm getting curious about what problem you are working with that requires these conditions. Steve > We have not discussed election results (votes per candidate), but those > are, ironically, really unsuitable for this, even though statistically > the final set of digits should have a lot of entropy. Mostly election > numbers are a problem because they may be revised for long periods after > the election, and the numbers could almost always be forced to shift by > a challenge by one of the candidates. Every recount will come up with a > slightly different result. Examples: the Coleman vs. Franken senatorial > contest in Minnesota, or Bush vs. Gore in Florida. > > So I'm leaning towards sports scores, as those are generated in full > view of a multitude of witnesses (often numbering in the millions). It > would be extremely difficult to rig the absolute final score. It might > be possible to rig the winner, or even the point spread, but to rig the > absolute score in a high scoring game like basketball, would be > exceedingly difficult, and would likely be obvious to even the casual > observer. To rig every digit in the final score of every game played on > a given day should be pretty close to impossible. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
From lindahl at pbm.com Wed Aug 17 16:59:58 2011 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 17 Aug 2011 13:59:58 -0700 Subject: [Beowulf] Fwd: H8DMR-82 ECC error In-Reply-To: References: <201108011637.05934.j.sassmannshausen@ucl.ac.uk> Message-ID: <20110817205958.GB7650@bx9.net> > Memtest was ok, I done 9 cycles without any problems. You should be using the HPL implementation of the Linpack benchmark for testing memory. It exercises all of the memory and all of the cores, and is what most HPC vendors seem to use for node burnin. There's even a bootable DVD with a kernel with enhanced EDAC that was mentioned here a while back. > Hardware Error > CPU0 Machine Check Exception 4 Bank 2 b200200000000863 > TSC 108dd369444 > Processor 2:40f13 Time 1311847912 Socket 0 APIC 0 > MC2-Status: Uncorredted error, report: yes MisV: invalid > CPU context corrupt: yes UECC Error > Bud Unit Error: prefetch/ECC error in data read from NB: local node originated > (SRC) > Transaction type: prefetch (mem access), no timeout, cache level L3/generic. > Participating Processors: local node originated (SRC) And I take it that the location information given here (socket 0, bank 2) isn't useful? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From david.t.kewley at gmail.com Sat Aug 20 16:16:28 2011 From: david.t.kewley at gmail.com (David Kewley) Date: Sat, 20 Aug 2011 13:16:28 -0700 Subject: [Beowulf] Memory Testing? In-Reply-To: References: Message-ID: A few bits from my corner of the experience space: If you have a BMC, 'ipmitool sel list' will probably show the correctable and uncorrectable errors, generally not naming the DIMM involved. 
But 'ipmitool sel list -v' shows details from various fields in the SEL records. In the ASUS boards I've been playing with lately, the Sensor Number field together with the Event Data field will (usually) tell you the DIMM slot, once you know how to decode those fields for the specific motherboard (and possibly firmware revisions?) that you have. How do you get that motherboard-specific data? By finding a DIMM that reliably produces errors, and moving it from slot to slot, taking notes on those two SEL fields above. I've seen a similar thing work for Dell machines too. If you have Dell PowerEdge R or M boxes (or previous generation equivalents), there are various nicer ways to get the name of the DIMM involved, including using a version of ipmitool that has the 'delloem' subcommand. I second Tony's suggestion that RAM testers may not be as good as real systems, for finding bad RAM. My experience on one large system a few years ago was that new DIMMs failed at a rate of around 1% per year, but "refurbished" DIMMs from RMAs failed at 10% per year (or was it even higher? I forget). I was led to believe that these refurbished DIMMs were often customer returns that had been run through a RAM tester and passed. Turns out sometimes the customers were right and the "refurbishment" process was wrong. One more thing about the ASUS boards I've been playing with lately: If you get a panic on uncorrectable memory error, and power cycle the system (using the power button, or by remote 'ipmitool ... power cycle'), the following POST does not report the bad DIMM. But if you *reset* the system (by pushing the reset button with a paperclip, or by remote 'ipmitool ... power reset'), the next POST will pause and tell you what CPU, Channel, and DIMM was affected on that previous uncorrectable error, which is more info that 'ipmitool sel list' gives you. 
It's then up to you to figure out how CPU, Channel, and DIMM map to the silkscreened names on the motherboard -- I couldn't find documentation, but it turned out to be the pattern we suspected. :) David From john.hearns at mclaren.com Tue Aug 23 11:46:59 2011 From: john.hearns at mclaren.com (Hearns, John) Date: Tue, 23 Aug 2011 16:46:59 +0100 Subject: [Beowulf] Flash storage arrays Message-ID: <207BB2F60743C34496BE41039233A809071F88D8@MRL-PWEXCHMB02.mil.tagmclarengroup.com> Does anyone have an opinion of these for CFD workloads: http://www.theregister.co.uk/2011/08/23/pure_storage_fa_300/ The interesting thing is that they claim it is cheaper than disk - but that's a hard claim to assess in an HPC context, as it SEEMS to hold only when their built-in deduplication is taken into account. I'm not sure how much dedupe buys you with typical HPC data - i.e. large files rather than lots of nearly-identical emails or virtual disk images. John Hearns | CFD Hardware Specialist | McLaren Racing Limited McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK T: +44 (0) 1483 261000 D: +44 (0) 1483 262352 F: +44 (0) 1483 261010 E: john.hearns at mclaren.com W: www.mclaren.com The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy.
From diep at xs4all.nl Wed Aug 24 21:30:39 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 25 Aug 2011 03:30:39 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: <8BBC31C3-7F1A-433C-863A-5F0EBB4714AC@xs4all.nl> In a world where you don't trust others, using MD5 is out of the question. It's not safe. It's possible to fake an MD5 sum: modify the numbers to whatever you wish (provided there is enough random data), then add a small correction to the data so that it again produces the md5sum that was posted on the website. Vincent On Aug 12, 2011, at 5:21 PM, David Mathog wrote: > Robert G. Brown wrote: > >> Everybody must be able to obtain it >>> freely from a web connection. >> >> http://www.random.org/ >> > > Nice site. They have something that is very close, the pregenerated > random files, from which a small set of digits may be extracted, > and the > files themselves have MD5 checksums (but are not signed). > They also support https. It comes up a little short on criteria 1 (we > really don't know what is going on behind the scenes) and 6 (it is a > single site.)
> > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Wed Aug 24 21:58:52 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 25 Aug 2011 03:58:52 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: Message-ID: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> On Aug 12, 2011, at 8:35 PM, Robert G. Brown wrote: > On Fri, 12 Aug 2011, David Mathog wrote: > >> Robert G. Brown wrote: >> >>> Everybody must be able to obtain it >>>> freely from a web connection. >>> >>> http://www.random.org/ >>> >> >> Nice site. They have something that is very close, the pregenerated >> random files, from which a small set of digits may be extracted, >> and the >> files themselves have MD5 checksums (but are not signed). >> They also support https. It comes up a little short on criteria 1 >> (we >> really don't know what is going on behind the scenes) and 6 (it is a >> single site.) > > Behind the scenes is documented pretty well on the site, and the > guy who > runs it is a human being, you can communicate with him to learn even > more. I already know him a bit, as he and I have collaborated on > applying dieharder to test random.org datasets -- even "the" > random.org > dataset as of some time ago (I have a few hundred MB of random number > from the site in my dieharder directory). 
IIRC, the numbers are > generated continuously and fairly slowly by grabbing and filtering and > transforming atmospheric noise. As a source of entropy, that is > probably excellent if (as noted) slow, but many good sources of > entropy > seem to be fairly slow. He has good reason to think that his numbers > are theoretically "true random numbers" Well, there is another test I stumbled upon when I did some analysis on casino games (which student who takes himself seriously doesn't attempt to write some simulations to see whether you can win something in the casino by designing some strategies?). The simulation revealed it was rather easy to make a fortune at roulette with the doubling system (first put in 1; if you win, put in 1 again, else double, and keep doubling until you win). Reports from guys (some of them missing an eye, another one a hand) who actually study anything that might make a profit in casinos (and who also really try it in the casinos) revealed that they never saw someone make a really big profit with the doubling system. So there was a discrepancy between the randomly generated data and the true random numbers generated in the casino. Statistical analysis revealed the problem, though not so soon. I noticed that most semi-random numbers generated by software generators had the habit of truly addressing a search space of n, always within O (n log n) draws. So if you draw a number from most software RNGs and take it modulo n, with n being not too tiny, say quite some millions or even billions, then every slot in your 'hashtable' will get hit at least once by the RNG, whereas data in reality simply happens not to have that habit. So true random numbers versus generated noise are easy to distinguish in this manner. Now, I didn't study the literature to see whether some other chap had already invented this a long time ago. That would be most interesting to know.
In semi pseudo code, let's take an array of size a billion as an example, though usually a few million is more than ok:

  n = 2^30;   // 2 to the power 30

  Function TestNumbersForRandomness(RNG, n) {
      declare array hashtable[size n];

      guessednlogn = 2 * (log n / log 2) * n;   // draw budget: 2 * n * log2(n)

      for( i = 0 ; i < n ; i++ )
          hashtable[i] = FALSE;

      ndraws = filledn = 0;
      while( ndraws < guessednlogn ) {
          randomnumber = RNG();
          r = randomnumber % n;   // randomnumber = r (mod n)
          if( hashtable[r] == FALSE ) {
              hashtable[r] = TRUE;
              filledn++;
              if( filledn >= n )
                  break;
          }
          ndraws++;
      }

      if( filledn >= n )
          print("With high degree of certainty data generated by a RNG\n");
      else
          print("Not so sure it's a RNG\n");
  }

Regards, Vincent

> -- both unpredictable and > flat/decorrelated at all orders, and even though there aren't really > enough of them for my purposes, I've used them as one of the (small) > "gold standard" sources for testing dieharder even as I test them. > For > all practical purposes threefish or aes are truly random as well and > they are a lot faster and easier to use as gold standard generators, > though. > > I don't quite understand why the single site restriction is > important -- > this site has been up for years and I don't expect it to go away soon; > it is quite reliable. I don't think there is anything secret about > how > the numbers are generated, and I'll certify that the numbers it > produces > don't make dieharder unhappy. So 1 is fixable with a bit of effort on > your part; 6 I don't really understand but the guy who runs the > site is > clearly willing to construct a custom feed for cash customers, if > there > is enough value in whatever it is you are trying to do to pay for > access. If it's just a lottery, well, lord, I can think of a dozen > ways > to make numbers so random that they'd be unimpeachable for any sort of > lottery, both unpredictable and uncorrelated, and they don't any of > them > require any significant amount of entropy to get started.
> > I will add one warning -- "randomness" is a rather stringent > mathematical criterion, and is generally tested against the null > hypothesis. Amateurs who want to make random number generators out of > supposedly "random" data streams or fancy algorithms almost invariably > fail, sometimes spectacularly so. There are a half dozen or more > really, really good pseudorandom number generators out there and it is > easy to hotwire them together into an xor-based high entropy stream > that > basically never repeats (feeding it a bit of real entropy now and then > as it operates). I would strongly counsel you against trying to take > e.g. weather data and make something "random" out of it. Unless you > really know what you are doing, you will probably make something that > isn't at all random and may not even be unpredictable. Even most > sources of "quantum" randomness (which is at least possibly "truly > random", although I doubt it) aren't flat, so that they carry the > signature of their generation process unless/until you manage to > transform them into something flat (difficult unless you KNOW the > distribution they are producing). Pseudorandom number generators have > the serious advantage of being amenable to at least some theoretical > analysis (so you can "guarantee" flatness out to some high > dimensionality, say) as well as empirical testing with e.g. dieharder. > > HTH, > > rgb > >> >> Thanks, >> >> David Mathog >> mathog at caltech.edu >> Manager, Sequence Analysis Facility, Biology Division, Caltech >> > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Thu Aug 25 08:11:07 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 25 Aug 2011 08:11:07 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? In-Reply-To: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> Message-ID: On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > I noticed that most generated semi-random numbers with software generators, > had the habit to truely adress a search space of n always in O (n log n). > > So if you draw from most software RNG's a number and do it modulo n, > with n being not too tiny, say quite some millions or even billions, then > every > slot in your 'hashtable' will get hit at least once by the RNG, whereas data > in reality simply happens to not have that habit simply. > > So true random numbers versus generated noise is in this manner easy > to distinguish by this. Now i didn't study literature whether some other chap > some long time ago already had invented this. That would be most interesting > to know. Some other chap named George Marsaglia (and to some extent another chap named Donald Knuth) have already invented this. A number of tests of the tails of random number generators are already in dieharder. All "good" modern rngs pass these tests. 
The Martingale betting system you are looking at is even older (at least Marsaglia and Knuth are still alive). It dates back to the 18th century, and is well known to be flawed for a variety of reasons, not the least of which is that gamblers don't have the infinite wealth necessary to make this >>even<< a zero-sum strategy and casinos have betting limits that de facto make it impossible to pursue the requisite number of steps and in roulette in particular have 0 and/or 00 slots and aren't zero-sum to begin with. You can read a decent analysis of outcomes based on the presumed binomial distribution of a zero-sum game here: http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 Your test below is interesting, though. The only real problems I can see with actually using it in dieharder are: a) One would need a theoretical estimate of the distribution of filling given n log n draws on an n-slotted table (for largish n). That is, for a perfect rng, what SHOULD the distribution of success/failure be. b) One would then need the CDF for this distribution, to be able to turn the results of N trials (of n log n pulls each) into a p-value under the null hypothesis -- the probability of obtaining the particular number of successes and failures presuming a perfectly random generator. That way dieharder could apply it rigorously to its 70 or 80 embedded rngs or to any user's outboard generator. There probably is theoretical statistical support for the PD and/or CDF -- you're analyzing the tails of a poissonian process -- but finding it or doing it yourself (or myself), aye, that's the rub. One cannot just say "high degree of certainty that it is an RNG" (by which one means that the rng in question fails the test for randomness) in the test. HOW high? Perfect rngs or perfectly random processes will sometimes fill your table, but how often? How can you differentiate an "accident" when one does from an actual failure? 
All of those questions require a more rigorous theory and quantitative result embedded in a test that can be systematically cranked up to more clearly resolve failures until they are unambiguous, not marginal maybe yes maybe no. I suspect that the failures this test would reveal are already more than covered in dieharder, in particular by the bit distribution tests and the monkey tests, but I'm not terribly happy with the monkey tests and would be perfectly thrilled to have a simpler to compute test that revealed precisely this sort of flaw, systematically. And it doesn't hurt at all to have partially or fully redundant tests as long as the test themselves are rigorously valid. If you can find or compute the CDF for your test below, I'd be happy to wrap it up and add it to dieharder, in other words. One can always SIMULATE a CDF, of course, but that requires a known good generator and sort of begs the question if you don't think that e.g. AES or threefish or KISS are good generators that would actually pass your test. Even hardware/quantum sources of random bits are suspect -- they often are generated by a process that leaves in the traces of an underlying distribution. I'm not convinced that >>any<< process in the real world is >>truly<< random. Physics is ambiguous on the issue -- the quantum description of a closed system is just as deterministic as the classical one, and Master equation unpredictability on open subsets of a large closed system reflects entropy/ignorance, not actual randomness (hence Einstein's famous "doesn't play dice" remark). But lots of this are sufficiently random that one cannot detect any failure of randomness, modern crypto class generators being a prime example. 
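For the idealized case of independent, uniform draws, the "how often does a perfect generator fill the table" question above is the classical coupon collector problem. A standard limit result says that after m = n ln n + c n draws the probability that all n slots have been hit tends to exp(-exp(-c)) as n grows. The sketch below evaluates that limit and runs the table-filling experiment directly. It is a rough illustration, not a dieharder test: the names `fill_probability` and `table_fills` are hypothetical, and the limit formula is only an approximation at finite n.

```python
import math
import random


def fill_probability(n, draws):
    """Approximate P(all n slots hit) after `draws` uniform independent draws.

    Uses the coupon collector limit exp(-exp(-c)) with
    c = (draws - n ln n) / n; this is asymptotic, not exact.
    """
    c = (draws - n * math.log(n)) / n
    return math.exp(-math.exp(-c))


def table_fills(rng, n, draws):
    """Return True if every slot 0..n-1 is hit within `draws` calls to rng()."""
    seen = bytearray(n)
    filled = 0
    for _ in range(draws):
        r = rng() % n
        if not seen[r]:
            seen[r] = 1
            filled += 1
            if filled == n:
                return True
    return False


if __name__ == "__main__":
    n = 1 << 12
    budget = int(2 * n * math.log2(n))  # the 2 * n * log2(n) budget from the pseudocode
    gen = random.Random(12345)
    print(fill_probability(n, budget))
    print(table_fills(lambda: gen.getrandbits(32), n, budget))
```

Under this idealization, a budget of 2 n log2(n) draws is roughly 2.9 n ln n, so an ideal uniform source would be expected to fill the table almost every time. Filling the table within that budget therefore does not by itself separate a software generator from a truly random one; a usable test would compare the observed fill count or fill time against the full distribution, as requested above.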
rgb > > In semi pseudo code, let's take an array of size a billion as an example, > though usually a few million is more than ok: > > n = 2^30; // 2 to the power 30 > > Function TestNumbersForRandomness(RNG,n) { > declare array hashtable[size n]; > > guessednlogn = 2 * (log n / log 2) * n; > > for( i = 0 ; i < n ; i++ ) > hashtable[i] = FALSE; > > ndraws = filledn = 0; > while( ndraws < guessednlogn ) { > randomnumber = RNG(); > r = randomnumber % n; // randomnumber = r (mod n) > if( hashtable[r] == FALSE ) { > hashtable[r] = TRUE; > filledn++; > if( filledn >= n ) > break; > > } > ndraws++; > } > > if( filledn >= n ) > print "With high degree of certainty data generated by a RNG\n"); > else > print "Not so sure it's a RNG\n"; > > } > > > > > > Regards, > Vincent > > > > >> -- both unpredictable and >> flat/decorrelated at all orders, and even though there aren't really >> enough of them for my purposes, I've used them as one of the (small) >> "gold standard" sources for testing dieharder even as I test them. For >> all practical purposes threefish or aes are truly random as well and >> they are a lot faster and easier to use as gold standard generators, >> though. >> >> I don't quite understand why the single site restriction is important -- >> this site has been up for years and I don't expect it to go away soon; >> it is quite reliable. I don't think there is anything secret about how >> the numbers are generated, and I'll certify that the numbers it produces >> don't make dieharder unhappy. So 1 is fixable with a bit of effort on >> your part; 6 I don't really understand but the guy who runs the site is >> clearly willing to construct a custom feed for cash customers, if there >> is enough value in whatever it is you are trying to do to pay for >> access. 
If it's just a lottery, well, lord, I can think of a dozen ways >> to make numbers so random that they'd be unimpeachable for any sort of >> lottery, both unpredictable and uncorrelated, and they don't any of them >> require any significant amount of entropy to get started. >> >> I will add one warning -- "randomness" is a rather stringent >> mathematical criterion, and is generally tested against the null >> hypothesis. Amateurs who want to make random number generators out of >> supposedly "random" data streams or fancy algorithms almost invariably >> fail, sometimes spectacularly so. There are a half dozen or more >> really, really good pseudorandom number generators out there and it is >> easy to hotwire them together into an xor-based high entropy stream that >> basically never repeats (feeding it a bit of real entropy now and then >> as it operates). I would strongly counsel you against trying to take >> e.g. weather data and make something "random" out of it. Unless you >> really know what you are doing, you will probably make something that >> isn't at all random and may not even be unpredictable. Even most >> sources of "quantum" randomness (which is at least possibly "truly >> random", although I doubt it) aren't flat, so that they carry the >> signature of their generation process unless/until you manage to >> transform them into something flat (difficult unless you KNOW the >> distribution they are producing). Pseudorandom number generators have >> the serious advantage of being amenable to at least some theoretical >> analysis (so you can "guarantee" flatness out to some high >> dimensionality, say) as well as empirical testing with e.g. dieharder. >> >> HTH, >> >> rgb >> >>> >>> Thanks, >>> >>> David Mathog >>> mathog at caltech.edu >>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>> >> >> Robert G. Brown http://www.phy.duke.edu/~rgb/ >> Duke University Dept. of Physics, Box 90305 >> Durham, N.C. 
27708-0305 >> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Thu Aug 25 21:55:04 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 03:55:04 +0200 Subject: [Beowulf] OT: Calculating Extraterrestrial Life - was public random numbers? In-Reply-To: References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> Message-ID: On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote: > On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > >> I noticed that most generated semi-random numbers with software >> generators, >> had the habit to truely adress a search space of n always in O (n >> log n). >> >> So if you draw from most software RNG's a number and do it modulo n, >> with n being not too tiny, say quite some millions or even >> billions, then every >> slot in your 'hashtable' will get hit at least once by the RNG, >> whereas data >> in reality simply happens to not have that habit simply. >> >> So true random numbers versus generated noise is in this manner easy >> to distinguish by this. Now i didn't study literature whether some >> other chap >> some long time ago already had invented this. That would be most >> interesting >> to know. 
> > Some other chap named George Marsaglia (and to some extent another > chap > named Donald Knuth) have already invented this. A number of tests of > the tails of random number generators are already in dieharder. All > "good" modern rngs pass these tests. > > The Martingale betting system you are looking at is even older (at > least > Marsaglia and Knuth are still alive). It dates back to the 18th > century, and is well known to be flawed for a variety of reasons, not > the least of which is that gamblers don't have the infinite wealth > necessary to make this >>even<< a zero-sum strategy and casinos have > betting limits that de facto make it impossible to pursue the > requisite > number of steps and in roulette in particular have 0 and/or 00 > slots and > aren't zero-sum to begin with. You can read a decent analysis of > outcomes based on the presumed binomial distribution of a zero-sum > game > here: > > http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 > > Your test below is interesting, though. The only real problems I can > see with actually using it in dieharder are: > > a) One would need a theoretical estimate of the distribution of > filling given n log n draws on an n-slotted table (for largish n). > That > is, for a perfect rng, what SHOULD the distribution of success/failure > be. > > b) One would then need the CDF for this distribution, to be able to > turn the results of N trials (of n log n pulls each) into a p-value > under the null hypothesis -- the probability of obtaining the > particular > number of successes and failures presuming a perfectly random > generator. > > That way dieharder could apply it rigorously to its 70 or 80 embedded > rngs or to any user's outboard generator. There probably is > theoretical > statistical support for the PD and/or CDF -- you're analyzing the > tails > of a poissonian process -- but finding it or doing it yourself (or > myself), aye, that's the rub. 
> One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test. HOW high? Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often? How can you differentiate an "accident" when one does from
> an actual failure? All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more than
> covered in dieharder, in particular by the bit distribution tests and

Thanks for your kind words - you'll realize that, seeing all the theories
you quote below, of which i simply have little knowledge and which i
definitely don't have the time to investigate (yet), you're talking way
above my level of knowledge there.

Instead of going deep into mathematical theories i would find it more
appropriate to ponder the feasibility of calculating the existence of
extraterrestrial life.

Now i realize a lot of effort goes into recognizing messages from outer
space. Yet we can also speculate on some things.

First of all i'd like to make a statement on extraterrestrial life and a
viewpoint there. One viewpoint i've seen promoted is that some
scientist(s) claim we should hide ourselves from extraterrestrial life.

I disagree with that to some extent. If there is extraterrestrial life
that is more advanced than our society, obviously they also could have
built weapons to totally self-destruct and would already have killed
themselves if they had been aggressive forms of life.

If they were successful in reproducing themselves, just like humankind is
right now, they would have burned up all resources on their own planet and
caused massive extinctions.
It is difficult to defend the statement that all mass extinctions were
caused by meteorites only - for such a statement one would need proof that
every single mass extinction was caused by a specific meteorite; an
extinction could just as well have happened because a certain successful
species dominated the planet a tad too much and didn't get clever enough
to control and regulate itself to the extent that the planet didn't
entirely die. After some millions of years life then restores itself on
the planet.

So if there were intelligent life elsewhere that is more advanced than our
society, they sure would want to communicate in a manner that lets the
information be read across different galaxies. However, one would want to
hide this information from the most primitive life forms that succeed in
dominating a planet, until such a society reaches a specific level. One
would only want such an extraterrestrial form of communication to be
deciphered by another extraterrestrial form of life that has reached a
sustainable, peaceful level. I would argue such a lifeform would not form
a threat to anyone, as it has already proven not to be a threat to its own
planet.

So if the knowledge of this society is high enough to be able to control
all that, one could also argue that a specific level of math belongs to
that high level of development: a level strong enough to decipher the form
of communication that gets used between the different very intelligent
lifeforms in existence throughout the galaxies.
From the fact that there is no systematic form of contact with
extraterrestrial life we can already deduce that humankind still has to
develop itself further from a species that burns up all its resources,
especially by emitting too much CO2 (the latest report, which i'll have to
check out, is that the increased CO2 level increases the amount of CO2
absorbed by the oceans, causing them to get more acidic, causing plankton,
the start of the food chain, to not develop its skeleton enough, which for
sure in the long run will cause mass extinction).

Now we might not be advanced enough yet to decipher extraterrestrial
communication, so i wonder whether we might somehow be able to recognize
that information is being communicated using a form of encryption that we
simply cannot decipher yet, based upon comparing it with how our RNG's
work. Some of them run for example over a prime field, others have a
distribution that is too perfect.

If we get radiation measurements back from space and we test them for
belonging to a specific class or type of randomness versus non-randomness,
how does that compare with a comparable source of radiation of our own and
its randomness classification?

Obviously the algorithm i gave is just one specific algorithm to measure a
perfect distribution - as you already indicated, many other tests have
been invented already. To what extent have those been applied to what
could be encrypted communication from extraterrestrial life to other
extraterrestrial life (like us, if we manage to survive as a species and
develop further to a peaceful level that can sustain itself for a longer
period of time)?

So, summarized, what i wonder about is how far random number theory can
contribute to detecting extraterrestrial life (of course with a specific
statistical significance attached to it).
This of course in combination with experiments that allow us to first
classify how a specific form of possible communication system would
normally behave according to the randomness classification system, versus
how the measured possible form of communication compares to that.

Such a classification system would need to be very sophisticated to have
any chance of detecting extraterrestrial life i'd guess, as we can't just
naively assume that all they could come up with is encrypting things over
a prime field using smallish primes, which in our world is already only
allowed to be used up to secret level.

Regards,
Vincent

> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically. And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> test themselves are rigorously valid. If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words. One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution. I'm not convinced that >>any<< process in the real world
> is >>truly<< random. Physics is ambiguous on the issue -- the quantum
> description of a closed system is just as deterministic as the classical
> one, and Master equation unpredictability on open subsets of a large
> closed system reflects entropy/ignorance, not actual randomness (hence
> Einstein's famous "doesn't play dice" remark).
But lots of this are > sufficiently random that one cannot detect any failure of randomness, > modern crypto class generators being a prime example. > > rgb > >> >> In semi pseudo code, let's take an array of size a billion as an >> example, >> though usually a few million is more than ok: >> >> n = 2^30; // 2 to the power 30 >> >> Function TestNumbersForRandomness(RNG,n) { >> declare array hashtable[size n]; >> >> guessednlogn = 2 * (log n / log 2) * n; >> >> for( i = 0 ; i < n ; i++ ) >> hashtable[i] = FALSE; >> >> ndraws = filledn = 0; >> while( ndraws < guessednlogn ) { >> randomnumber = RNG(); >> r = randomnumber % n; // randomnumber = r (mod n) >> if( hashtable[r] == FALSE ) { >> hashtable[r] = TRUE; >> filledn++; >> if( filledn >= n ) >> break; >> >> } >> ndraws++; >> } >> >> if( filledn >= n ) >> print "With high degree of certainty data generated by a RNG\n"); >> else >> print "Not so sure it's a RNG\n"; >> >> } >> >> >> >> >> >> Regards, >> Vincent >> >> >> >> >>> -- both unpredictable and >>> flat/decorrelated at all orders, and even though there aren't really >>> enough of them for my purposes, I've used them as one of the (small) >>> "gold standard" sources for testing dieharder even as I test >>> them. For >>> all practical purposes threefish or aes are truly random as well and >>> they are a lot faster and easier to use as gold standard generators, >>> though. >>> I don't quite understand why the single site restriction is >>> important -- >>> this site has been up for years and I don't expect it to go away >>> soon; >>> it is quite reliable. I don't think there is anything secret >>> about how >>> the numbers are generated, and I'll certify that the numbers it >>> produces >>> don't make dieharder unhappy. 
So 1 is fixable with a bit of >>> effort on >>> your part; 6 I don't really understand but the guy who runs the >>> site is >>> clearly willing to construct a custom feed for cash customers, if >>> there >>> is enough value in whatever it is you are trying to do to pay for >>> access. If it's just a lottery, well, lord, I can think of a >>> dozen ways >>> to make numbers so random that they'd be unimpeachable for any >>> sort of >>> lottery, both unpredictable and uncorrelated, and they don't any >>> of them >>> require any significant amount of entropy to get started. >>> I will add one warning -- "randomness" is a rather stringent >>> mathematical criterion, and is generally tested against the null >>> hypothesis. Amateurs who want to make random number generators >>> out of >>> supposedly "random" data streams or fancy algorithms almost >>> invariably >>> fail, sometimes spectacularly so. There are a half dozen or more >>> really, really good pseudorandom number generators out there and >>> it is >>> easy to hotwire them together into an xor-based high entropy >>> stream that >>> basically never repeats (feeding it a bit of real entropy now and >>> then >>> as it operates). I would strongly counsel you against trying to >>> take >>> e.g. weather data and make something "random" out of it. Unless you >>> really know what you are doing, you will probably make something >>> that >>> isn't at all random and may not even be unpredictable. Even most >>> sources of "quantum" randomness (which is at least possibly "truly >>> random", although I doubt it) aren't flat, so that they carry the >>> signature of their generation process unless/until you manage to >>> transform them into something flat (difficult unless you KNOW the >>> distribution they are producing). 
Pseudorandom number generators have
>>> the serious advantage of being amenable to at least some theoretical
>>> analysis (so you can "guarantee" flatness out to some high
>>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>>> HTH,
>>>
>>> rgb
>>>> Thanks,
>>>> David Mathog
>>>> mathog at caltech.edu
>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech

From diep at xs4all.nl Thu Aug 25 20:27:18 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 02:27:18 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl>
Message-ID: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>

On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software
>> generators, had the habit to truely adress a search space of n always
>> in O (n log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even billions,
>> then every slot in your 'hashtable' will get hit at least once by the
>> RNG, whereas data in reality simply happens to not have that habit
>> simply.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish by this. Now i didn't study literature whether some
>> other chap some long time ago already had invented this. That would
>> be most interesting to know.
>
> Some other chap named George Marsaglia (and to some extent another chap
> named Donald Knuth) have already invented this. A number of tests of
> the tails of random number generators are already in dieharder. All
> "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at least
> Marsaglia and Knuth are still alive). It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have

From a mathematical viewpoint it makes perfect cash. The statistical odds
are that you have already built up a considerable profit by the time a
worst case (hitting the practical limit of 10 doublings) hits you.

The simulations of course use the practical limit.

Note that European casinos have a single zero. In the USA there is an even
more greedy mafia controlling all the casinos; there are 2 zeros there,
0 and 00.

The simulations were for European casinos.

> betting limits that de facto make it impossible to pursue the requisite
> number of steps and in roulette in particular have 0 and/or 00 slots and
> aren't zero-sum to begin with.
> You can read a decent analysis of
> outcomes based on the presumed binomial distribution of a zero-sum game
> here:
>
> http://en.wikipedia.org/wiki/Martingale_%28betting_system%29

You're not allowed to use a system in a casino, so we speak about theory.
Probably the first evening they let you try. The second day you'll be on
the blacklist.

> Your test below is interesting, though. The only real problems I can
> see with actually using it in dieharder are:

Yeah, more interesting than the roulette system, which has been discussed
a billion times and analyzed completely flat.

> a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n). That
> is, for a perfect rng, what SHOULD the distribution of success/failure
> be.

As we have figured out by now in Artificial Intelligence, the statistical
assumptions made in the past simply do not hold. For Artificial
Intelligence we need a new sort of theory.

As for the distribution problem, generators having a spread that's too
accurate, the way to deliver a proof would be, for example, to build a
simple device.

Build an old-fashioned box from which you can draw balls. Remember what
you could see on TV some 20 years ago or so (not sure it was like that in
the USA): a big basket with balls. In fact it looks like this:

http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg

But now a much bigger machine like this, with different means inside of
randomizing the balls, actually also randomly modifying the obstacles
inside that shake the balls.

After a ball has been drawn you automatically have it recorded, and the
ball immediately goes back into the machine. For a full minute you have
the balls in the machine shaken again and you draw a ball again. It is
important to do this randomizing of the balls inside the machine for
quite some time. I would propose a minute.

Of course you have to do this with quite some balls. Say a thousand.
Then you draw balls until all numbers have been drawn at least once.

This cool experiment can easily be built. Of course the expected running
time of a single experiment will be a few weeks. You can produce a number
of those drawing machines though and have a look.

Theories that seemingly work for small n, n being the number of balls, are
much harder to maintain at bigger n's, as we also see in prime number
research.

The way the machine gets designed is of course totally crucial. I would
propose a design that really shakes the balls through each other a lot and
very thoroughly, just as we nowadays know how flawed a big number of the
popular card shuffling machines are.

Such a lottery with really a lot of balls would be very interesting to see
the outcomes from. In fact i would prefer having a number of those
machines produced, so that it's possible to really have a lot of outcomes
and then analyze them very well.

> b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the particular
> number of successes and failures presuming a perfectly random
> generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator. There probably is theoretical
> statistical support for the PD and/or CDF -- you're analyzing the tails
> of a poissonian process -- but finding it or doing it yourself (or
> myself), aye, that's the rub. One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test. HOW high? Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often?
If we assume that reality of life represents randomness, which is another rather good question in how far that theory is plausible, then using that assumption i'm very sure that the RNG's i investigated so far have a distribution which is too perfect, more perfect than i have seen in any reality. In fact most RNG's fill all slots faster than O ( n log n ), yet it's O ( n log n ) that they follow. This is RNG's that have come through all tests as being a good and very acceptabe RNG to be used. Realize i'm no RNG expert, so all the names of all those tests. For me it's just push button technology. I just designed a test and found it very odd that all RNG's have such perfect distributions that they don't even miss a single slot. I'd argue the only test that would be interesting to me to see how it might be in reality is the lottery machine test - yet with really a lot of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical reasons i would agree with a number of above a 1000. Paper fiddling is really not interesting to me there to prove anything, as what i've seen in reality in randomness is total different from how RNG's model that. Regards, Vincent > How can you differentiate an "accident" when one does from > an actual failure? All of those questions require a more rigorous > theory and quantitative result embedded in a test that can be > systematically cranked up to more clearly resolve failures until they > are unambiguous, not marginal maybe yes maybe no. > > I suspect that the failures this test would reveal are already more > than > covered in dieharder, in particular by the bit distribution tests and > the monkey tests, but I'm not terribly happy with the monkey tests and > would be perfectly thrilled to have a simpler to compute test that > revealed precisely this sort of flaw, systematically. And it doesn't > hurt at all to have partially or fully redundant tests as long as the > test themselves are rigorously valid. 
If you can find or compute the > CDF for your test below, I'd be happy to wrap it up and add it to > dieharder, in other words. One can always SIMULATE a CDF, of course, > but that requires a known good generator and sort of begs the question > if you don't think that e.g. AES or threefish or KISS are good > generators that would actually pass your test. > > Even hardware/quantum sources of random bits are suspect -- they often > are generated by a process that leaves in the traces of an underlying > distribution. I'm not convinced that >>any<< process in the real > world > is >>truly<< random. Physics is ambiguous on the issue -- the quantum > description of a closed system is just as deterministic as the > classical > one, and Master equation unpredictability on open subsets of a large > closed system reflects entropy/ignorance, not actual randomness (hence > Einstein's famous "doesn't play dice" remark). But lots of this are > sufficiently random that one cannot detect any failure of randomness, > modern crypto class generators being a prime example. 
> > rgb > >> >> In semi pseudo code, let's take an array of size a billion as an >> example, >> though usually a few million is more than ok: >> >> n = 2^30; // 2 to the power 30 >> >> Function TestNumbersForRandomness(RNG,n) { >> declare array hashtable[size n]; >> >> guessednlogn = 2 * (log n / log 2) * n; >> >> for( i = 0 ; i < n ; i++ ) >> hashtable[i] = FALSE; >> >> ndraws = filledn = 0; >> while( ndraws < guessednlogn ) { >> randomnumber = RNG(); >> r = randomnumber % n; // randomnumber = r (mod n) >> if( hashtable[r] == FALSE ) { >> hashtable[r] = TRUE; >> filledn++; >> if( filledn >= n ) >> break; >> >> } >> ndraws++; >> } >> >> if( filledn >= n ) >> print "With high degree of certainty data generated by a RNG\n"); >> else >> print "Not so sure it's a RNG\n"; >> >> } >> >> >> >> >> >> Regards, >> Vincent >> >> >> >> >>> -- both unpredictable and >>> flat/decorrelated at all orders, and even though there aren't really >>> enough of them for my purposes, I've used them as one of the (small) >>> "gold standard" sources for testing dieharder even as I test >>> them. For >>> all practical purposes threefish or aes are truly random as well and >>> they are a lot faster and easier to use as gold standard generators, >>> though. >>> I don't quite understand why the single site restriction is >>> important -- >>> this site has been up for years and I don't expect it to go away >>> soon; >>> it is quite reliable. I don't think there is anything secret >>> about how >>> the numbers are generated, and I'll certify that the numbers it >>> produces >>> don't make dieharder unhappy. So 1 is fixable with a bit of >>> effort on >>> your part; 6 I don't really understand but the guy who runs the >>> site is >>> clearly willing to construct a custom feed for cash customers, if >>> there >>> is enough value in whatever it is you are trying to do to pay for >>> access. 
If it's just a lottery, well, lord, I can think of a >>> dozen ways >>> to make numbers so random that they'd be unimpeachable for any >>> sort of >>> lottery, both unpredictable and uncorrelated, and they don't any >>> of them >>> require any significant amount of entropy to get started. >>> I will add one warning -- "randomness" is a rather stringent >>> mathematical criterion, and is generally tested against the null >>> hypothesis. Amateurs who want to make random number generators >>> out of >>> supposedly "random" data streams or fancy algorithms almost >>> invariably >>> fail, sometimes spectacularly so. There are a half dozen or more >>> really, really good pseudorandom number generators out there and >>> it is >>> easy to hotwire them together into an xor-based high entropy >>> stream that >>> basically never repeats (feeding it a bit of real entropy now and >>> then >>> as it operates). I would strongly counsel you against trying to >>> take >>> e.g. weather data and make something "random" out of it. Unless you >>> really know what you are doing, you will probably make something >>> that >>> isn't at all random and may not even be unpredictable. Even most >>> sources of "quantum" randomness (which is at least possibly "truly >>> random", although I doubt it) aren't flat, so that they carry the >>> signature of their generation process unless/until you manage to >>> transform them into something flat (difficult unless you KNOW the >>> distribution they are producing). Pseudorandom number generators >>> have >>> the serious advantage of being amenable to at least some theoretical >>> analysis (so you can "guarantee" flatness out to some high >>> dimensionality, say) as well as empirical testing with e.g. >>> dieharder. >>> HTH, >>> >>> rgb >>>> Thanks, >>>> David Mathog >>>> mathog at caltech.edu >>>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>> Robert G. Brown http://www.phy.duke.edu/~rgb/ >>> Duke University Dept. 
of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From rgb at phy.duke.edu Fri Aug 26 02:07:17 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 02:07:17 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If we assume that reality of life represents randomness, which is another
> rather good question in how far that theory is plausible, then using that
> assumption i'm very sure that the RNG's i investigated so far
> have a distribution which is too perfect, more perfect than i have seen
> in any reality.

That's because you live in a different reality than everybody else,
Vincent.

> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
> O ( n log n ) that they follow.

In fact, they don't.

> This is RNG's that have come through all tests as being a good and
> very acceptabe RNG to be used.

No, it's not.
> Realize i'm no RNG expert, so all the names of all those tests.
>
> For me it's just push button technology. I just designed a test
> and found it very odd that all RNG's have such perfect distributions
> that they don't even miss a single slot.

It's odd because your test is broken.

> I'd argue the only test that would be interesting to me to see how it
> might be in reality is the lottery machine test - yet with really a lot
> of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
> reasons i would agree with a number of above a 1000.
>
> Paper fiddling is really not interesting to me there to prove anything,
> as what i've seen in reality in randomness is total different from how
> RNG's model that.

Let's try a bit of "paper fiddling". The expected number of filled slots
is (this is actual code, not pseudocode, for n slots):

  nlogn = log10(n)*n;
  expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));

The reasoning is enormously simple. The probability of a slot being empty
after one pull is (1 - 1/n). After nlogn pulls, it is
p_e = (1 - 1/n)^nlogn. The probability of a slot being filled is thus
1 - p_e, and given n slots, n - n*(1-1/n)^nlogn of them "should" be filled,
within random noise, and n*(1-1/n)^nlogn of them "should" be empty.

Well, I've got a random number generator tester harness, so I hacked your
test into it. One major bug in your code, BTW, is using a modulus to
generate your random numbers -- dunno what that's about, but if your rng
returned numbers between (say) 0 and 7 and you use it to generate numbers
in the range 0 to 4 by means of r%5 then you'll get (for the sequence of
numbers 0 through 7) 0 1 2 3 4 0 1 2. Note well that you get twice as many
0's, 1's and 2's as 3's and 4's assuming a random draw on 0 to 7. So you
aren't even testing a uniformly distributed sequence of integers.
Fixing this relatively minor bug, removing your breakout and actually
counting up filledn for the full nlogn samples, and applying the test to
mt19937, we get:

rgb@lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000 nlogn = 70000000
table not all filled: filledn = 9990811, expected = 9990881

We run it again:

rgb@lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000 nlogn = 70000000
table not all filled: filledn = 9990802, expected = 9990881

We run it for R250 -- a well-known not-good generator:

rgb@lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
n = 10000000 nlogn = 70000000
table not all filled: filledn = 9990794, expected = 9990881

We run it on the literally infamous randu:

n = 10000000 nlogn = 70000000
table not all filled: filledn = 9999482, expected = 9990881

Note, Vincent, that the last two examples of correctly computed results
from known-terrible generators are both farther from the expected mean
than mt19937, a well-known damn good one: R250 only slightly (87 slots
against 70 and 79), randu by more than 8000. This suggests that your test
(perhaps unsurprisingly) has some sensitivity, not because some slots are
or aren't empty, but because the NUMBER of slots that are or aren't empty
isn't quite correct.

Note also that in the "paper fiddling" analysis above, the use of nlogn is
quite unimportant -- we could make this an independent variable and
evaluate the table filling for any value m of pulls, as long as our
expected value is n - n*(1 - 1/n)^m.

If I have the energy, I'll see if the distribution of filledn around
expected is e.g. Gaussian -- it seems pretty reasonable that it would be
-- with some expected or empirically computable variance. If it is, then
this can be fairly easily turned into an actual test that returns a
p-value that humans can use to make rational judgements, or rational
humans can use to make judgements or something like that.
I doubt that the test will have MUCH sensitivity -- modern generators are
way too good to have their flaws picked out quite this simply, although
Marsaglia's "monkey tests" do something very similar although a lot more
sophisticated mathematically (and arguably more sensitive) and do
suffice to nail randu (anything nails randu) and semi-weak generators
like R250.

Now, let's see what we've learned from this fiddling.  One is that
without it, you just waste a lot of people's time making egregious and
false claims that belittle the tremendously sophisticated and difficult
work a whole lot of "fiddlers" have put into inventing, writing, and
testing modern RNGs.  The truth is that >>all<< RNGs in dieharder "pass"
your test (if the test is "producing at least one zero") once your test
isn't broken.  We've learned that in fact, the best of the modern RNGs
are damn good, and that you could work for five years trying to invent a
test that is good enough to fail any of them and still not succeed.
Finally, we've learned that you should not, not, not take your
Martingale to a casino and try the doubling strategy out to make money,
or, if you do, put a firm upper bound -- something like 63 Euro -- on
what you're willing to lose with your base stake of 1 Euro.  That way
you have maybe a 40% chance of doubling your 63 Euro before you go
broke.  Really, you should read the Wikipedia article I linked, in spite
of the fact that it presents more "paper fiddling".

Sincerely,

    rgb

(See P.S. comments below...)

>>> n = 2^30; // 2 to the power 30
>>>
>>> Function TestNumbersForRandomness(RNG,n) {
>>>   declare array hashtable[size n];
>>>
>>>   guessednlogn = 2 * (log n / log 2) * n;

Why guess nlogn?  nlogn is n*log10(n).  Why nlogn anyway?  Call it m and
make it a parameter.
>>>   for( i = 0 ; i < n ; i++ )
>>>     hashtable[i] = FALSE;
>>>
>>>   ndraws = filledn = 0;
>>>   while( ndraws < guessednlogn ) {
>>>     randomnumber = RNG();
>>>     r = randomnumber % n; // randomnumber = r (mod n)

no, r = n*RNG_UNIFORM(); where RNG_UNIFORM() is e.g. RNG/UINT_MAX.  Yes
there are roundoff errors, but they are uniform and consistent and as
you can see, don't affect this problem.  What you have isn't even close
to uniform -- it is badly nonrandom.

>>>     if( hashtable[r] == FALSE ) {
>>>       hashtable[r] = TRUE;
>>>       filledn++;
>>>       if( filledn >= n )
>>>         break;

Don't break.  Just count up filledn.  It will never be more than n now
anyway, for any n, or any reasonable m.  There probably is some number of
pulls that will raise "expected" to n, but it is pretty big compared to
n, way bigger than nlogn.

>>>
>>>     }
>>>     ndraws++;
>>>   }
>>>
>>>   if( filledn >= n )
>>>     print "With high degree of certainty data generated by a RNG\n");
>>>   else
>>>     print "Not so sure it's a RNG\n";
>>>
>>> }

I'm guessing the correct statistic here is something like |expected -
filledn|/expected, but as I said, I haven't really worked at it.  I
haven't decided whether or not it is worth adding this to dieharder --
without a formal derivation of the expected statistic it would be yet
another empirical test, which means you're really comparing one RNG to
another presumed better one, which I don't like.  And do I have time to
do the "fiddling" needed to do a proper derivation?  Aye, that's the
rub...;-)

    rgb

>>> Regards,
>>> Vincent
>>>
>>>> -- both unpredictable and
>>>> flat/decorrelated at all orders, and even though there aren't really
>>>> enough of them for my purposes, I've used them as one of the (small)
>>>> "gold standard" sources for testing dieharder even as I test them.  For
>>>> all practical purposes threefish or aes are truly random as well and
>>>> they are a lot faster and easier to use as gold standard generators,
>>>> though.
>>>>
>>>> I don't quite understand why the single site restriction is important --
>>>> this site has been up for years and I don't expect it to go away soon;
>>>> it is quite reliable.  I don't think there is anything secret about how
>>>> the numbers are generated, and I'll certify that the numbers it produces
>>>> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
>>>> your part; 6 I don't really understand but the guy who runs the site is
>>>> clearly willing to construct a custom feed for cash customers, if there
>>>> is enough value in whatever it is you are trying to do to pay for
>>>> access.  If it's just a lottery, well, lord, I can think of a dozen ways
>>>> to make numbers so random that they'd be unimpeachable for any sort of
>>>> lottery, both unpredictable and uncorrelated, and they don't any of them
>>>> require any significant amount of entropy to get started.
>>>>
>>>> I will add one warning -- "randomness" is a rather stringent
>>>> mathematical criterion, and is generally tested against the null
>>>> hypothesis.  Amateurs who want to make random number generators out of
>>>> supposedly "random" data streams or fancy algorithms almost invariably
>>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>>> really, really good pseudorandom number generators out there and it is
>>>> easy to hotwire them together into an xor-based high entropy stream that
>>>> basically never repeats (feeding it a bit of real entropy now and then
>>>> as it operates).  I would strongly counsel you against trying to take
>>>> e.g. weather data and make something "random" out of it.  Unless you
>>>> really know what you are doing, you will probably make something that
>>>> isn't at all random and may not even be unpredictable.
>>>> Even most
>>>> sources of "quantum" randomness (which is at least possibly "truly
>>>> random", although I doubt it) aren't flat, so that they carry the
>>>> signature of their generation process unless/until you manage to
>>>> transform them into something flat (difficult unless you KNOW the
>>>> distribution they are producing).  Pseudorandom number generators have
>>>> the serious advantage of being amenable to at least some theoretical
>>>> analysis (so you can "guarantee" flatness out to some high
>>>> dimensionality, say) as well as empirical testing with e.g. dieharder.
>>>>
>>>> HTH,
>>>>
>>>>     rgb
>>>>
>>>>> Thanks,
>>>>> David Mathog
>>>>> mathog at caltech.edu
>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
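Putting the comments on the pseudocode in the message above together,
the repaired slot-filling test might look like the following Python
sketch.  It is an illustration under assumptions, not dieharder's code:
the name slot_fill_test is invented, Python's own random module (a
Mersenne Twister) stands in for mt19937, the number of pulls m is a
parameter, the draw uses randrange instead of a modulus, and there is no
early break.

```python
import math
import random


def slot_fill_test(n, m, rng):
    """Pull m uniform integers in 0 .. n-1 and count the distinct slots hit.

    Returns (filledn, expected) where expected = n - n*(1 - 1/n)^m.
    """
    filled = bytearray(n)          # one flag per slot, all initially 0
    filledn = 0
    for _ in range(m):
        r = rng.randrange(n)       # uniform on 0 .. n-1 for any n, no modulus
        if not filled[r]:
            filled[r] = 1
            filledn += 1           # keep counting; never break out early
    expected = n - n * (1.0 - 1.0 / n) ** m
    return filledn, expected


n = 100_000
m = int(n * math.log10(n))         # the "nlogn" of the text, passed in as m
filledn, expected = slot_fill_test(n, m, random.Random(12345))
print(f"n = {n}  m = {m}  filledn = {filledn}  expected = {expected:.0f}")
```

The seed 12345 and the smaller n are arbitrary choices to keep the run
short; the point is only that filledn should land near expected, with
some slots still empty.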
From diep at xs4all.nl  Fri Aug 26 07:56:15 2011
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Fri, 26 Aug 2011 13:56:15 +0200
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Aug 26, 2011, at 8:07 AM, Robert G. Brown wrote:

> On Fri, 26 Aug 2011, Vincent Diepeveen wrote:
>
>> If we assume that reality of life represents randomness, which is another
>> rather good question in how far that theory is plausible, then using that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have seen
>> in any reality.
>
> That's because you live in a different reality than everybody else,
> Vincent.

Or the reality we live in might not be as random as we all guess...

But it's good that you took a look at the die-harder test now - which you
didn't do before.

>> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
>> O ( n log n ) that they follow.
>
> In fact, they don't.
>
>> This is RNG's that have come through all tests as being a good and
>> very acceptable RNG to be used.
>
> No, it's not.
>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>
> It's odd because your test is broken.
>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove anything,
>> as what i've seen in reality in randomness is total different from how
>> RNG's model that.
>
> Let's try a bit of "paper fiddling".  The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>   nlogn = log10(n)*n;
>   expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n).  After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
> given n slots, n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, and n*(1-1/n)^nlogn of them "should" be empty.
>
> Well, I've got a random number generator tester harness, so I hacked
> your test into it.  One major bug in your code, BTW, is using a modulus
> to generate your random numbers -- dunno what that's about, but if your

EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Apologies for the caps. I hope you see how important this is. You're
claiming all programmers use random numbers in a faulty manner?

This is important enough to discuss further.

As nearly always you need random numbers from within a given domain say
0..n-1
So projecting a RNG onto that domain is pretty crucial. How would you
want to do that in a correct manner?

In the slot test in fact a simple AND is enough.

> rng returned numbers between (say) 0 and 7 and you use it to generate
> numbers in the range 0 to 4 by means of r%5 then you'll get (for the
> sequence of numbers 0 through 7) 0 1 2 3 4 0 1 2.  Note well that you
> get twice as many 0's, 1's and 2's as 3's and 4's assuming a random draw
> on 0 to 7.  So you aren't even testing a uniformly distributed sequence
> of integers.
>
> Fixing this relatively minor bug, removing your breakout and actually
> counting up filledn for the full nlogn samples, and applying the test to
> mt19937, we get:
>
> rgb at lilith|B:1009>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990811, expected = 9990881
>
> We run it again:
>
> rgb at lilith|B:1010>./dieharder -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990802, expected = 9990881
>
> We run it for R250 -- a well-known not-good generator:
>
> rgb at lilith|B:1012>./dieharder -g 16 -r 6 -n 10000000 -p 1 -t 1 | head
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9990794, expected = 9990881
>
> We run it on the literally infamous randu:
>
> n = 10000000
> nlogn = 70000000
> table not all filled: filledn = 9999482, expected = 9990881
>
> Note, Vincent, that the last two examples of correctly computed results
> from known-terrible generators are much farther from the expected mean
> than mt19937, a well-known damn good one.  This suggests that your test
> (perhaps unsurprisingly) has some sensitivity, not because some slots
> are or aren't empty, but because the NUMBER of slots that are or aren't
> empty isn't quite correct.  Note also that in the "paper fiddling"
> analysis above, the use of nlogn is quite unimportant -- we could make
> this an independent variable and evaluate the table filling for any
> value m of pulls, as long as our expected value is n - n*(1 - 1/n)^m.
>
> If I have the energy, I'll see if the distribution of filledn around
> expected is e.g. Gaussian -- it seems pretty reasonable that it would be
> -- with some expected or empirically computable variance.  If it is,
> then this can be fairly easily turned into an actual test that returns a
> p-value that humans can use to make rational judgements, or rational
> humans can use to make judgements or something like that.
> I doubt that the test will have MUCH sensitivity -- modern generators are
> way too good to have their flaws picked out quite this simply, although
> Marsaglia's "monkey tests" do something very similar although a lot more
> sophisticated mathematically (and arguably more sensitive) and do
> suffice to nail randu (anything nails randu) and semi-weak generators
> like R250.
>
> Now, let's see what we've learned from this fiddling.  One is that
> without it, you just waste a lot of people's time making egregious and
> false claims that belittle the tremendously sophisticated and difficult
> work a whole lot of "fiddlers" have put into inventing, writing, and
> testing modern RNGs.  The truth is that >>all<< RNGs in dieharder "pass"
> your test (if the test is "producing at least one zero") once your test
> isn't broken.  We've learned that in fact, the best of the modern RNGs
> are damn good, and that you could work for five years trying to invent a
> test that is good enough to fail any of them and still not succeed.
> Finally, we've learned that you should not, not, not take your
> Martingale to a casino and try the doubling strategy out to make money,

It's not interesting to discuss - but yes, this strategy makes money in
casinos; you just get thrown out of the casino and end up on the
blacklist if you do.

For good chessplayers all this is not so tough.

The casinos' blacklist of people too strong in blackjack is endless...
...this has been the practice for longer than we have been alive...

So casino reality is much simpler. They kick you out if you're good.

That's why they try to popularize poker now - you don't play against the
casino there.

> or, if you do, put a firm upper bound -- something like 63 Euro -- on
> what you're willing to lose with your base stake of 1 Euro.  That way
> you have maybe a 40% chance of doubling your 63 Euro before you go
> broke.
From rgb at phy.duke.edu  Fri Aug 26 08:29:06 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:29:06 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Robert G. Brown wrote:

> Let's try a bit of "paper fiddling".  The expected number of filled slots
> is (this is actual code, not pseudocode, for n slots):
>
>   nlogn = log10(n)*n;
>   expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn));
>
> The reasoning is enormously simple.  The probability of a slot being
> empty after one pull is (1 - 1/n).  After nlogn pulls, it is p_e = (1 -
> 1/n)^nlogn.  The probability of a slot being filled is thus 1 - p_e, and
> given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within
> random noise, n*(1-1/n)^nlogn of them "should" be empty.

Silly me.  All of the anonymous slots are at least asymptotically
independent (not necessarily obvious, but true from symmetry, I think,
subject to the weak constraint that the total population of all of the
slots has to add up to the number of trials so there are probably n-1
degrees of freedom in the Pearson test).  We have p and q.  The
distribution is binomial and of course I know the binomial distribution
and its sigma.
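A small sketch of that binomial picture, in Python.  It assumes p = 1/n
is the chance that one pull hits a given slot and q = 1 - p, treats the
slots as independent (the approximation discussed above), and uses
made-up function names binomial_pmf and expected_slots_with_k_hits.  The
k = 0 row reproduces the expected number of empty slots, and the other
rows give the hit-frequency table a binomial-based test would compare
against.

```python
import math


def binomial_pmf(k, m, p):
    """P(exactly k hits in m pulls) with per-pull hit probability p.

    Worked in log space so that large m does not overflow.
    """
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(p) + (m - k) * math.log1p(-p))
    return math.exp(log_pmf)


def expected_slots_with_k_hits(n, m, k):
    """Expected number of the n slots that are hit exactly k times in m pulls."""
    return n * binomial_pmf(k, m, 1.0 / n)


n, m = 10_000_000, 70_000_000
p, q = 1.0 / n, 1.0 - 1.0 / n
print("mean hits per slot :", m * p)
print("sigma per slot     :", math.sqrt(m * p * q))
for k in range(4):
    print(f"slots with {k} hits : {expected_slots_with_k_hits(n, m, k):.1f}")
```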
I can easily build any one of several tests on top of this (simple
binomial or even multinomial, since I effectively have the hit frequency
for n slots and it should BE the binomial distribution), and in fact
have two or three already that are very similar to this on a smaller
scale.

It's what comes of hacking out -- sorry, "fiddling" out -- quick
solutions and tests late at night when you're tired and ought to be
sleeping.  A bit of coffee makes a world of difference...:-)

I'll have to think a bit about it and make sure that this isn't already
done, better, in e.g. STS, but it might yet see the light of day as an
actual dieharder test.

BTW, I'm not replying to your space alien ET post (to the Beowulf list
in reply to an already OT discussion of martingales that arose out of a
discussion of good RNGs and seeding strategies -- sorry y'all, but hey,
at least it is entertaining?) simply because my jaw is sore from hitting
the ground so many times while reading it.  Those are some top-quality
hallucinogens, yes they are...

We will now return to your regularly scheduled discussion of boring
things like bandwidth, memory reliability, parallel algorithms and the
like, you know, on-topic stuff.  But if any of y'all ever need to test
rngs or flame schemes to "win" non-zero-sum games by means of
"strategy", you know who to call...;-)  Somewhere upstairs I have this
nifty book on game theory and in a pinch I can even trot out an actual
game matrix and analyze outcomes algefiddlingbraically!

    rgb

P.S. -- Vincent, all of these simple problems were solved by
mathematicians and statisticians so very, very, long ago, beginning with
the work of Pascal and Fermat (there are names to conjure with, eh?)
solving the problem posed by the Chevalier de Méré regarding an even bet
on double sixes happening at least once in 24 throws: the actual
probability of double sixes per throw is (of course) 1/36, the
probability of no double six in 24 throws is (35/36)^24, and the
probability of at least one is therefore 1 - (35/36)^24 = 0.4914038761
-- all paper fiddling, mind you -- a result that is eerily reminiscent
of the solution to your problem, but with fewer slots.  So at even odds
it is -- barely -- a sucker's bet.  But a margin of 0.86% is enough to
empty even the deepest pockets, over time.

Now all you have to do is advance your actual knowledge of statistics
beyond that realized by an idly rich French nobleman in 1654 (who still
was wise enough to recognize that it wasn't an even bet and consulted
the best of the best of the minds of his day to prove it).  You have a
mere 357 years to go...:-)

P.P.S -- If "all rngs" were really as bad as you assert, does it not
stand to reason that "all Monte Carlo computations" that use them would
all get egregiously incorrect results?  And yet they don't.  In fact, in
problems (like the Ising model in 2D) where known solutions exist, they
agree basically perfectly with the theoretical solution, and of course
it is easy to compare a wide range of integrals and Markov process
outcomes with theory.  So if you used your simple common sense you would
construct a mental argument like:

  "Either I, in my brilliance, have discovered an egregious flaw in all
  random number generators used by all of those STUPID computer
  scientists, mathematicians, and physicists for decades to do their
  long and complex computations that no doubt all got equally
  egregiously wrong answers;

  Or

  Those computer scientists, mathematicians, and physicists are actually
  pretty smart and aggressively check their work (and each other's work)
  with a strong incentive to discover problems.
  It is rather probable that any such egregious error would have been
  long ago discovered; therefore there is almost certainly a serious
  error in my own reasoning."

Seriously, dude.  Ask yourself "Am I really smarter and better informed
than Pascal, Fermat, Laplace, Bayes, not to mention all of those
contemporary humans who have been devoting entire well-educated careers
to random numbers as if all of modern e-commerce depended on them (it
does) or is it just barely possible that I've made a mistake?"

Come on, you can do it.  I know it is difficult for you, but try..

From rgb at phy.duke.edu  Fri Aug 26 08:57:55 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 26 Aug 2011 08:57:55 -0400 (EDT)
Subject: [Beowulf] OT: public random numbers?
In-Reply-To:
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR PROGRAM.

Bullshit.  "Every programmer" isn't dumb as a post.  Or wasn't my
argument clear enough?  Do you need me to actually post the code for how
the GSL -- written by at least some of these programmers -- does this?
Here, I'll try again.  This time I'll use smaller numbers and make an
actual table of the outcomes:  Imagine only two lousy random bits,
enough to make 00, 01, 10, 11 (or 0,1,2,3).
Here is the probability table:

  r  =    0        1        2        3
  ----------------------------------------
  p  =   0.25     0.25     0.25     0.25

Let us generate N samples from this distribution.  Our expected
frequency of the occurrence of all of these numbers is:

  r  =    0        1        2        3
  ----------------------------------------
  Np =  0.25*N   0.25*N   0.25*N   0.25*N

Is this all clear?  If I generate 100 random numbers, the expected
number of 3's is 0.25*100 = 25.  Now apply mod 3.  The outcomes are now:

  r   =    0        1        2        3
  r%3 =    0        1        2        0
  ----------------------------------------
  Np  =  0.25*N   0.25*N   0.25*N   0.25*N

You now sum the number of outcomes for each element in the mod 3 table,
since we have two values of r that make one value of r%3 and frequency
clearly aggregates as the outcomes are independent.

  r%3 =    0        1        2
  -------------------------------
  Np  =  0.50*N   0.25*N   0.25*N

It is therefore twice as likely that two random bits, modulus 3, will
produce a zero as a one or a two.

>
> Apologies for the caps. I hope you see how important this is. You're
> claiming all programmers use random numbers in a faulty manner?

They don't.  Only you do.  Everybody else takes a uniform deviate and
scales it by the number of desired integer outcomes, checking to make
sure that they don't go out of bounds and thereby e.g. get an incorrect
endpoint frequency.  The gsl code is open source and it takes two
minutes to download it and check (I just timed it).  Go on, look.  The
file is rng/rng.c in the gsl distro directory, the function name is
gsl_rng_uniform_int.  No modulus.

The exception is (obviously) when the range is a power of 2.  In that
case ONLY, r%n where r is a binary uint and n is a power of 2 will
(obviously) equally balance the table above.  Personally I'd use >> and
shift the bits because it is faster than mod, but suit yourself, after
you've learned what you are doing.

>
> This is important enough to discuss further.
>
> As nearly always you need random numbers from within a given domain say
> 0..n-1
> So projecting a RNG onto that domain is pretty crucial.
> How would you want to do that in a correct manner?
>
> In the slot test in fact a simple AND is enough.

No, as I've just proven algebraically.  The correct manner for general n
is the gsl code, but in rough terms it is n*r/r_max (with care used to
avoid roundoff errors at the ends as noted).  If you've been using
modulus, all your results are crap.

Look, the reason God invented the GSL and made it open source is so
numb-nuts and smart people alike wouldn't have to constantly reinvent
the wheel, badly.  Use it.  Don't question it -- you obviously aren't
competent to.  Just use it.  If you want a random integer from 0 to n-1,
use gsl_rng_uniform_int.  If you want this for e.g. mt19937 don't write
the latter, set up the gsl to use it to generate your ints.  Learn to
use it carefully, use it correctly, but use it.

> It's not interesting to discuss - but yes, this strategy makes money in
> casinos; you just get thrown out of the casino and end up on the
> blacklist if you do.

You are clearly too stupid to be allowed out of the house without a
caretaker.  I'm not going to walk you through the proof that this isn't
so as it is openly published and I've already referenced a step by step
analysis that you can't be bothered, apparently, to actually read.

I'll just reiterate the previous offer -- I, too, am happy to buy a
roulette wheel and you can come over and bet Martingale against me all
day.  Just one 0, no limits and no quitting, infinite credit on both
sides, we play until it is obvious to you that you are losing, have
lost, will always lose, and the longer you play the more that you will
lose.  Loser buys the winner a case of truly excellent beer.

Look, why don't you fix your random number code and try again, since
your simulations are obviously trash.  It isn't difficult to show this
with simulations, once you actually code them correctly, but I have to
go and don't have time to do it for you.

    rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept.
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Aug 26 12:53:14 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 18:53:14 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> Message-ID: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl> On Aug 26, 2011, at 2:57 PM, Robert G. Brown wrote: > On Fri, 26 Aug 2011, Vincent Diepeveen wrote: > >> EVERY PROGRAMMER IS DOING THIS TO USE RANDOM NUMBERS IN THEIR >> PROGRAM. > > Bullshit. "Every programmer" isn't dumb as a post. Or wasn't my > argument clear enough? Do you need me to actually post the code > for how > the GSL -- written by at least some of these programmers -- do this? > Here, I'll try again. This time I'll use smaller numbers and make an > actual table of the outcomes: Imagine only two lousy random bits, > enough to make 00, 01, 10, 11 (or 0,1,2,3). Here is the probability > table: > > r = 0 1 2 3 > ------------------------ > p = 0.25 0.25 0.25 0.25 > > Let us generate N samples from this distribution. Our expected > frequency of the occurrence of all of these numbers is: > > r = 0 1 2 3 > --------------------------------- > Np = 0.25*N 0.25*N 0.25*N 0.25*N > > Is this all clear? If I generate 100 random numbers, the expected > number of 3's is 0.25*100 = 25. 
Now apply mod 3 the outcomes are > now: > > r = 0 1 2 3 > r%3 = 0 1 2 0 > --------------------------------- > Np = 0.25*N 0.25*N 0.25*N 0.25*N > > You now sum the number of outcomes for each element in the mod 3 > table, > since we have two values of r that make one value of r%3 and frequency > clearly aggregates as the outcomes are independent. > > r%3 = 0 1 2 > --------------------------------- > Np = 0.50*N 0.25*N 0.25*N 0.25*N > > It is therefore twice as likely that two random bits, modulus 3, will > produce a zero. > If you have a domain of 0..3 where a generator generates and your modulo n is just n-1, obviously that means it'll map a tad more to 0. Basically the deviation one would be able to measure in such case is that if we have a generator that runs over a field of say size m and we want to map that onto n entries then we have the next formula : m = x * n + y; Now your theory is basically if i summarize it that in such case the entries 0..y-1 will have a tad higher hit than y.. m-1. However if x is large enough that shouldn't be a big problem. If we map now in the test i'm doing onto say a few million to a billion entries, the size of that x is a number of 40+ bits for most RNG's. So that means that the deviation of the effect you show above the order of magnitued of 1 / 2^40 in such case, which is rather small. Especially because the 'test' if you want to call it like that, is operating in the granularity O ( log n ), we can fully ignore then the expected deviation granularity O ( 2 ^ 40 ). >> >> Apologies for the caps. I hope how important this is. You're >> claiming all programmers >> use random numbers in a faulty manner? > > They don't. Only you do. Everybody else takes a uniform deviate and > scales it by the number of desired integer outcomes, checking to make > sure that they don't go out of bounds and thereby e.g. get an > incorrect > endpoint frequency. 
The gsl code is open source and it takes two > minutes to download it and check (I just timed it). Go on, look. the > file is rng/rng.c in the gsl distro directory, the function name is > gsl_rng_uniform_int. No modulus. > > The exception is (obviously) when the range is a power of 2. In that > case ONLY, r%n where r is a binary uint and n is a power of 2 will > (obviously) equally balance the table above. Personally I'd use >> > and > shift the bits because it is faster than mod, but suit yourself, after > you've learned what you are doing. > >> >> This is important enough to further discuss about it. >> >> As nearly always you need random numbers from within a given >> domain say 0.. n-1 >> So projecting a RNG onto that domain is pretty crucial. How would >> you want to do that in a correct manner? >> >> In the slot test in fact a simple AND is enough. > > No, as I've just proven algebraically. The correct manner for > general n > is the gsl code, but in rough terms it is n*r/r_max (with care used to > avoid roundoff errors at the ends as noted). If you've been using > modulus, all your results are crap. > > Look, the reason God invented the GSL and made it open source is so > numb-nuts and smart people alike wouldn't have to constantly reinvent > the wheel, badly. Use it. Don't question it -- you obviously aren't > competent to. Just use it. If you want a random integer from 0 to n, > use gsl_rn_uniform_int. If you want this for e.g. mt19937 don't write > the latter, set up the gsl to use it to generate your ints. Learn to > use it carefully, use it correctly, but use it. > >> It's not interesting to discuss - but yes this strategy makes >> money in casino's, >> you just get thrown out of the casino and end up at the blacklist >> if you do. > > You are clearly too stupid to be allowed out of the house without a > caretaker. 
I'm not going to walk you through the proof that this > isn't > so as it is openly published and I've already referenced a step my > step > analysis that you can't be bothered, apparently, to actually read. > I'll > just reiterate the previous offer -- I, too, am happy to buy a > roulette > wheel and you can come over and bet Martingale against me all day. > Just > one 0, no limits and no quitting, infinite credit on both sides, we > play > until it is obvious to you that you are losing, have lost, will always > lose, and the longer you play the more that you will lose. Loser buys > the winner a case of truly excellent beer. > > Look, why don't you fix your random number code and try again, since > your simulations are obviously trash. It isn't difficult to show this > with simulations, once you actually code them correctly, but I have to > go and don't have time to do it for you. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From diep at xs4all.nl Fri Aug 26 14:17:46 2011 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 26 Aug 2011 20:17:46 +0200 Subject: [Beowulf] OT: public random numbers? In-Reply-To: <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com> References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> <40507B73-2C40-448C-BC9A-8CDF799FDFB9@gmail.com> Message-ID: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl> On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote: > I hate to troll, but... 
> > On Aug 25, 2011, at 8:27 PM, Vincent Diepeveen wrote: > >> >> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote: >> >>> On Thu, 25 Aug 2011, Vincent Diepeveen wrote: >>> >>>> I noticed that most generated semi-random numbers with software >>>> generators, >>>> had the habit to truely adress a search space of n always in O (n >>>> log n). >>>> >>>> So if you draw from most software RNG's a number and do it >>>> modulo n, >>>> with n being not too tiny, say quite some millions or even >>>> billions, then every >>>> slot in your 'hashtable' will get hit at least once by the RNG, >>>> whereas data >>>> in reality simply happens to not have that habit simply. >>>> >>>> So true random numbers versus generated noise is in this manner >>>> easy >>>> to distinguish by this. Now i didn't study literature whether some >>>> other chap >>>> some long time ago already had invented this. That would be most >>>> interesting >>>> to know. >>> >>> Some other chap named George Marsaglia (and to some extent another >>> chap >>> named Donald Knuth) have already invented this. A number of >>> tests of >>> the tails of random number generators are already in dieharder. All >>> "good" modern rngs pass these tests. >>> >>> The Martingale betting system you are looking at is even older (at >>> least >>> Marsaglia and Knuth are still alive). It dates back to the 18th >>> century, and is well known to be flawed for a variety of reasons, >>> not >>> the least of which is that gamblers don't have the infinite wealth >>> necessary to make this >>even<< a zero-sum strategy and casinos have >> >> From mathematical viewpoint it makes perfect cash. >> As statistica odds is you already have build up considerable profit >> when a worst case (that you hit the 10 times practical double limit) >> hits you. > > A betting system will not improve the negative mathematical > expectation of a casino game. the doubling system doesn't have a negative expectation. 
You are allowed to double 10 times practical if you start with 1. Of all systems in roulette this is the only system that will produce a profit, just theoretical spoken, practice we all agree. they kick you out. > If your mathematical expectation is -1 for each trial, it's -10 > for ten trials. You will not win in the long-run using Martingale. > Except that this system doesn't have a negative expectation. it has a positive expectation. There is no other system in roulette that has a positive expectation, other than the doubling system. Please use European Casino model. I don't live in the USA. >> >> The simulations are of course using the practical limit. >> >> Note that the European casino's have a single zero. >> In USA there is even more greedy mafia controlling all the casino's, >> there are 2 zero's there. 0 and 00. >> >> The simulations were for European casino's. >> >>> betting limits that de facto make it impossible to pursue the >>> requisite >>> number of steps and in roulette in particular have 0 and/or 00 >>> slots and >>> aren't zero-sum to begin with. You can read a decent analysis of >>> outcomes based on the presumed binomial distribution of a zero-sum >>> game >>> here: >>> >>> http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 >>> >> >> You're not allowed to use a system in a casino, so we speak about >> theory. Probably first evening they let you try. Second day you'll >> get on the blacklist. > > Nonsense. Have you ever been to a casino? > You are welcome to Martingale all day long at any of them. > Hell, I'll buy a roulette wheel and you can come over to my place > if you play this strategy or any if its variants. > The casino wants you to Martingale -- it's favorable to them. > Why would they stop a loser? The doubling system in all casino's if you'd apply to it in an objective manner and would be allowed to - it makes a profit. Same for some slot machines over there. 
After some others played on it and it swallowed money - then majority of slot machines are not negative sum games anymore. If you play on them then, it's a positive sum game. If it would be always negative sum games then no lady would keep playing slot machines. > > The casino is not concerned with betting strategies. It is > concerned with folks gaining an edge. A betting system alone will > not give the player an edge. > No very wrong, a casino is interested in maximizing its profit. Kicking out folks that do well is part of that game. Oh by the way - I worked for a casino. Did you? >> >>> Your test below is interesting, though. The only real problems I >>> can >>> see with actually using it in dieharder are: >>> >> >> Yeah more interesting than the billion times discussed roulette >> system which >> has been analyzed completely flat. >> >>> a) One would need a theoretical estimate of the distribution of >>> filling given n log n draws on an n-slotted table (for largish n). >>> That >>> is, for a perfect rng, what SHOULD the distribution of success/ >>> failure >>> be. >> >> As we figured out by now in Artificial Intelligence the statistical >> assumptions made in the past they simply do not hold. >> >> For Artificial Intelligence we need a new sort of theoretical theory. >> >> As for the distribution problem, generatiors having a spread that's >> too accurate, >> the way to deliver a proof would be for example build a simple >> device. >> >> Build an old fashioned box where you can draw balls. Remember what >> you coud >> see on TV some 20 years ago or so (not sure it was like that in USA). >> >> A big basked with balls. The basket, in fact it's looking like this: >> >> http://www.rateyours.com/blog/uploaded_images/ >> lottery_machine-727064.jpg >> >> But now a much bigger machine like this with inside different means >> of randomizing the balls, >> actually also randomly modifying the inside obstacles of shaking of >> the balls. 
>> >> After a ball has been drawn you automatically have it annotated and >> the ball immediately goes back >> into the machine. For a full minute you have the balls in the machine >> shaken again and you draw >> again a ball. It is important to do this randomizing of the balls >> inside the machine for quite some time. >> I would propose a minute. >> >> Of course you have to do this with quite some balls. Say a thousand. >> >> Then you draw balls until all numbers have been drawn at least once. >> >> This cool experiment can be easily build. Of course the expected >> running time of a single experiment >> will be a few weeks. >> >> You can produce a number of those drawing machines though and have a >> look. >> >> Theories that seemingly work for small n, n being the number of >> balls, >> are much harder to maintain at bigger n's, as we also see in prime >> number research. >> >> The way how the machine gets designed of course is total crucial. I >> would propose a design that >> really shakes the balls really a lot through each other and really >> very thoroughly. >> >> Just like we nowadays know how flawed a big number of card shaking >> machines are that are popular to use. >> >> Such a lottery with realy a lot of balls would be very interesting to >> see the outcomes from. >> >> In fact i would prefer having produced number of those machines, so >> that it's possible to really have a lot of outcomes >> and then analyze them very well. >> >>> >>> b) One would then need the CDF for this distribution, to be able to >>> turn the results of N trials (of n log n pulls each) into a p-value >>> under the null hypothesis -- the probability of obtaining the >>> particular >>> number of successes and failures presuming a perfectly random >>> generator. >>> >>> That way dieharder could apply it rigorously to its 70 or 80 >>> embedded >>> rngs or to any user's outboard generator. 
There probably is >>> theoretical >>> statistical support for the PD and/or CDF -- you're analyzing the >>> tails >>> of a poissonian process -- but finding it or doing it yourself (or >>> myself), aye, that's the rub. One cannot just say "high degree of >>> certainty that it is an RNG" (by which one means that the rng in >>> question fails the test for randomness) in the test. HOW high? >>> Perfect >>> rngs or perfectly random processes will sometimes fill your >>> table, but >>> how often? >> >> If we assume that reality of life represents randomness, which is >> another >> rather good question in how far that theory is plausible, then using >> that >> assumption i'm very sure that the RNG's i investigated so far >> have a distribution which is too perfect, more perfect than i have >> seen >> in any reality. >> >> In fact most RNG's fill all slots faster than O ( n log n ), yet it's >> O ( n log n ) >> that they follow. >> >> This is RNG's that have come through all tests as being a good and >> very acceptabe RNG to be used. >> >> Realize i'm no RNG expert, so all the names of all those tests. >> >> For me it's just push button technology. I just designed a test >> and found it very odd that all RNG's have such perfect distributions >> that they don't even miss a single slot. >> >> I'd argue the only test that would be interesting to me to see how it >> might be in reality is the lottery machine test - yet with really >> a lot >> of balls. I'd prefer 10k balls over a 1000 in fact - yet for >> practical >> reasons i would agree with a number of above a 1000. >> >> Paper fiddling is really not interesting to me there to prove >> anything, >> as what i've seen in reality in randomness is total different from >> how >> RNG's model that. >> >> Regards, >> Vincent >> >> >>> How can you differentiate an "accident" when one does from >>> an actual failure? 
All of those questions require a more rigorous >>> theory and quantitative result embedded in a test that can be >>> systematically cranked up to more clearly resolve failures until >>> they >>> are unambiguous, not marginal maybe yes maybe no. >>> >>> I suspect that the failures this test would reveal are already more >>> than >>> covered in dieharder, in particular by the bit distribution tests >>> and >>> the monkey tests, but I'm not terribly happy with the monkey >>> tests and >>> would be perfectly thrilled to have a simpler to compute test that >>> revealed precisely this sort of flaw, systematically. And it >>> doesn't >>> hurt at all to have partially or fully redundant tests as long as >>> the >>> test themselves are rigorously valid. If you can find or compute >>> the >>> CDF for your test below, I'd be happy to wrap it up and add it to >>> dieharder, in other words. One can always SIMULATE a CDF, of >>> course, >>> but that requires a known good generator and sort of begs the >>> question >>> if you don't think that e.g. AES or threefish or KISS are good >>> generators that would actually pass your test. >>> >>> Even hardware/quantum sources of random bits are suspect -- they >>> often >>> are generated by a process that leaves in the traces of an >>> underlying >>> distribution. I'm not convinced that >>any<< process in the real >>> world >>> is >>truly<< random. Physics is ambiguous on the issue -- the >>> quantum >>> description of a closed system is just as deterministic as the >>> classical >>> one, and Master equation unpredictability on open subsets of a large >>> closed system reflects entropy/ignorance, not actual randomness >>> (hence >>> Einstein's famous "doesn't play dice" remark). But lots of this are >>> sufficiently random that one cannot detect any failure of >>> randomness, >>> modern crypto class generators being a prime example. 
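The distribution asked for here has a standard form if the generator is treated as an ideal uniform source: filling every one of n slots is the classical coupon collector problem. What follows is only a sketch in Python under that assumption (independent, exactly uniform draws). The function names are illustrative and are not part of dieharder or the GSL. The exact value comes from inclusion-exclusion over the slots that stay empty, and for large n it is well approximated by exp(-n * exp(-k/n)) after k draws.

```python
from fractions import Fraction
from math import comb, exp, log2


def p_all_filled_exact(n, k):
    """Exact P(all n slots are hit at least once in k uniform draws)."""
    # Inclusion-exclusion over the j slots that are forced to stay empty.
    # Fractions avoid the cancellation an alternating float sum would suffer.
    total = sum((-1) ** j * comb(n, j) * Fraction(n - j, n) ** k
                for j in range(n + 1))
    return float(total)


def p_all_filled_approx(n, k):
    """Large-n approximation of the same probability."""
    return exp(-n * exp(-k / n))


def draw_budget(n):
    """Draw budget used by the quoted pseudocode: 2 * log2(n) * n."""
    return int(2 * log2(n) * n)


# Small table: compare the exact value with the approximation.
n = 64
k = draw_budget(n)
print(n, k, p_all_filled_exact(n, k), p_all_filled_approx(n, k))

# Table of 2^30 slots, approximation only (the exact sum is impractical here).
n = 2 ** 30
print(n, draw_budget(n), p_all_filled_approx(n, draw_budget(n)))
```

Under these assumptions an ideal source fills the table within the 2 * log2(n) * n budget with probability very close to 1, so a full table is the expected outcome for a good generator. A p-value for N repeated trials would then follow from a binomial with this success probability.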
>>> >>> rgb >>> >>>> >>>> In semi pseudo code, let's take an array of size a billion as an >>>> example, >>>> though usually a few million is more than ok: >>>> >>>> n = 2^30; // 2 to the power 30 >>>> >>>> Function TestNumbersForRandomness(RNG,n) { >>>> declare array hashtable[size n]; >>>> >>>> guessednlogn = 2 * (log n / log 2) * n; >>>> >>>> for( i = 0 ; i < n ; i++ ) >>>> hashtable[i] = FALSE; >>>> >>>> ndraws = filledn = 0; >>>> while( ndraws < guessednlogn ) { >>>> randomnumber = RNG(); >>>> r = randomnumber % n; // randomnumber = r (mod n) >>>> if( hashtable[r] == FALSE ) { >>>> hashtable[r] = TRUE; >>>> filledn++; >>>> if( filledn >= n ) >>>> break; >>>> >>>> } >>>> ndraws++; >>>> } >>>> >>>> if( filledn >= n ) >>>> print "With high degree of certainty data generated by a RNG\n"); >>>> else >>>> print "Not so sure it's a RNG\n"; >>>> >>>> } >>>> >>>> >>>> >>>> >>>> >>>> Regards, >>>> Vincent >>>> >>>> >>>> >>>> >>>>> -- both unpredictable and >>>>> flat/decorrelated at all orders, and even though there aren't >>>>> really >>>>> enough of them for my purposes, I've used them as one of the >>>>> (small) >>>>> "gold standard" sources for testing dieharder even as I test >>>>> them. For >>>>> all practical purposes threefish or aes are truly random as >>>>> well and >>>>> they are a lot faster and easier to use as gold standard >>>>> generators, >>>>> though. >>>>> I don't quite understand why the single site restriction is >>>>> important -- >>>>> this site has been up for years and I don't expect it to go away >>>>> soon; >>>>> it is quite reliable. I don't think there is anything secret >>>>> about how >>>>> the numbers are generated, and I'll certify that the numbers it >>>>> produces >>>>> don't make dieharder unhappy. 
So 1 is fixable with a bit of >>>>> effort on >>>>> your part; 6 I don't really understand but the guy who runs the >>>>> site is >>>>> clearly willing to construct a custom feed for cash customers, if >>>>> there >>>>> is enough value in whatever it is you are trying to do to pay for >>>>> access. If it's just a lottery, well, lord, I can think of a >>>>> dozen ways >>>>> to make numbers so random that they'd be unimpeachable for any >>>>> sort of >>>>> lottery, both unpredictable and uncorrelated, and they don't any >>>>> of them >>>>> require any significant amount of entropy to get started. >>>>> I will add one warning -- "randomness" is a rather stringent >>>>> mathematical criterion, and is generally tested against the null >>>>> hypothesis. Amateurs who want to make random number generators >>>>> out of >>>>> supposedly "random" data streams or fancy algorithms almost >>>>> invariably >>>>> fail, sometimes spectacularly so. There are a half dozen or more >>>>> really, really good pseudorandom number generators out there and >>>>> it is >>>>> easy to hotwire them together into an xor-based high entropy >>>>> stream that >>>>> basically never repeats (feeding it a bit of real entropy now and >>>>> then >>>>> as it operates). I would strongly counsel you against trying to >>>>> take >>>>> e.g. weather data and make something "random" out of it. >>>>> Unless you >>>>> really know what you are doing, you will probably make something >>>>> that >>>>> isn't at all random and may not even be unpredictable. Even most >>>>> sources of "quantum" randomness (which is at least possibly "truly >>>>> random", although I doubt it) aren't flat, so that they carry the >>>>> signature of their generation process unless/until you manage to >>>>> transform them into something flat (difficult unless you KNOW the >>>>> distribution they are producing). 
Pseudorandom number generators >>>>> have >>>>> the serious advantage of being amenable to at least some >>>>> theoretical >>>>> analysis (so you can "guarantee" flatness out to some high >>>>> dimensionality, say) as well as empirical testing with e.g. >>>>> dieharder. >>>>> HTH, >>>>> >>>>> rgb >>>>>> Thanks, >>>>>> David Mathog >>>>>> mathog at caltech.edu >>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>>>> Robert G. Brown http:// >>>>> www.phy.duke.edu/~rgb/ >>>>> Duke University Dept. of Physics, Box 90305 >>>>> Durham, N.C. 27708-0305 >>>>> Phone: 1-919-660-2567 Fax: 919-660-2525 >>>>> email:rgb at phy.duke.edu >>>>> _______________________________________________ >>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >>>>> Computing >>>>> To change your subscription (digest mode or unsubscribe) visit >>>>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> Robert G. Brown http://www.phy.duke.edu/ >>> ~rgb/ >>> Duke University Dept. of Physics, Box 90305 >>> Durham, N.C. 27708-0305 >>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu >>> >>> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Fri Aug 26 17:46:30 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 26 Aug 2011 17:46:30 -0400 (EDT) Subject: [Beowulf] OT: public random numbers? 
In-Reply-To: <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
References: <3CF65D31-DF66-4D00-A044-8DE44C664A1E@xs4all.nl> <9C1DDC96-CF7C-42A9-A2B3-0D007556C2B6@xs4all.nl> <698E9FD6-1F9C-4F42-9EC2-B409DCDD274B@xs4all.nl>
Message-ID:

On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If you have a domain of 0..3 where a generator generates and your modulo n is
> just n-1, obviously that means it'll map a tad more to 0.
>
> Basically the deviation one would be able to measure in such case is that if
> we have a generator that runs over a field of say size m and we want to map
> that onto n entries then we have the next formula :
>
> m = x * n + y;
>
> Now your theory is basically if i summarize it that in such case the entries
> 0..y-1 will have a tad higher hit than y.. m-1.

What's a "tad" when you're measuring the quality of an RNG? I'm just
curious. Could you be more specific? Just what are the limits,
specifically, if your random number is a 30 bit uint that makes numbers
in the range of 0-1 billion (nearly all generators make uints, with a
few exceptions that usually make fewer than 32 bits -- 64 bit generators
are still a rarity although of course two 32 bit rands make one 64 bit
rand) and you mod with m, especially a nice large m that doesn't integer
divide r like 1.5 million?

That means that each integer in the range 0 to 1.5 million gets 666
repetitions as the entire range of r gets sampled, except the first
million that get 667. That means that the odds of pulling a number from
1 to a million are 1e6*667/1.e9 = .667. The odds of pulling a number
from the remaining half million are 0.5e6*666/1.e9 = .333 = 1 - .667.
An old lady with terrible eyes could detect such an imbalance in
probability from across the street -- you wouldn't even need a "random
number generator tester".

Weren't you advocating using this for nice large m like a million? I
think that you were. No, wait! You were advocating something like one
BILLION, right?
Wrong direction to make it better, dude, this makes it worse.

Note that this scales pretty well. For m in the range of thousands, the
imbalance will be something like 0.666667 and 0.333330 -- still pretty
easy to detect with any halfway decent RNG tester. Basically, you don't
get (immeasurably) close to a uniform distribution in the weighting of
any integer until you get down to (unsurprisingly) m of order unity
compared to a uint, at which point it basically becomes as accurate as
m*(r/r_max) was in the first place.

Note also that you've created an imbalance in the weighting of the
integers you are sampling that is far, far greater and more serious than
any other failure of randomness that your RNG might have. So much so
that you couldn't possibly see any other -- it would be a signal swamped
in the noise of your error, even for m merely in the thousands -- one
part per million errors in randomness are easy to detect in simulations
that draw 10^9 or so random numbers (which is still a SMALL TEST
simulation -- real simulations draw 10^16 or 10^18) and your error would
put answers on another PLANET compared to where they should be.

Most coders probably can actually work this out with a pencil, and so I
repeat: no, nobody competent uses a modulus to generate integers in a
fixed range in circumstances where the quality of the result matters,
e.g. numerical simulation or cryptography as opposed to gaming, unless
the modulus is a power of two.

> However if x is large enough that shouldn't be a big problem.
>
> If we map now in the test i'm doing onto say a few million to a billion
> entries,
> the size of that x is a number of 40+ bits for most RNG's.

x=32, a uint for most RNGs.
Or to put it another way, RNGs generate a bit stream, which they might
do with an algorithm that generates 30, 31, 32, or more bits at a time,
but the prevalence of 32 bit architectures and the fact that it is
trivial to concatenate 32s to get 64+ bits when desired has slowed the
development of true 64 bit RNGs. Eventually there will be some, of
course, and it will STILL be a mistake to use a modulus to create random
integers in some general range. A bad algorithm is a bad algorithm, and
this makes sense only if speed is more important than randomness (in
which case one has to wonder, why use a 64 bit RNG in the first place,
why use a good RNG in the first place).

> So that means that the deviation of the effect you show above the order of
> magnitued of 1 / 2^40 in such case, which is rather small.

Except that it isn't, as I showed in a fair bit of detail. It might be
if x were as large as you claim, which it isn't (in general or even
"commonly") and if one confined m to be order unity. For m of order
2^20 (a million) the error for 2^40 is order 2^-20 (a millionth) which
shows up even in single precision floating point.

Why bother testing such a stream for randomness? It fails. You've made
it fail. It fails spectacularly even if the generator is perfect, even
if the goddess Ifni herself produces the string of digits. It cannot
succeed.

> Especially because the 'test' if you want to call it like that, is operating
> in the
> granularity O ( log n ), we can fully ignore then the expected deviation
> granularity O ( 2 ^ 40 ).

Well, except that basically 100% of the rngs in the GSL pass your "test"
when it is written correctly. They also produce precisely the
correct/expected result (within easily understandable and expected
random scatter) on top of that if they are "good" rngs.
So the "test" isn't much of an actual test, and your assertion that "all
rngs fail it" is false and based on a methodology that introduces error
many orders of magnitude greater than the generators are known to have
as upper bounds. Given this fact, which I have personally verified, do
you imagine that there might be other errors in your actual (not your
pseudo) code? You gotta wonder.

If you've tested a Mersenne Twister with your "test" and it fails to
pass, either an MT is crap and all of the theoretical papers and
experienced testers who have tested with sophisticated and peer-reviewed
tools are stupid poo-poo heads, or, well, could it be that your test or
implementation of the MT is crap and the MT itself in general is what
everyone else seems to think that it is based on extensive "paper
fiddling" and enormous piles of empirical testing evidence written by
actual statisticians and rng experts? Which is to say, a damn good
pseudo-RNG decorrelated in some 600+ dimensions that passes nearly all
known tests with flying colors.

Hmmm, let's put on our Bayesian thinking caps, consider the priors, and
try very hard to guess which one is much much more likely on Jaynes'
"decibel" scale of probabilities. Would you say that it is 20 decibels
more likely that the MT is good and the test is broken? 50? 200? I
like 2000 or thereabouts myself, or as we in the business might say, "it
is a fact" that your test is broken since 10^200 is a really big number,
comparatively speaking.

Now, it would be nice if you apologized to "all RNGs" and "all
programmers" and the various other groups you indicted in your little
fallacious rant, but I'll consider myself enormously fortunate at this
point if you simply acknowledge that maybe, just maybe, your original
pronouncement -- that all rngs produce an egregiously, trivially
verifiable excessive degree of first order sequential uniformity -- is
categorically and undeniably false.
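For readers who want to try the fill test themselves, here is a minimal sketch in Python. It is not the dieharder or GSL code. The helper `uniform_below` is a hypothetical name that follows the scale-and-redraw idea described in this thread (divide the raw value by a scale factor and throw away results that land out of range) instead of a modulus, and `fill_test` mirrors the structure and draw budget of the quoted pseudocode. The 32-bit raw draw width is an assumption.

```python
import math
import random


def uniform_below(rng, n, bits=32):
    """Integer in [0, n) with equal weight per value, from `bits`-bit draws."""
    span = 1 << bits
    scale = span // n                 # raw values assigned to each output
    while True:
        k = rng.getrandbits(bits) // scale
        if k < n:                     # leftover raw values are redrawn
            return k


def fill_test(rng, n):
    """Return (filled_everything, draws_used) for an n-slot table."""
    budget = int(2 * math.log2(n) * n)   # same budget as the quoted pseudocode
    seen = bytearray(n)
    filled = 0
    for draws in range(1, budget + 1):
        k = uniform_below(rng, n)
        if not seen[k]:
            seen[k] = 1
            filled += 1
            if filled == n:
                return True, draws
    return False, budget


# Python's random.Random is a Mersenne Twister (MT19937).
rng = random.Random(2011)
print(fill_test(rng, 1 << 12))
```

The table size here is kept small so the sketch runs quickly; the structure is the same for millions of slots. Filling the table says only that the generator is not grossly non-uniform, which is why it is a weak test on its own.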
Of course, if you think I'm lying just to make you look bad, I can post a modified version of dieharder with your test embedded so absolutely anybody can see for themselves that all of the embedded generators pass your test and that not one single thing you asserted in grandiosely producing it was correct. The code is quite short and anybody can understand it. Or you can take my moderately expert word for it -- the results I posted are honest results produced using real RNGs from a real numerical library in the real test written by block copying your pseudocode, converting/realizing it in C, and fixing your obvious error in the generation of random ints in the range 0-m by using a tested algorithm written by people who actually know what they are doing that is IN the aforementioned real numerical library.

Seriously, it is done. Finished. You're wrong. Say "I'm sorry, Mr. Mersenne Twister, if my test passes randu then how could it possibly fail you?" And don't forget to apologize to AES, RSA, DES, and all of the other encryption schema too. They all feel real bad that you called them stupid poo-poo heads unable to pass the simplest first order frequency test one can imagine, since they all had to pass MUCH more rigorous and often government mandated testing to ever get adopted as the basis for encryption.

I don't expect an apology to me for being indicted along with ALL the OTHER programmers in the world for being stupid enough to use mod to make a supposedly uniformly distributed range of m rands. Not even Numerical Recipes was that boneheaded. But it's OK, we all know that we didn't really ever do that, and if you did (and continue to do, apparently, learning nothing from my patient and thorough exposition of how it produces errors that are vastly greater than the ones that you think you are detecting) that's a problem to who? That's right, mister. To you. You'll just keep getting wroooooong answers, and then announcing them as fact and making yourself look silly.
Or even sillier, if that is possible. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From atp at piskorski.com Sat Aug 27 10:26:23 2011 From: atp at piskorski.com (Andrew Piskorski) Date: Sat, 27 Aug 2011 10:26:23 -0400 Subject: [Beowulf] OT: public random numbers? In-Reply-To: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl> References: <5E562FA3-8DC1-4BB9-AE5F-599D378AFB59@xs4all.nl> Message-ID: <20110827142623.GA29931@piskorski.com> On Fri, Aug 26, 2011 at 08:17:46PM +0200, Vincent Diepeveen wrote: > On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote: >> A betting system will not improve the negative mathematical >> expectation of a casino game. Right. > Except that this system doesn't have a negative expectation. it has a > positive expectation. > > There is no other system in roulette that has a positive expectation, > other than the doubling system. Vincent, are you shitting us? Or am I misremembering the tortured history of this thread, and by "doubling system" you do NOT mean the trivial martingale betting system that's been used (disastrously) and analyzed for over 200 years? Actually it doesn't matter; as Shawn Hood pointed out above, your assertion is still wrong even if you actually meant some other non-martingale betting system. You insisting that *martingale* betting gives you a positive expectation at roulette just makes it much funnier! There are ways to gain positive expectation in roulette (other than the obvious fraud and collusion). 
They involve finding a poorly installed roulette table and using a wearable computer and physics to predict where the ball will land. Look up Thorp and Shannon's research on the subject; they actually used it in casinos c. 1961.

None of those ways are due to some special method of betting. The point of betting systems is to optimize your small edge, but you have to HAVE that edge in the first place. Money management is important because it tells you how to properly size your risk, but it can't give you alpha.

Now yes, if you have a very volatile "roulette" game and a 0% edge (no advantage to either you or the house), with some luck you could get rich by playing it for a limited period of time and quitting while you're ahead. But you still have a 0% expectation game; look up the mathematical definition of "expectation".

Also, I don't remember for sure, but I believe martingale betting is (always) more aggressive than Kelly. If so, then it is inherently stupid. Kelly defines the MAXIMUM size bet that it is rational to make, assuming your goal is maximum compounded wealth AND you have a quantifiable edge (however small) in the game. It can make sense to bet less than Kelly, and if you believe you have no edge the rational bet is zero. It is never rational to bet more than Kelly.

In practice, even when you are sure you have a real edge, you want to bet less than Kelly, often much less. There are several reasons for that; one is that calculating Kelly depends on your estimate of how big your edge is, and it is easy to overestimate your edge such that in truth you are massively overbetting (taking way too much risk) at 2x Kelly or even more.

But optimizing the way you bet doesn't turn an inherently losing game into a winner. If the edge is with the house - as it certainly is with a fair roulette table - the rational bet is not to make one.
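A rough sketch of the arithmetic behind the points above, in Python with exact fractions. The numbers are assumptions for illustration only: an even-money bet on a double-zero wheel (18 winning pockets out of 38), and a doubling sequence that starts at 1 unit and gives up after a fixed number of rounds. The function names are invented for this example.

```python
from fractions import Fraction


def expectation(p, b=1):
    """Expected profit per unit staked: win b with probability p, else lose 1."""
    return p * b - (1 - p)


def kelly_fraction(p, b=1):
    """Kelly stake as a fraction of bankroll; zero when there is no edge."""
    return max((p * b - (1 - p)) / b, 0)


def martingale_expectation(p, max_rounds):
    """Expected profit of doubling after each loss on an even-money bet,
    stopping at the first win or after max_rounds straight losses."""
    q = 1 - p
    total = 0
    lost_so_far = 0
    reach = 1  # probability of getting as far as the current round
    for i in range(max_rounds):
        stake = 2 ** i
        total += reach * p * (stake - lost_so_far)  # a win here nets 1 unit
        lost_so_far += stake
        reach *= q
    total -= reach * lost_so_far  # every round lost
    return total


p_even_money = Fraction(18, 38)  # assumed double-zero wheel
```

Under these assumptions `expectation(p_even_money)` is -1/19 per unit staked, `kelly_fraction(p_even_money)` is 0, and `martingale_expectation(p_even_money, k)` equals 1 - (40/38)^k, which is negative for every k >= 1 and gets worse as the sequence is allowed to run longer. Changing how the stakes are sized changes the shape of the distribution of outcomes, not the sign of the expectation.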
This news article is probably more interesting:

http://www.theonion.com/articles/casino-has-great-night,1506/
Casino Has Great Night; May 28, 2003

--
Andrew Piskorski

From james.p.lux at jpl.nasa.gov Sat Aug 27 11:27:37 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Sat, 27 Aug 2011 08:27:37 -0700
Subject: [Beowulf] OT: public random numbers?
In-Reply-To: <20110827142623.GA29931@piskorski.com>
Message-ID:

>There are ways to gain positive expectation in roulette (other than
>the obvious fraud and collusion). They involve finding a poorly
>installed roulette table and using a wearable computer and physics to
>predict where the ball will land. Look up Thorp and Shannon's
>research on the subject; they actually used it in casinos c. 1961.

I think Shannon and Thorp just analyzed it, without actually using it.

See "The Eudaemonic Pie" about some physics guys at UC Santa Cruz who built wearable hardware. Early 70s, I should think, based on my recollections of the kind of ICs they were using. (I also note, based on the book, that while they were good at the physics, they weren't very good at electronics design and construction)

They never made the system work very well (concept sound, execution not so hot).. but it did encourage the gaming industry to get new laws prohibiting the use of assistive devices.
Just you and the casino, mano a mano (or, more accurately cerebro a leyes de la probabilidad)

From mathog at caltech.edu Wed Aug 31 13:29:18 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 10:29:18 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID:

Anybody know of a nice cheap, high melting point, easy to work with sheet material, for making a custom air shroud?

We have one box with stuff in it that looks similar to HDPE, the material the white flexible cutting boards are made of, but it is a bit thinner and more rigid than that. Unfortunately there are no markings on it, so HDPE is just a guess. Whatever it is, it cut easily with scissors (I had to trim it slightly at one point.)

Background. We have an older Supermicro SC-823 server with dual processors. The air shroud it came with only covers the first processor. That didn't matter much when it had two low power processors in it, but after upgrading it to dual Opteron 280s, the uncovered second one runs considerably hotter than the covered front one. (Swapping the processors around didn't help - the heat stayed where it was, so a ventilation issue, not a processor issue.) Supermicro does make a newer shroud which extends to the back of the case, but the manual (google for "SC-823 air shroud user's guide") indicates that it is designed for Intel CPUs. So it may or may not fit around the Opterons.

The redesigned air shroud will probably work, but I'm about 90% confident that taping a sheet of plastic onto the back of the existing shroud would work as well - if I can find a plastic that won't flap around or melt.
Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From deadline at eadline.org Wed Aug 31 14:20:03 2011
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 31 Aug 2011 14:20:03 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID: <44280.192.168.93.213.1314814803.squirrel@mail.eadline.org>

David,

I have experimented with some simple ducting for my Limulus system. I found a Vinyl Flashing from Union Corrugating Company (purchased at Lowes home center) that has some nice features, it is bendable, holds its shape, easy to cut, and has a low carbon content (harder to burn than most plastics), and it is fairly stiff. My needs are "low temp" air ducting. I have not tested it with constant warm/hot air.

--
Doug

From james.p.lux at jpl.nasa.gov Wed Aug 31 14:43:39 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 11:43:39 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID:

Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..) It's no more flammable than plastic, and it doesn't melt and get soft. Papier Mache, works too.

On the other hand, if you want to mold a smooth curve, then plastic is the way to go.
Vacuforming can make a very nice thing, and the form is made out of wood (usually), but you don't need to go to that extreme.. you get some nice thermoplastic, put it in hot water to get it soft, and mold as needed. (yes, you could use those old LPs you've got stashed away.. )

Thin, cuttable plastic could be polyethylene (not necessarily High density) or similar. Polystyrene and acrylic tend to be more brittle. ABS is very nice to work with. PVC is also easy to work with. Nylon is another possibility.

Do you want to be able to glue it?

What I would do is call up profesionalplastics.com formerly Cadillac Plastics (many outlets nationwide) and see what they have. It might be more useful to find a retail outlet and go look through their scrap bin.. Before Gem-O-Lite in Woodland Hills went out of business, that's where I used to go. Plastic Depot in Burbank has a huge selection.

Drive over there, and ask the counter folks what would work for you. $10-20 will get you more plastic than you know what to do with.

Art supply places (e.g. Blick on Raymond.. any of the countless Michaels or Aaron Bros) also carry sheet plastic, but I find the plastic places tend to have more variety, and more practical information about use for "engineering" applications.

Jim Lux
+1(818)354-2075

From mathog at caltech.edu Wed Aug 31 15:15:22 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 12:15:22 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID:

> Cardboard? Card stock? Masking tape? White glue? (that's what I
> usually use for cooling ducts.. easy to cut, glue, tape..) It's no
> more flammable than plastic, and it doesn't melt and get soft.

That never crossed my mind.

You sure about the flammability? I believe it for the ignition due to temperature (Fahrenheit 451 and all that). However, I have a gut feeling (but no data) that sparks are fairly likely to ignite cardboard, and less likely to ignite a solid plastic sheet (polyethylene or polypropylene, for instance). Not that I'm expecting sparks, but that is a real possibility when a power supply fails. Maybe even a brief flame. Of course paper won't hold up well compared to plastic if it gets wet. Moisture resistance is not important here though - if the insides of the computer are dripping, air shroud failure is the least of my worries.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From james.p.lux at jpl.nasa.gov Wed Aug 31 15:18:36 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 12:18:36 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID:

Paper doesn't catch fire at 451F.. it does start to turn brown. (Sorry Ray..) (I cook bacon on a rack over paper in a 450 degree oven.. and I doubt the temperature control is that tight)

Flammability is an issue.. paper is rougher than most plastics, so a spark can lodge or a small fiber could catch. You could fireproof the paper pretty easily with a variety of treatments.
Jim Lux
+1(818)354-2075

From bill at cse.ucdavis.edu Wed Aug 31 17:04:44 2011
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 31 Aug 2011 14:04:44 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID: <4E5EA1EC.7080804@cse.ucdavis.edu>

On 08/31/2011 12:15 PM, David Mathog wrote:
> That never crossed my mind.
>
> You sure about the flammability? I believe it for the ignition due to
> temperature (Fahrenheit 451 and all that). However, I have a gut
> feeling (but no data) that sparks are fairly likely to ignite cardboard,
> and less likely to ignite a solid plastic sheet (polyethylene or
> polypropylene, for instance). Not that I'm expecting sparks, but that
> is a real possibility when a power supply fails. Maybe even a brief
> flame. Of course paper won't hold up well compared to plastic if it
> gets wet. Moisture resistance is not important here though - if the
> insides of the computer are dripping, air shroud failure is the least of
> my worries.

I'm aware of a machine room fire that was attributed to cardboard dust and the storage of flammable material (paper and cardboard).

I wouldn't recommend cardboard or anything else that might generate flammable dust in a high 50-90C airflow environment with low humidity.

Supermicro does seem to play pretty fast and loose with a shroud and cooling in general. We had nodes bouncing off the thermal max (and throttling) despite air intake temperatures 30F below the specifications while having very low power load in the node (read that as no expansion cards, one low rpm disk, and the lowest clocked CPU). We did however get them to ship us free shrouds once we complained.

Is it really worth wasting even an hour to not get the real shroud? Not sure if this is the one, but they aren't particularly expensive ($13):
http://www.provantage.com/supermicro-mcp-310-18003-0n~7SUP91KW.htm

From rgb at phy.duke.edu Wed Aug 31 17:05:34 2011
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 31 Aug 2011 17:05:34 -0400 (EDT)
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID:

On Wed, 31 Aug 2011, Lux, Jim (337C) wrote:

Also thin aluminum. You can get aluminum sheeting that you can cut with scissors and that is easy to bend into shapes if you have a bending jig (or can make one with two pieces of board stock and a vise). Cheap, fireproof, meltproof at any temperatures you're likely to reach, no toxic fumes in a fire, can be glued or screwed. The one drawback is that it is a PITA to weld or solder if that's important to you, but for an air shroud you can probably make compression joints (interlocking U rims, squeezed down) that are adequate.

Most hardware stores (roof flashing), some auto parts or hobby stores.

Copper too, but more expensive. Don't know about thin "enough" sheet steel, but probably -- copper or steel would both weld or solder easily.

rgb

> Cardboard? Card stock? Masking tape? White glue? (that's what I usually use for cooling ducts.. easy to cut, glue, tape..) It's no more flammable than plastic, and it doesn't melt and get soft. Papier Mache, works too.

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From mathog at caltech.edu Wed Aug 31 17:24:48 2011
From: mathog at caltech.edu (David Mathog)
Date: Wed, 31 Aug 2011 14:24:48 -0700
Subject: [Beowulf] materials for air shroud?
Message-ID:

Robert G. Brown wrote
> Also thin aluminum.

No way, at least not anywhere near the motherboard.
There isn't going to be a way to fasten it very tightly into position, just tape probably, possibly a zip tie at the back end. So it would be best if the shroud cannot short things out or scratch components off the motherboard if it falls out of position. I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for this, and it is similar to the shroud material we have in another server. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From Glen.Beane at jax.org Wed Aug 31 17:42:23 2011 From: Glen.Beane at jax.org (Glen Beane) Date: Wed, 31 Aug 2011 21:42:23 +0000 Subject: [Beowulf] materials for air shroud? In-Reply-To: <4E5EA1EC.7080804@cse.ucdavis.edu> References: , <4E5EA1EC.7080804@cse.ucdavis.edu> Message-ID: On Aug 31, 2011, at 5:05 PM, "Bill Broadley" wrote: > On 08/31/2011 12:15 PM, David Mathog wrote: >> That never crossed my mind. >> >> You sure about the flammability? I believe it for the ignition due to >> temperature (Fahrenheit 451 and all that). However, I have a gut >> feeling (but no data) that sparks are fairly likely to ignite cardboard, >> and less likely to ignite a solid plastic sheet (polyethylene or >> polypropylene, for instance). Not that I'm expecting sparks, but that >> is a real possibility when a power supply fails. Maybe even a brief >> flame. Of course paper won't hold up well compared to plastic if it >> gets wet. Moisture resistance is not important here though - if the >> insides of the computer are dripping, air shroud failure is the least of >> my worries. 
> > I'm aware of a machine room fire that was attributed to cardboard dust > and the storage of flammable material (paper and cardboard). > I've seen servers shipped with paperboard shrouds directing air over the processors... I won't mention the vendor by name _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From rgb at phy.duke.edu Wed Aug 31 17:44:45 2011 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 31 Aug 2011 17:44:45 -0400 (EDT) Subject: [Beowulf] materials for air shroud? In-Reply-To: References: Message-ID: On Wed, 31 Aug 2011, David Mathog wrote: > Robert G. Brown wrote > >> Also thin aluminum. > > No way, at least not anywhere near the motherboard. There isn't going > to be a way to fasten it very tightly into position, just tape probably, > possibly a zip tie at the back end. So it would be best if the shroud > cannot short things out or scratch components off the motherboard if it > falls out of position. Don't forget the virtue of coat hangers. Even rubber coated ones. If you made the shroud out of aluminum, you could basically paint the bottom with liquid electrical tape (or better, dip it four or five times, drying it in between). It would basically rubber-coat it. No shorting, no scratching, still moderately fireproof. But as you wish. > I'm thinking perhaps 1/16" polypropylene, that may be stiff enough for > this, and it is similar to the shroud material we have in another server. The biggest problem with stuff like this (IIRC a discussion from long ago) is you have to worry about what and how toxic it is in a fire, at least if you want fire-persons to be able to enter the room in a fire. Many plastics burn into really toxic materials. 
You also have to worry about how it will cope with high heat.

The good thing about aluminum is that by the time it melts you won't care. I think some of the liquid tape compounds are fire retardant/melt resistant, and the aluminum itself is such a good conductor of heat that it will act as a heat sink for the rubber coating (in a good way).

rgb

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From james.p.lux at jpl.nasa.gov Wed Aug 31 17:56:08 2011
From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C))
Date: Wed, 31 Aug 2011 14:56:08 -0700
Subject: [Beowulf] materials for air shroud?
In-Reply-To:
References:
Message-ID:

Plastic tape covering the aluminum. 20 mil "pipe wrap" is useful stuff. 3M VHB double stick foam tape to hold it in place.

But, enough of this feeble lash-up idea: I think the real solution is to have a second cluster doing a complete finite element model of the instantaneous temperature distribution within the processor in question, driving a set of actuators to form a dynamically optimized shroud. Or, perhaps the shroud could be made from millimachines implementing very simple control logic, but with an appropriate emergent behavior based on, say, their temperature sensing capability. The millimachines should, of course, be self replicating. Perhaps a suitably genetically engineered extremophile could be created?
A second cluster does the model, a third cluster determines the optimum genetic sequence, a fourth cluster is responsible for iteratively doing the bioengineering to create the organisms, etc. (or for a less biologically inspired system, the third and fourth clusters are doing some form of adaptive evolving micro manufacturing)

I'd provide more details, but really, that's just engineering, and is obvious to a skilled practitioner.

(for those not at CalTech (who is my employer, as well as David's), you can contact their patent counsel for rights to the invention disclosed above, which I'm sure they'll be happy to license to you on reasonable and non-discriminatory terms.)

> -----Original Message-----
> From: Robert G. Brown [mailto:rgb at phy.duke.edu]
> If you made the shroud out of aluminum, you could basically paint the
> bottom with liquid electrical tape (or better, dip it four or five
> times, drying it in between). It would basically rubber-coat it. No
> shorting, no scratching, still moderately fireproof. But as you wish.