[Beowulf] OT: public random numbers?
Vincent Diepeveen
diep at xs4all.nl
Fri Aug 26 14:17:46 EDT 2011
On Aug 26, 2011, at 10:43 AM, Shawn Hood wrote:
> I hate to troll, but...
>
> On Aug 25, 2011, at 8:27 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:
>
>>
>> On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:
>>
>>> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>>>
>>>> I noticed that most generated semi-random numbers with software
>>>> generators,
>>>> had the habit to truely adress a search space of n always in O (n
>>>> log n).
>>>>
>>>> So if you draw from most software RNG's a number and do it
>>>> modulo n,
>>>> with n being not too tiny, say quite some millions or even
>>>> billions, then every
>>>> slot in your 'hashtable' will get hit at least once by the RNG,
>>>> whereas data
>>>> in reality simply happens to not have that habit simply.
>>>>
>>>> So true random numbers versus generated noise is in this manner
>>>> easy
>>>> to distinguish by this. Now i didn't study literature whether some
>>>> other chap
>>>> some long time ago already had invented this. That would be most
>>>> interesting
>>>> to know.
>>>
>>> Some other chap named George Marsaglia (and to some extent another
>>> chap
>>> named Donald Knuth) have already invented this. A number of
>>> tests of
>>> the tails of random number generators are already in dieharder. All
>>> "good" modern rngs pass these tests.
>>>
>>> The Martingale betting system you are looking at is even older (at
>>> least
>>> Marsaglia and Knuth are still alive). It dates back to the 18th
>>> century, and is well known to be flawed for a variety of reasons,
>>> not
>>> the least of which is that gamblers don't have the infinite wealth
>>> necessary to make this >>even<< a zero-sum strategy and casinos have
>>
>> From mathematical viewpoint it makes perfect cash.
>> As statistica odds is you already have build up considerable profit
>> when a worst case (that you hit the 10 times practical double limit)
>> hits you.
>
> A betting system will not improve the negative mathematical
> expectation of a casino game.
the doubling system doesn't have a negative expectation.
You are allowed to double 10 times practical if you start with 1.
Of all systems in roulette this is the only system that will produce
a profit, just theoretical spoken,
practice we all agree. they kick you out.
> If your mathematical expectation is -1 for each trial, it's -10
> for ten trials. You will not win in the long-run using Martingale.
>
Except that this system doesn't have a negative expectation. it has a
positive expectation.
There is no other system in roulette that has a positive expectation,
other than the doubling system.
Please use European Casino model. I don't live in the USA.
>>
>> The simulations are of course using the practical limit.
>>
>> Note that the European casino's have a single zero.
>> In USA there is even more greedy mafia controlling all the casino's,
>> there are 2 zero's there. 0 and 00.
>>
>> The simulations were for European casino's.
>>
>>> betting limits that de facto make it impossible to pursue the
>>> requisite
>>> number of steps and in roulette in particular have 0 and/or 00
>>> slots and
>>> aren't zero-sum to begin with. You can read a decent analysis of
>>> outcomes based on the presumed binomial distribution of a zero-sum
>>> game
>>> here:
>>>
>>> http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>>>
>>
>> You're not allowed to use a system in a casino, so we speak about
>> theory. Probably first evening they let you try. Second day you'll
>> get on the blacklist.
>
> Nonsense. Have you ever been to a casino?
> You are welcome to Martingale all day long at any of them.
> Hell, I'll buy a roulette wheel and you can come over to my place
> if you play this strategy or any if its variants.
> The casino wants you to Martingale -- it's favorable to them.
> Why would they stop a loser?
The doubling system in all casino's if you'd apply to it in an
objective manner and would be allowed to - it makes a profit.
Same for some slot machines over there. After some others played on
it and it swallowed money - then majority of slot machines
are not negative sum games anymore. If you play on them then, it's a
positive sum game.
If it would be always negative sum games then no lady would keep
playing slot machines.
>
> The casino is not concerned with betting strategies. It is
> concerned with folks gaining an edge. A betting system alone will
> not give the player an edge.
>
No very wrong, a casino is interested in maximizing its profit.
Kicking out folks that do well is part of that game.
Oh by the way - I worked for a casino. Did you?
>>
>>> Your test below is interesting, though. The only real problems I
>>> can
>>> see with actually using it in dieharder are:
>>>
>>
>> Yeah more interesting than the billion times discussed roulette
>> system which
>> has been analyzed completely flat.
>>
>>> a) One would need a theoretical estimate of the distribution of
>>> filling given n log n draws on an n-slotted table (for largish n).
>>> That
>>> is, for a perfect rng, what SHOULD the distribution of success/
>>> failure
>>> be.
>>
>> As we figured out by now in Artificial Intelligence the statistical
>> assumptions made in the past they simply do not hold.
>>
>> For Artificial Intelligence we need a new sort of theoretical theory.
>>
>> As for the distribution problem, generatiors having a spread that's
>> too accurate,
>> the way to deliver a proof would be for example build a simple
>> device.
>>
>> Build an old fashioned box where you can draw balls. Remember what
>> you coud
>> see on TV some 20 years ago or so (not sure it was like that in USA).
>>
>> A big basked with balls. The basket, in fact it's looking like this:
>>
>> http://www.rateyours.com/blog/uploaded_images/
>> lottery_machine-727064.jpg
>>
>> But now a much bigger machine like this with inside different means
>> of randomizing the balls,
>> actually also randomly modifying the inside obstacles of shaking of
>> the balls.
>>
>> After a ball has been drawn you automatically have it annotated and
>> the ball immediately goes back
>> into the machine. For a full minute you have the balls in the machine
>> shaken again and you draw
>> again a ball. It is important to do this randomizing of the balls
>> inside the machine for quite some time.
>> I would propose a minute.
>>
>> Of course you have to do this with quite some balls. Say a thousand.
>>
>> Then you draw balls until all numbers have been drawn at least once.
>>
>> This cool experiment can be easily build. Of course the expected
>> running time of a single experiment
>> will be a few weeks.
>>
>> You can produce a number of those drawing machines though and have a
>> look.
>>
>> Theories that seemingly work for small n, n being the number of
>> balls,
>> are much harder to maintain at bigger n's, as we also see in prime
>> number research.
>>
>> The way how the machine gets designed of course is total crucial. I
>> would propose a design that
>> really shakes the balls really a lot through each other and really
>> very thoroughly.
>>
>> Just like we nowadays know how flawed a big number of card shaking
>> machines are that are popular to use.
>>
>> Such a lottery with realy a lot of balls would be very interesting to
>> see the outcomes from.
>>
>> In fact i would prefer having produced number of those machines, so
>> that it's possible to really have a lot of outcomes
>> and then analyze them very well.
>>
>>>
>>> b) One would then need the CDF for this distribution, to be able to
>>> turn the results of N trials (of n log n pulls each) into a p-value
>>> under the null hypothesis -- the probability of obtaining the
>>> particular
>>> number of successes and failures presuming a perfectly random
>>> generator.
>>>
>>> That way dieharder could apply it rigorously to its 70 or 80
>>> embedded
>>> rngs or to any user's outboard generator. There probably is
>>> theoretical
>>> statistical support for the PD and/or CDF -- you're analyzing the
>>> tails
>>> of a poissonian process -- but finding it or doing it yourself (or
>>> myself), aye, that's the rub. One cannot just say "high degree of
>>> certainty that it is an RNG" (by which one means that the rng in
>>> question fails the test for randomness) in the test. HOW high?
>>> Perfect
>>> rngs or perfectly random processes will sometimes fill your
>>> table, but
>>> how often?
>>
>> If we assume that reality of life represents randomness, which is
>> another
>> rather good question in how far that theory is plausible, then using
>> that
>> assumption i'm very sure that the RNG's i investigated so far
>> have a distribution which is too perfect, more perfect than i have
>> seen
>> in any reality.
>>
>> In fact most RNG's fill all slots faster than O ( n log n ), yet it's
>> O ( n log n )
>> that they follow.
>>
>> This is RNG's that have come through all tests as being a good and
>> very acceptabe RNG to be used.
>>
>> Realize i'm no RNG expert, so all the names of all those tests.
>>
>> For me it's just push button technology. I just designed a test
>> and found it very odd that all RNG's have such perfect distributions
>> that they don't even miss a single slot.
>>
>> I'd argue the only test that would be interesting to me to see how it
>> might be in reality is the lottery machine test - yet with really
>> a lot
>> of balls. I'd prefer 10k balls over a 1000 in fact - yet for
>> practical
>> reasons i would agree with a number of above a 1000.
>>
>> Paper fiddling is really not interesting to me there to prove
>> anything,
>> as what i've seen in reality in randomness is total different from
>> how
>> RNG's model that.
>>
>> Regards,
>> Vincent
>>
>>
>>> How can you differentiate an "accident" when one does from
>>> an actual failure? All of those questions require a more rigorous
>>> theory and quantitative result embedded in a test that can be
>>> systematically cranked up to more clearly resolve failures until
>>> they
>>> are unambiguous, not marginal maybe yes maybe no.
>>>
>>> I suspect that the failures this test would reveal are already more
>>> than
>>> covered in dieharder, in particular by the bit distribution tests
>>> and
>>> the monkey tests, but I'm not terribly happy with the monkey
>>> tests and
>>> would be perfectly thrilled to have a simpler to compute test that
>>> revealed precisely this sort of flaw, systematically. And it
>>> doesn't
>>> hurt at all to have partially or fully redundant tests as long as
>>> the
>>> test themselves are rigorously valid. If you can find or compute
>>> the
>>> CDF for your test below, I'd be happy to wrap it up and add it to
>>> dieharder, in other words. One can always SIMULATE a CDF, of
>>> course,
>>> but that requires a known good generator and sort of begs the
>>> question
>>> if you don't think that e.g. AES or threefish or KISS are good
>>> generators that would actually pass your test.
>>>
>>> Even hardware/quantum sources of random bits are suspect -- they
>>> often
>>> are generated by a process that leaves in the traces of an
>>> underlying
>>> distribution. I'm not convinced that >>any<< process in the real
>>> world
>>> is >>truly<< random. Physics is ambiguous on the issue -- the
>>> quantum
>>> description of a closed system is just as deterministic as the
>>> classical
>>> one, and Master equation unpredictability on open subsets of a large
>>> closed system reflects entropy/ignorance, not actual randomness
>>> (hence
>>> Einstein's famous "doesn't play dice" remark). But lots of this are
>>> sufficiently random that one cannot detect any failure of
>>> randomness,
>>> modern crypto class generators being a prime example.
>>>
>>> rgb
>>>
>>>>
>>>> In semi pseudo code, let's take an array of size a billion as an
>>>> example,
>>>> though usually a few million is more than ok:
>>>>
>>>> n = 2^30; // 2 to the power 30
>>>>
>>>> Function TestNumbersForRandomness(RNG,n) {
>>>> declare array hashtable[size n];
>>>>
>>>> guessednlogn = 2 * (log n / log 2) * n;
>>>>
>>>> for( i = 0 ; i < n ; i++ )
>>>> hashtable[i] = FALSE;
>>>>
>>>> ndraws = filledn = 0;
>>>> while( ndraws < guessednlogn ) {
>>>> randomnumber = RNG();
>>>> r = randomnumber % n; // randomnumber = r (mod n)
>>>> if( hashtable[r] == FALSE ) {
>>>> hashtable[r] = TRUE;
>>>> filledn++;
>>>> if( filledn >= n )
>>>> break;
>>>>
>>>> }
>>>> ndraws++;
>>>> }
>>>>
>>>> if( filledn >= n )
>>>> print "With high degree of certainty data generated by a RNG\n");
>>>> else
>>>> print "Not so sure it's a RNG\n";
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>
>>>>
>>>>
>>>>> -- both unpredictable and
>>>>> flat/decorrelated at all orders, and even though there aren't
>>>>> really
>>>>> enough of them for my purposes, I've used them as one of the
>>>>> (small)
>>>>> "gold standard" sources for testing dieharder even as I test
>>>>> them. For
>>>>> all practical purposes threefish or aes are truly random as
>>>>> well and
>>>>> they are a lot faster and easier to use as gold standard
>>>>> generators,
>>>>> though.
>>>>> I don't quite understand why the single site restriction is
>>>>> important --
>>>>> this site has been up for years and I don't expect it to go away
>>>>> soon;
>>>>> it is quite reliable. I don't think there is anything secret
>>>>> about how
>>>>> the numbers are generated, and I'll certify that the numbers it
>>>>> produces
>>>>> don't make dieharder unhappy. So 1 is fixable with a bit of
>>>>> effort on
>>>>> your part; 6 I don't really understand but the guy who runs the
>>>>> site is
>>>>> clearly willing to construct a custom feed for cash customers, if
>>>>> there
>>>>> is enough value in whatever it is you are trying to do to pay for
>>>>> access. If it's just a lottery, well, lord, I can think of a
>>>>> dozen ways
>>>>> to make numbers so random that they'd be unimpeachable for any
>>>>> sort of
>>>>> lottery, both unpredictable and uncorrelated, and they don't any
>>>>> of them
>>>>> require any significant amount of entropy to get started.
>>>>> I will add one warning -- "randomness" is a rather stringent
>>>>> mathematical criterion, and is generally tested against the null
>>>>> hypothesis. Amateurs who want to make random number generators
>>>>> out of
>>>>> supposedly "random" data streams or fancy algorithms almost
>>>>> invariably
>>>>> fail, sometimes spectacularly so. There are a half dozen or more
>>>>> really, really good pseudorandom number generators out there and
>>>>> it is
>>>>> easy to hotwire them together into an xor-based high entropy
>>>>> stream that
>>>>> basically never repeats (feeding it a bit of real entropy now and
>>>>> then
>>>>> as it operates). I would strongly counsel you against trying to
>>>>> take
>>>>> e.g. weather data and make something "random" out of it.
>>>>> Unless you
>>>>> really know what you are doing, you will probably make something
>>>>> that
>>>>> isn't at all random and may not even be unpredictable. Even most
>>>>> sources of "quantum" randomness (which is at least possibly "truly
>>>>> random", although I doubt it) aren't flat, so that they carry the
>>>>> signature of their generation process unless/until you manage to
>>>>> transform them into something flat (difficult unless you KNOW the
>>>>> distribution they are producing). Pseudorandom number generators
>>>>> have
>>>>> the serious advantage of being amenable to at least some
>>>>> theoretical
>>>>> analysis (so you can "guarantee" flatness out to some high
>>>>> dimensionality, say) as well as empirical testing with e.g.
>>>>> dieharder.
>>>>> HTH,
>>>>>
>>>>> rgb
>>>>>> Thanks,
>>>>>> David Mathog
>>>>>> mathog at caltech.edu
>>>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>>>> Robert G. Brown http://
>>>>> www.phy.duke.edu/~rgb/
>>>>> Duke University Dept. of Physics, Box 90305
>>>>> Durham, N.C. 27708-0305
>>>>> Phone: 1-919-660-2567 Fax: 919-660-2525
>>>>> email:rgb at phy.duke.edu
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>>> Computing
>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>> Robert G. Brown http://www.phy.duke.edu/
>>> ~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>>>
>>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
More information about the Beowulf
mailing list