I’d always take 6 bits and shuffle a deck of 64 cards. I’d then do an O(n) pass and only keep cards [1..52].

This is short, and there aren't many ways to get it wrong if I start by copy/pasting a correct Fisher-Yates.

That’s an average of 64*6/52 ≈ 7.4 bits per card. Hardly the best possible, but quite fast and very simple.
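A sketch of that approach in Python, using `random.getrandbits` as a stand-in for the bit source. One detail the comment leaves open is how to keep the 6-bit swap indices unbiased; the usual fix, used here, is to re-draw when the index is out of range (so the real average is slightly above 6 bits per draw):

```python
import random

def random_bits(n):
    # Stand-in bit source; replace with your real entropy source.
    return random.getrandbits(n)

def shuffle_52_from_64():
    """Fisher-Yates shuffle of 64 'cards', then keep only values 0..51.
    Each swap index comes from 6 random bits, re-drawn if out of range."""
    deck = list(range(64))
    for i in range(63, 0, -1):
        j = random_bits(6)
        while j > i:                      # rejection keeps j uniform on [0, i]
            j = random_bits(6)
        deck[i], deck[j] = deck[j], deck[i]
    # O(n) pass: keep only the 52 real cards, in shuffled order.
    return [c for c in deck if c < 52]
```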

Let’s set N = ceil(log_2(x)) (the smallest N such that 2^N >= x).

The simplest solution is taking N random bits from the source and concatenating them. We get a random integer Y, uniformly distributed over [0; 2^N[, and the successive values it takes are independent (since consecutive bits are independent).

If Y is less than x, we output it; if not, we compute another Y, rinse and repeat.

However, on average you need (2^N)/x trials (a number between 1 and 2), using N*(2^N)/x bits, when log_2(x) bits would (theoretically) be enough.

For the ‘a..e’ example, we use 3*8/5=4.8 bits, when only ≃2.32 are required, “wasting” about 51% of the random bits we take.
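This rejection method can be sketched in a few lines (the function name `rand_below` is mine; `random.getrandbits` stands in for the bit source):

```python
import random

def rand_below(x):
    """Uniform integer in [0, x): take N = ceil(log2(x)) bits,
    reject and re-draw any value >= x."""
    n_bits = (x - 1).bit_length()     # N = ceil(log2(x)), for x >= 2
    while True:
        y = random.getrandbits(n_bits)
        if y < x:
            return y
```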

However, we can improve on that by producing k numbers in [0; x[ at the same time:

We use the previous method to produce a number Y in [0; x^k[.

You write it down in base x (Y_0 = Y mod x, Y_1 = (Y/x) mod x, …, Y_(k-1) = (Y/x^(k-1)) mod x), producing your k random numbers.

It’s fairly easy to show they are independent and uniformly distributed, since Y is taken uniformly in [0; x^k[.

To produce them, we used M*2^M/(k*x^k) bits per number, with M = ceil(k*log_2(x)).

On the ‘a..e’ example, for k=3, we use a bit less than 2.39 bit per number generated, wasting less than 3% of generated bits, which is a huge improvement.

Intuitively, I chose k so that x^k = 125 is close to (but less than) a power of 2 (here, 128).
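A sketch of the batched method (the function name `rand_batch` is mine; `random.getrandbits` stands in for the bit source):

```python
import random

def rand_batch(x, k):
    """k independent uniform values in [0, x):
    rejection-sample Y uniform in [0, x**k), then write Y in base x."""
    limit = x ** k
    n_bits = (limit - 1).bit_length()     # M = ceil(k * log2(x))
    while True:
        y = random.getrandbits(n_bits)
        if y < limit:
            break
    digits = []
    for _ in range(k):
        digits.append(y % x)              # Y_i = (Y // x**i) mod x
        y //= x
    return digits
```

For x = 5, k = 3 this draws 7 bits per accepted trial, matching the ≈2.39 bits per number quoted above.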

>> Since the matrix is NxN = 25, we would expect
>> each element of the matrix to be approximately
>> 5,000,000 / 25 = 200,000.

Each element of the matrix will contain approximately NumTrials / N elements, not NumTrials / N^2. Why?
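A quick simulation makes the answer concrete, assuming (my reading) that the matrix counts how often card c ends up at position p. Every trial places each of the N cards at exactly one position, so each row sums to NumTrials, and each of its N cells averages NumTrials / N. (Scaled down to 100,000 trials here for speed.)

```python
import random

N = 5
num_trials = 100_000
counts = [[0] * N for _ in range(N)]    # counts[card][position]

for _ in range(num_trials):
    deck = list(range(N))
    random.shuffle(deck)                # unbiased Fisher-Yates
    for pos, card in enumerate(deck):
        counts[card][pos] += 1

# Each row sums to num_trials (every card lands somewhere each trial),
# so each of the N cells in a row averages num_trials / N, not / N^2.
```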

]]>Maintain a “pool” of randomness. At any time, the pool should have a range n and a current value v, 0<=v<n, such that v is with equal probability any of the possible values, and independent of any previous values “output” by the pool. We will always restore this invariant before returning from a method call. Initially, the pool has n=1 and v=0, and the invariant can be seen trivially to hold. We call this state the empty state.

You can increase the range of the pool by drawing on your random source of 0s and 1s. Each time you do so n becomes n * 2 and v becomes v * 2 + b, where b is the random 0 or 1. The range is doubled and the value is still with equal probability any of the possible values.

With this setup, suppose you want a random value q, 0<=q<x. A crude approach is simply to keep increasing the range of the pool until n>=x. At this point, if v<x, take it as your value and empty the pool; otherwise empty the pool and start over. However, if the source of random numbers is costly this is wasteful, as it may discard lots of useful entropy and result in a lot of re-rolls. The key insight is that in both scenarios it may be possible to leave some value in the pool without compromising the invariant that v is always with equal probability any value such that 0<=v<n, and independent of previous output.

First of all, suppose that the pool has a large range n and x is small. Choose some value k such that k*x<=n. Now, so long as v<k*x, we can decompose it (using integer division and remainder) into two independent random values: q = v mod x with 0<=q<x, and v’ = v/x with 0<=v’<k. q is our return value, v’ is the new value for v, and k is the new value for n. This restores our invariant and doesn’t leave the pool empty (unless k=1, of course). We should always choose as large a value of k as possible.

Secondly, consider the case when v>=k*x. Within this branch, we need to try again, but we needn’t completely empty the pool. We now know that v is with equal probability any value >=k*x and <n. We can decrement both n and v by k*x and restore our invariant that v is with equal probability any value >=0 and <n, independent of previous output. In the worst case, k*x=n-1, in which case n becomes 1 and v becomes 0 – the pool is empty. As before, we continue by refilling the pool and trying again.

This pool can be used with the Fisher-Yates algorithm for shuffling whatever you want. I think it avoids introducing any bias from rounding. Principally, I think it does a good job at minimizing the number of calls to the underlying random number source, so long as the pool is always topped up to keep n much larger than x. I believe the cost of refilling the pool on a miss rises more slowly than the probability of misses falls as you increase the minimum size of the pool.
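A sketch of this pool in Python (the class name, method names, and the `min_pool` top-up threshold are my own choices; `random.getrandbits(1)` stands in for the bit source):

```python
import random

class RandomnessPool:
    """Entropy pool: the invariant is that v is uniform on [0, n),
    independent of previous output. Starts empty (n = 1, v = 0)."""

    def __init__(self, draw_bit=lambda: random.getrandbits(1)):
        self.n = 1
        self.v = 0
        self.draw_bit = draw_bit

    def _fill_to(self, minimum):
        # Double the range with one source bit until n >= minimum.
        while self.n < minimum:
            self.n *= 2
            self.v = self.v * 2 + self.draw_bit()

    def next_below(self, x, min_pool=1 << 16):
        """Return q uniform on [0, x), recycling leftover entropy."""
        while True:
            self._fill_to(max(x, min_pool))
            k = self.n // x                  # largest k with k*x <= n
            if self.v < k * x:
                q, self.v = self.v % x, self.v // x
                self.n = k                   # invariant restored, pool kept
                return q
            # Miss: fold the remaining range [k*x, n) back into the pool.
            self.v -= k * x
            self.n -= k * x
```

With a large `min_pool`, the miss probability per draw is below x/n, so almost every call consumes entropy only through the occasional top-up bits.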
