
Generating random integers uniformly in log space

I want to generate random integers that are uniformly distributed in log space. That is, the log of the values will be uniformly distributed.

A normal uniformly distributed unsigned int will have about 75% of its values above 1 billion, and something like 99.98% above 1 million, so small values are underrepresented. A value drawn uniformly in log space would be as likely to land in the range 4-8 as in the range 256-512, for example.

Ignoring negative values for now, one way I can think of is something like:

Random r = new Random();
return (int)Math.pow(2, r.nextDouble() * 31);

That should generate a 31-bit log-uniformly distributed value. It's not going to be fast, though, with a pow() operation in there, and introducing floating-point values to generate integers is a bit of a smell. Furthermore, a lot of the range of double is lost by Random.nextDouble(), and it is not clear to me whether this code can even generate all 2^31-1 positive integer values.
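For anyone who wants to try the idea, here is a minimal self-contained sketch of the snippet above, plus a histogram by bit length to check that each power-of-two range gets about the same share. The class name LogUniformPow, the method names, and the seed are made up for illustration only.

```java
import java.util.Random;

// Hypothetical wrapper around the pow-based snippet; names are illustrative only.
final class LogUniformPow {
    // Draws one value in [1, 2^31) whose base-2 log is roughly uniform on [0, 31).
    static int next(Random r) {
        return (int) Math.pow(2, r.nextDouble() * 31);
    }

    // Counts how many of n draws fall in each range [2^k, 2^(k+1)), for k = 0..30.
    static int[] histogram(Random r, int n) {
        int[] counts = new int[31];
        for (int i = 0; i < n; i++) {
            int v = next(r);
            // For v >= 1, 31 - numberOfLeadingZeros(v) is floor(log2(v)).
            counts[31 - Integer.numberOfLeadingZeros(v)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(new Random(42), 1_000_000);
        for (int k = 0; k < counts.length; k++) {
            System.out.println(k + ":" + counts[k]);
        }
    }
}
```

If the distribution is log-uniform, all 31 counts should come out close to n/31. This says nothing about whether every individual integer is reachable.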

Better solutions welcome.


There are two similar solutions below which both involve filling the integer with random bits, then shifting a random number of bits to the right. Something like:

int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);

This has two types of bias:

Step-wise bias

This produces a stepwise log-distributed value, not a smooth one. In particular, the right shift by a random amount in [0,31] means there are 32 equally probable shift amounts, each giving a nominal "size" of integer, and within each size range every value is equally probable. Since range N holds 2^N values, each value in one range is twice as probable as each value in the next larger range, so you get log behavior between the ranges, but the ranges themselves are flat.

I don't know of an easy way to get rid of this bias.
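To make the flat steps visible, here is a small sketch that counts outcomes exactly instead of sampling. It rests on assumptions of mine: an 8-bit toy model rather than 31 bits, x ranging over all of [0, 2^8), and the shift over [0, 8]. The class name ShiftStepDemo is made up.

```java
// Hypothetical toy model of "random bits >> random shift"; not the full 31-bit generator.
final class ShiftStepDemo {
    // counts[v] = number of (x, s) pairs with (x >> s) == v,
    // for x in [0, 2^bits) and s in [0, bits]. Every pair is equally likely.
    static long[] exactCounts(int bits) {
        long[] counts = new long[1 << bits];
        for (int x = 0; x < (1 << bits); x++) {
            for (int s = 0; s <= bits; s++) {
                counts[x >> s]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] c = exactCounts(8);
        for (int v = 1; v <= 16; v++) {
            System.out.println(v + " -> " + c[v]);
        }
    }
}
```

In this 8-bit model the values 4 through 7 each get a count of 63 and the values 8 through 15 each get 31: flat inside a range, with roughly a factor of two between neighboring ranges. A smooth log distribution would instead make 4 noticeably more likely than 7.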

Top bit bias

A second form of bias occurs because the MSB is not always 1 (e.g., even a shift amount of 10 doesn't necessarily produce a 31-10=21 bit value), so there is an additional distortion. In effect, the ranges overlap. The value 1 is produced not just by a shift amount of 30 (with p(1)=0.5), but also by shifts of 29 (p(1)=0.25), 28 (p(1)=0.125), and so on. That effect cancels out for smaller values (i.e., if you look at shift amounts of 30 and 29 only, 1 seems 3x more likely than 2, rather than the predicted 2x, but once you include more shift amounts the ratio converges). It doesn't cancel out for large values, however, which is why the 20:32207 bucket is smaller than the others in @sprinter's answer.
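The shrinking top ranges can be seen exactly in a toy model. This is only a sketch under my own assumptions: 8-bit values instead of 31-bit, x over all of [0, 2^8), shift over [0, 8]. The class name TopBitBiasDemo is made up.

```java
// Hypothetical toy model; groups the results of (x >> s) by their bit length.
final class TopBitBiasDemo {
    // mass[b] = number of (x, s) pairs whose result (x >> s) has bit length b,
    // for x in [0, 2^bits) and s in [0, bits]. mass[0] counts results equal to zero.
    static long[] massByBitLength(int bits) {
        long[] mass = new long[bits + 1];
        for (int x = 0; x < (1 << bits); x++) {
            for (int s = 0; s <= bits; s++) {
                int v = x >> s;
                // 32 - numberOfLeadingZeros(v) is the bit length of v (0 for v == 0).
                mass[32 - Integer.numberOfLeadingZeros(v)]++;
            }
        }
        return mass;
    }

    public static void main(String[] args) {
        long[] mass = massByBitLength(8);
        for (int b = 0; b < mass.length; b++) {
            System.out.println("bit length " + b + " -> " + mass[b]);
        }
    }
}
```

With 8 bits the masses for bit lengths 1 through 8 are 255, 254, 252, 248, 240, 224, 192 and 128. The small ranges are nearly equal, while the largest range gets only about half the weight, which is the same shape as the drop in the last buckets of the histogram.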

I think this form of bias can be removed pretty easily by forcing the top bit to one, so something like:

(r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31)

This has a couple of other tweaks. It uses a bound of 2^30 for the random value, which is faster (nextInt(int) has a special case for powers of 2) and loses nothing, since the second-from-MSB bit never needs to be random anyway (we force it to 1). This also eliminates a microscopic additional source of bias, which is that Integer.MAX_VALUE could never be generated, so one value was missing from full representation.

It shifts by [0,31) bits, so you never get zero. If you want zeros too, change that to shift by [0,32) bits and you'll get zeros as often as ones (technically not log-distributed anymore, but useful in many cases). Another approach is to subtract one from the final value to get zeros (at the cost of never getting Integer.MAX_VALUE).
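Put together, the expression above and the zero-including variant might look like the following sketch. The class and method names are made up, and the histogram helper is only there to check that each bit length comes up about equally often.

```java
import java.util.Random;

// Hypothetical wrapper around the expression above; names are illustrative only.
final class LogStepRandom {
    // Value in [1, 2^31): bit 30 is forced to 1, then the value is shifted right by 0..30 bits.
    static int nextPositive(Random r) {
        return (r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31);
    }

    // Variant shifting by 0..31 bits, so 0 appears as often as 1.
    static int nextNonNegative(Random r) {
        return (r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(32);
    }

    // counts[b - 1] = number of draws from nextPositive with bit length b, for b = 1..31.
    static int[] histogram(Random r, int n) {
        int[] counts = new int[31];
        for (int i = 0; i < n; i++) {
            int v = nextPositive(r);
            counts[31 - Integer.numberOfLeadingZeros(v)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(new Random(42), 1_000_000);
        for (int k = 0; k < counts.length; k++) {
            System.out.println((k + 1) + " bits: " + counts[k]);
        }
    }
}
```

Because the top bit is always set before shifting, the shift amount alone decides the bit length, so all 31 counts should be close to n/31. The step-wise bias inside each range is still there.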

Incorrect answer provided for information only. This does not satisfy OP's requirements for the reasons given in the question.

int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);

My informal test of that seems to indicate there is the expected skew. I generated 1M numbers this way and had the following distribution of the log (ignoring zeros):

0:46819
1:47045
2:40663
3:44001
4:45306
5:43802
6:46447
7:43355
8:47366
9:42747
10:46387
11:43899
12:45179
13:45496
14:44431
15:46751
16:43055
17:47127
18:41243
19:41837
20:32207
21:11965
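A test along these lines could be sketched as follows. The bucketing rule is an assumption: the figures above do not say which log was used, and the floor of the natural log is a guess based on the top bucket being 21 (ln(2^31) is about 21.5). The class name ShiftLogHistogram and the seed are made up.

```java
import java.util.Random;

// Hypothetical reconstruction of the informal test; the bucketing rule is assumed.
final class ShiftLogHistogram {
    // Draws n numbers with the shift expression and counts them by floor(ln(number)),
    // skipping zeros. Index 21 is the largest bucket an int below 2^31 can reach.
    static int[] logHistogram(Random rand, int n) {
        int[] buckets = new int[22];
        for (int i = 0; i < n; i++) {
            int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
            if (number > 0) {
                buckets[(int) Math.log(number)]++;
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] buckets = logHistogram(new Random(1), 1_000_000);
        for (int k = 0; k < buckets.length; k++) {
            System.out.println(k + ":" + buckets[k]);
        }
    }
}
```

The exact counts depend on the seed, so they will not match the figures above digit for digit, but the last two buckets should come out clearly smaller than the rest.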
