Generating random integers within a range to meet a percentile in Java

I am trying to generate random integers within a range so that they sample a given percentage of that range. For example, for the range 1 to 100 I would like to select a random sample of 20%, which would give 20 integers randomly selected from the 100.

This is part of solving a much more complex issue, and I will post my solution once I have this and a few bugs worked out. I have not used many math packages in Java, so I appreciate your assistance.

Thanks!

Put all the numbers in the range into an ArrayList, then shuffle it. Take only the first 20 elements of the list:

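// Requires java.util.ArrayList, java.util.Collections and java.util.List
ArrayList<Integer> randomNumbers = new ArrayList<Integer>();

// Add each value from 1 to 100 exactly once, so the sample has no duplicates
for(int i = 1; i <= 100; i++){
    randomNumbers.add(i);
}

Collections.shuffle(randomNumbers);

// The first 20 elements are the sample
List<Integer> sample = randomNumbers.subList(0, 20);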
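
The shuffle-and-take idea can be wrapped in a reusable method. The following is only a sketch: the class name `ShuffleSample`, the method `sample` and its parameters are hypothetical names chosen for illustration, and it assumes the whole range fits comfortably in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleSample {
    // Hypothetical helper: returns round(fraction * n) distinct integers
    // drawn from the inclusive range [min, max], where n = max - min + 1.
    static List<Integer> sample(int min, int max, double fraction, Random rnd) {
        List<Integer> pool = new ArrayList<>();
        for (int i = min; i <= max; i++) {
            pool.add(i);                       // every value appears exactly once
        }
        Collections.shuffle(pool, rnd);        // uniform random permutation
        int count = (int) Math.round(pool.size() * fraction);
        return new ArrayList<>(pool.subList(0, count)); // copy the first `count` entries
    }

    public static void main(String[] args) {
        // 20% of the range 1..100, i.e. 20 distinct values
        System.out.println(sample(1, 100, 0.20, new Random()));
    }
}
```

Because every value is added once before shuffling, the result cannot contain duplicates, at the cost of holding the whole range in memory.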

If you want 20 random integers between 1 and 100, use Math.random() to generate a value between 0 and 0.999..., then scale and shift that value to fit your range. Note that each value is drawn independently, so the same integer can appear more than once.

int[] random = new int[20];
for (int i = 0; i < random.length; i++) {
    random[i] = (int) (Math.random() * 100 + 1);
}

When you multiply Math.random() by 100, you get a value between 0 and 99.999... Adding 1 yields a value between 1.0 and 100.999... The (int) cast then truncates the fractional part, which gives an integer between 1 and 100 inclusive. Each value is then stored in the array.
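
Independent draws like these can repeat. If the 20 values must be distinct, one option is to redraw whenever a value has already been seen. The sketch below is an illustration under assumptions, not part of the answer above: the class `DistinctDraws` and method `draw` are hypothetical names, and it uses `Random.nextInt(bound)` in place of `Math.random()` for the same scale-and-shift mapping.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

class DistinctDraws {
    // Hypothetical helper: draws `count` distinct integers from [min, max].
    static Set<Integer> draw(int min, int max, int count, Random rnd) {
        int size = max - min + 1;              // assumes the range size fits in an int
        if (count > size) {
            throw new IllegalArgumentException("count exceeds the size of the range");
        }
        Set<Integer> seen = new LinkedHashSet<>(); // keeps draw order, rejects repeats
        while (seen.size() < count) {
            // nextInt(size) is uniform on [0, size - 1]; adding min shifts it to [min, max]
            seen.add(rnd.nextInt(size) + min);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(draw(1, 100, 20, new Random()));
    }
}
```

This stays fast while the sample is a small fraction of the range. As `count` approaches the range size, more and more draws are rejected, and shuffling becomes the better choice.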

If you are willing to go with Java 8, you could use some of its stream features. Presuming that you aren't keeping 20% of petabytes of data, you could do something like this (number is how many integers to take from the range). It isn't efficient in the slightest, but it works, and it is fun if you'd like to try some Java 8. If this is performance critical, I wouldn't recommend it:

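// Requires java.util.ArrayList, java.util.Iterator, java.util.Random
// and java.util.stream.IntStream
public ArrayList<Integer> sampler(int min, int max, int number){
    Random random = new Random();
    ArrayList<Integer> generated = new ArrayList<Integer>();
    // The upper bound of ints() is exclusive, so use max + 1 to include max
    IntStream ints = random.ints(min, max + 1);
    Iterator<Integer> it = ints.iterator();
    for(int i = 0; i < number; i++){
       int k = it.next();
       // Redraw until the value is new; never ends if number > max - min + 1
       while(generated.contains(k)){
           k = it.next();
       }
       generated.add(k);
    }
    ints.close();
    return generated;
}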
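
The same idea can be expressed with stream operations alone. This is a sketch rather than the answer's own method: `StreamSample` and `sample` are hypothetical names, while `Random.ints(origin, bound)`, `distinct()`, `limit()`, `boxed()` and `Collectors.toList()` are standard Java 8 APIs.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class StreamSample {
    // Hypothetical helper: `number` distinct integers from the inclusive range [min, max].
    static List<Integer> sample(int min, int max, int number, Random random) {
        if (number > max - min + 1) {
            // Without this check the stream below would never finish
            throw new IllegalArgumentException("number exceeds the size of the range");
        }
        return random.ints(min, max + 1)   // infinite stream; upper bound is exclusive
                .distinct()                // drop values that were already produced
                .limit(number)             // stop after `number` distinct values
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sample(1, 100, 20, new Random()));
    }
}
```

Here `distinct()` tracks the values it has seen in a hash-based structure, which avoids the linear `contains` scan over an `ArrayList`. The sketch assumes `max` is below `Integer.MAX_VALUE`, since `max + 1` would otherwise overflow.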

If you really need to scale to petabytes of data, you're going to need a solution that doesn't require keeping all your numbers in memory. Even a bit set, which packs 8 integers into 1 byte, wouldn't fit in memory.

Since you didn't mention that the numbers had to be shuffled (just random), you can count through the range and randomly decide whether to keep each number. Then stream the result to a file or wherever you need it.

Start with this:

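    long range = 100;
    float percentile = 0.20f;
    Random rnd = new Random();   // java.util.Random
    // Use <= so that the last value in the range (100) is also considered
    for (long i = 1; i <= range; i++) {
        // Keep each number independently with probability `percentile`
        if (rnd.nextFloat() < percentile) {
            System.out.println(i);
        }
    }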

You will get about 20 percent of the numbers from 1 to 100, with no duplicates.

As the range goes up, the relative accuracy goes up too, so you really wouldn't need any special logic for large data sets.
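
To make the accuracy claim concrete, here is a short derivation. It assumes each of the n numbers is kept independently with probability p, so the sample size K follows a binomial distribution (the symbols n, p and K are introduced here for illustration):

```latex
K \sim \mathrm{Binomial}(n, p), \qquad
\mathbb{E}[K] = np, \qquad
\sigma_K = \sqrt{np(1-p)}, \qquad
\frac{\sigma_K}{\mathbb{E}[K]} = \sqrt{\frac{1-p}{np}}
```

For n = 100 and p = 0.2 the expected sample size is 20 with a standard deviation of 4, which is 20% of the mean. For n = 1,000,000 the expected size is 200,000 with a standard deviation of 400, which is only 0.2% of the mean. The relative spread shrinks in proportion to 1/sqrt(n).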

If an exact number is needed, you would need special logic for smaller data sets, but that's pretty easy to solve using the other methods posted here (although I'd still recommend a bit set).
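
One way to get an exact count while still streaming is selection sampling (Knuth's Algorithm S), which is a different technique from the loop above: instead of a fixed probability, each number is kept with probability (values still needed) / (values still remaining). The sketch below assumes this substitution is acceptable. `SelectionSample`, `sample` and the `sink` parameter are hypothetical names for illustration.

```java
import java.util.Random;
import java.util.function.LongConsumer;

class SelectionSample {
    // Hypothetical helper: passes exactly `count` distinct values from [1, range]
    // to `sink`, in increasing order, without storing the range in memory.
    static void sample(long range, long count, Random rnd, LongConsumer sink) {
        if (count < 0 || count > range) {
            throw new IllegalArgumentException("count must be between 0 and range");
        }
        long needed = count;
        for (long i = 1; i <= range && needed > 0; i++) {
            long remaining = range - i + 1;    // candidates left, including i
            // Keep i with probability needed / remaining. When needed == remaining
            // the test always passes, so the quota is always filled.
            if (rnd.nextDouble() * remaining < needed) {
                sink.accept(i);
                needed--;
            }
        }
    }

    public static void main(String[] args) {
        // Prints exactly 20 of the numbers 1..100
        sample(100, 20, new Random(), System.out::println);
    }
}
```

Only two counters are kept in memory, so the output can be streamed to a file for very large ranges. The values come out sorted rather than shuffled, which matches the assumption above that order does not matter.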
