
Kolmogorov-Smirnov two-sample test in Java gives a p-value of 0

I am using the Apache Commons Math Kolmogorov-Smirnov test to determine whether the sample my RNG produces follows a uniform distribution.

I use UniformIntegerDistribution to draw a reference sample of 2000000 integers from a uniform distribution and copy them into a double[].

I also produce 2000000 numbers from my RNG and put them in a double[].

I have plotted the RNG sample and it looks uniform, but the KS test gives me a p-value of 0.0, which rejects the null hypothesis that the two samples are drawn from the same distribution (i.e. uniform). That would mean my RNG sample does not conform to a uniform distribution.

double alpha = test.kolmogorovSmirnovTest(a, b); gives me alpha = 0.0

And the method's Javadoc reads:

Computes the p-value, or observed significance level, of a two-sample Kolmogorov-Smirnov test evaluating the null hypothesis that x and y are samples drawn from the same probability distribution.
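
To make the quantity behind that p-value concrete, here is a minimal sketch of the two-sample statistic D, the largest gap between the two empirical CDFs. It uses only the standard library; the class and method names are hypothetical, and it is not the Commons Math implementation.

```java
import java.util.Arrays;

final class KsStatisticSketch {

    /** Largest absolute gap between the empirical CDFs of x and y (ties handled). */
    static double ksStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);

        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            // Next distinct value present in either sample
            double v = Math.min(a[i], b[j]);
            // Step both ECDFs past every observation equal to v
            while (i < a.length && a[i] <= v) i++;
            while (j < b.length && b[j] <= v) j++;
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            d = Math.max(d, gap);
        }
        // Once one sample is exhausted its ECDF is 1 and the gap can only shrink
        return d;
    }

    public static void main(String[] args) {
        double d = ksStatistic(new double[] {1, 2, 3}, new double[] {2, 3, 4});
        System.out.println("D = " + d);
    }
}
```

For the samples {1, 2, 3} and {2, 3, 4} the gap is 1/3 at every step, so D = 1/3. A small p-value means D is larger than sampling noise would explain for the given sample sizes.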

So I would expect the p-value to be high, given that the plot looks clearly uniform.

    IntegerDistribution uniform = new UniformIntegerDistribution(1, 81);

    ArrayList<Integer> lis = new ArrayList<>();
    int i = 0;
    while (i < 100000) {

        // Creates a list of 20 numbers in [1, 80]
        List<Integer> l = ls.createRandomNumbersInclusive(80, 20);
        lis.addAll(l);
        Assertions.assertFalse(l.stream().anyMatch(it -> it > 80));
        Assertions.assertFalse(l.stream().anyMatch(it -> it < 1));

        i++;
    }

    KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();

    // Reference sample drawn from the Commons Math distribution, widened to double[]
    int[] sample = uniform.sample(2000000);
    double[] a = Arrays.stream(sample).asDoubleStream().toArray();

    // RNG output converted to double[] for the two-sample test
    double[] b = lis.stream().mapToDouble(Integer::doubleValue).toArray();

    // Despite the name, this value is the p-value returned by the test
    var alpha = test.kolmogorovSmirnovTest(a, b);

    System.out.println("Alpha " + alpha); // This gives me 0.0

    /*
     * I do the below to get the count per number in [1, 80] and plot them.
     * The counts look uniform:
     * 1 ===
     * 2 ===
     *  ...
     * 80 ===
     */
    Map<Integer, Long> result = lis.stream().collect(Collectors.groupingBy(it -> it, Collectors.counting()));
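
The counts above cover only the RNG output. A complementary sanity check, sketched here with standard-library calls only, is to compare the observed range of both arrays passed to the test. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

final class RangeCheckSketch {

    /** Returns {min, max} of a non-empty sample. */
    static double[] range(double[] sample) {
        DoubleSummaryStatistics stats = Arrays.stream(sample).summaryStatistics();
        return new double[] {stats.getMin(), stats.getMax()};
    }

    /** True when both samples span exactly the same [min, max] interval. */
    static boolean sameRange(double[] a, double[] b) {
        return Arrays.equals(range(a), range(b));
    }

    public static void main(String[] args) {
        double[] a = {1, 40, 81};
        double[] b = {1, 40, 80};
        System.out.println(Arrays.toString(range(a)) + " vs " + Arrays.toString(range(b)));
        System.out.println("same range: " + sameRange(a, b));
    }
}
```

With samples of this size, two arrays drawn from the same discrete uniform distribution should almost always share the same minimum and maximum, so a mismatch is a cheap hint that the two sides of the test were set up differently.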

What worries me is that if I create a new UniformIntegerDistribution, draw a second sample from it, and use that in test.kolmogorovSmirnovTest(a, b), I do get a p-value close to 1, which is what I expect.

Either I am doing something wrong in Java, or there is something about the numbers produced by the RNG that I am not getting.

The code for createRandomNumbersInclusive is:

public List<Integer> fetchNumberList(final int drawNumberMin, final int drawNumberMax, final int drawNumberCount) {

    final List<Integer> range = new ArrayList<Integer>();
    for (int i = drawNumberMin; i <= drawNumberMax; i++) {
        range.add(i);
    }

    Collections.shuffle(range, rng);

    return new ArrayList<Integer>(range.subList(0, drawNumberCount));
}

And the RNG is rng = SecureRandom.getInstance("NativePRNGNonBlocking");

I found the reason behind the problem. I had initially used UniformRealDistribution, because it works with kolmogorovSmirnovTest(RealDistribution distribution, double[] data), and its upper bound is exclusive.

UniformIntegerDistribution, however, is inclusive on both bounds, so new UniformIntegerDistribution(1, 81) samples from [1, 81] while my RNG only produces numbers in [1, 80].

When I changed IntegerDistribution uniform = new UniformIntegerDistribution(1, 81); to IntegerDistribution uniform = new UniformIntegerDistribution(1, 80); it worked.
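
The size of the effect can be checked with a few lines of arithmetic. The sketch below (standard library only; class and method names are hypothetical) computes the largest CDF gap between a uniform distribution on {1..80} and one on {1..81}, then scales it by sqrt(n1*n2/(n1+n2)) as in the large-sample KS approximation. It assumes that approximation is a reasonable guide here, even though the data are discrete and full of ties.

```java
final class UniformGapSketch {

    /** Largest CDF gap between uniform{1..n} and uniform{1..m}, checked at each integer. */
    static double maxCdfGap(int n, int m) {
        int top = Math.max(n, m);
        double d = 0.0;
        for (int k = 1; k <= top; k++) {
            double fn = Math.min(k, n) / (double) n; // CDF of uniform{1..n} at k
            double fm = Math.min(k, m) / (double) m; // CDF of uniform{1..m} at k
            d = Math.max(d, Math.abs(fn - fm));
        }
        return d;
    }

    /** Scaled statistic sqrt(n1*n2/(n1+n2)) * d from the large-sample approximation. */
    static double scaled(double d, long n1, long n2) {
        return Math.sqrt((double) n1 * n2 / (n1 + n2)) * d;
    }

    public static void main(String[] args) {
        double d = maxCdfGap(80, 81);
        double lambda = scaled(d, 2_000_000L, 2_000_000L);
        // Leading term of the Kolmogorov tail series, 2 * exp(-2 * lambda^2)
        double leadingTerm = 2.0 * Math.exp(-2.0 * lambda * lambda);
        System.out.println("gap = " + d + ", scaled = " + lambda + ", tail ~ " + leadingTerm);
    }
}
```

The gap is 1/81, about 0.0123, and with 2000000 values per sample the scale factor is 1000, so the scaled statistic is about 12.3. The leading tail term is vanishingly small at that point, which is consistent with the reported p-value of 0.0: a difference too small to see in a plot is still overwhelming evidence at this sample size.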
