Same random numbers in C++ as computed by Python3 numpy.random.rand

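I would like to replicate in C++ the tests for some code that is already implemented in Python3. That code relies on numpy.random.rand and numpy.random.randn values with a specific seed (for example, seed = 1).

I understand that Python's random implementation is based on the Mersenne Twister. The C++ standard library provides the same algorithm in std::mersenne_twister_engine.

The C++ version returns an unsigned integer, whereas Python's rand returns a floating-point value.

Is there a way to obtain the same values in C++ as are generated in Python, and to be sure that they are the same? And likewise for an array generated by randn?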

For integer values you can do it this way:

import numpy as np

np.random.seed(12345)
print(np.random.randint(256**4, dtype='<u4', size=1)[0])

#include <iostream>
#include <random>

int main()
{
    std::mt19937 e2(12345);
    std::cout << e2() << std::endl;
}

Both snippets print 3992670690.
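
The two snippets agree because both generators are MT19937 and, for a plain 32-bit integer seed, they appear to initialize their state with the same recurrence. As a rough cross-check that needs neither NumPy nor a C++ compiler, here is a minimal pure-Python sketch of the reference MT19937 algorithm. The class name MT19937 and the method name next_u32 are hypothetical, chosen for illustration. That np.random.seed uses exactly this seeding for integer seeds is an assumption, although it is consistent with the matching outputs above.

```python
class MT19937:
    """Minimal reference Mersenne Twister (32-bit variant), for illustration only."""

    N, M = 624, 397

    def __init__(self, seed):
        # Seeding recurrence of the reference implementation (init_genrand).
        self.state = [seed & 0xFFFFFFFF]
        for i in range(1, self.N):
            prev = self.state[-1]
            self.state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        # Regenerate all 624 state words in place.
        s = self.state
        for i in range(self.N):
            y = (s[i] & 0x80000000) | (s[(i + 1) % self.N] & 0x7FFFFFFF)
            s[i] = s[(i + self.M) % self.N] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_u32(self):
        """Return the next tempered 32-bit output."""
        if self.index >= self.N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y


first = MT19937(12345).next_u32()
print(first)
```

If the sketch is faithful, the printed value should equal the 3992670690 reported above.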


By looking at the source code of rand, you can implement it in C++ this way:

import numpy as np

np.random.seed(12345)
print(np.random.rand())

#include <iostream>
#include <iomanip>
#include <random>

int main()
{
    std::mt19937 e2(12345);
    int a = e2() >> 5;
    int b = e2() >> 6;
    double value = (a * 67108864.0 + b) / 9007199254740992.0;
    std::cout << std::fixed << std::setprecision(16) << value << std::endl;
}

Both snippets produce 0.9296160928171479.
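
The constants in the C++ snippet are powers of two: 67108864 is 2^26 and 9007199254740992 is 2^53. The first word contributes its top 27 bits and the second its top 26 bits, which gives a 53-bit integer that is then scaled into [0, 1). Below is a minimal Python sketch of that arithmetic; the helper name res53 is hypothetical. As a sanity check it compares the result against CPython's own random.random(), which, as far as I know, applies the same conversion to its own Mersenne Twister stream. That stream is seeded differently from NumPy's, so only the conversion is being compared here, not the values NumPy would print.

```python
import random


def res53(w1, w2):
    """Combine two 32-bit words into a double in [0, 1) with 53 random bits."""
    a = w1 >> 5   # keep the top 27 bits of the first word
    b = w2 >> 6   # keep the top 26 bits of the second word
    # 2**26 == 67108864 and 2**53 == 9007199254740992
    return (a * 2**26 + b) / 2**53


# Two generators with identical state: one yields raw words, the other doubles.
raw = random.Random(2024)
ref = random.Random(2024)

pairs_match = all(
    res53(raw.getrandbits(32), raw.getrandbits(32)) == ref.random()
    for _ in range(1000)
)
print(pairs_match)
```

Because the numerator is below 2^53, both the multiplication and the division are exact in double precision, which is presumably why this conversion gives identical results across platforms.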


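It would be convenient to use std::generate_canonical, but it converts the Mersenne Twister output to a double by a different method. The reason for the difference may be that generate_canonical is more optimized than the generator used in NumPy: judging by the source code, it avoids expensive floating-point operations, especially multiplication and division. However, its result appears to be implementation dependent, whereas NumPy produces the same result on all platforms.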

double value = std::generate_canonical<double, std::numeric_limits<double>::digits>(e2);

This does not work: it yields 0.8901547132827379, which differs from the output of the Python code.
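
One plausible reason for the mismatch: as I read the C++ standard's description of generate_canonical (before the C++26 revision), it treats the engine outputs as digits in base 2^32, with the first output as the least significant digit, so the second draw dominates the result. The rand-style conversion does the opposite. The sketch below is a simplified model of that description, not the code of any particular standard library, and it ignores the edge case where rounding would reach 1.0. The function names are hypothetical, and the two sample words are what I believe to be the first two outputs of std::mt19937 seeded with 1.

```python
def numpy_style(w1, w2):
    """rand-style conversion: the FIRST word supplies the high-order bits."""
    return ((w1 >> 5) * 2**26 + (w2 >> 6)) / 2**53


def canonical_style(w1, w2):
    """Simplified model of generate_canonical<double, 53> on a 32-bit engine:
    the words are base-2**32 digits and the FIRST word is the low-order digit."""
    return (w1 + w2 * 2**32) / 2**64


w1, w2 = 1791095845, 4282876139  # assumed first two outputs for seed 1

print(numpy_style(w1, w2))       # close to w1 / 2**32
print(canonical_style(w1, w2))   # close to w2 / 2**32
```

Under this model the two conversions disagree in the leading digits for the same engine state, so no amount of rounding tolerance would reconcile them.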

For completeness, and to avoid reinventing the wheel, here is an implementation of both numpy.random.rand and numpy.random.randn in C++.

The header file:

#ifndef RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#define RANDOMNUMGEN_NUMPYCOMPATIBLE_H

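#include <cstdint>
#include <random>
#include <string>

#include "RandomNumGenerator.h"  // project-specific header from the original answer (not shown here)

//Uniform distribution - numpy.random.rand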
class RandomNumGen_NumpyCompatible {
public:
    RandomNumGen_NumpyCompatible();
    RandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);

    std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
    std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
    void seed(std::uint_fast32_t seed);
    void discard(unsigned long long z);    // NOTE: advances the engine by 2*z outputs, because each NumPy-style double consumes two
    std::uint_fast32_t operator()();       // Simply returns the next Mersenne Twister value from the engine
    double getDouble();                    // Calculates the next uniformly distributed double as numpy.random.rand does

    std::string getGeneratorType() const { return "RandomNumGen_NumpyCompatible"; }

private:
    std::mt19937 m_mersenneEngine;
};

///////////////////

//Gaussian distribution - numpy.randn
class GaussianRandomNumGen_NumpyCompatible {
public:
    GaussianRandomNumGen_NumpyCompatible();
    GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);

    std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
    std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
    void seed(std::uint_fast32_t seed);
    void discard(unsigned long long z);    // Discards z normal values by calling getDouble() z times
    std::uint_fast32_t operator()();       // Simply returns the next Mersenne Twister value from the engine
    double getDouble();                    // Calculates the next normally (Gaussian) distributed double as numpy.random.randn does

    std::string getGeneratorType() const { return "GaussianRandomNumGen_NumpyCompatible"; }

private:
    bool m_haveNextVal;
    double m_nextVal;
    std::mt19937 m_mersenneEngine;
};

#endif

And the implementation:

#include "RandomNumGen_NumpyCompatible.h"

#include <cmath>  // sqrt, log

RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible()
{
}

RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_mersenneEngine(seed)
{
}

void RandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
    m_mersenneEngine.seed(newSeed);
}

void RandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
    //Advances and discards TWICE as many values to keep with Numpy order
    m_mersenneEngine.discard(2*z);
}

std::uint_fast32_t RandomNumGen_NumpyCompatible::operator()()
{
    return m_mersenneEngine();
}

double RandomNumGen_NumpyCompatible::getDouble()
{
    int a = m_mersenneEngine() >> 5;
    int b = m_mersenneEngine() >> 6;
    return (a * 67108864.0 + b) / 9007199254740992.0;
}

///////////////////

GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible()
: m_haveNextVal(false)
{
}

GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_haveNextVal(false), m_mersenneEngine(seed)
{
}

void GaussianRandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
    m_mersenneEngine.seed(newSeed);
    m_haveNextVal = false;  // drop any value cached from the previous stream
}

void GaussianRandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
    //Burn some CPU cycles here: the number of engine outputs per value is not fixed
    for (unsigned long long i = 0; i < z; ++i)
        getDouble();
}

std::uint_fast32_t GaussianRandomNumGen_NumpyCompatible::operator()()
{
    return m_mersenneEngine();
}

double GaussianRandomNumGen_NumpyCompatible::getDouble()
{
    if (m_haveNextVal) {
        m_haveNextVal = false;
        return m_nextVal;
    }

    double f, x1, x2, r2;
    do {
        int a1 = m_mersenneEngine() >> 5;
        int b1 = m_mersenneEngine() >> 6;
        int a2 = m_mersenneEngine() >> 5;
        int b2 = m_mersenneEngine() >> 6;
        x1 = 2.0 * ((a1 * 67108864.0 + b1) / 9007199254740992.0) - 1.0;
        x2 = 2.0 * ((a2 * 67108864.0 + b2) / 9007199254740992.0) - 1.0;
        r2 = x1 * x1 + x2 * x2;
    } while (r2 >= 1.0 || r2 == 0.0);

    /* Box-Muller transform, polar form */
    f = sqrt(-2.0 * log(r2) / r2);
    m_haveNextVal = true;
    m_nextVal = f * x1;
    return f * x2;
}
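
The getDouble() loop above is the polar (Marsaglia) variant of the Box-Muller method: it draws a point uniformly in the square [-1, 1) x [-1, 1), rejects it unless it lies strictly inside the unit circle and is not the origin, and then turns the accepted point into two normal deviates, caching one of them for the next call. Here is a compact Python sketch of the same control flow, with a pluggable uniform source so that it can be traced by hand. The class name PolarGauss is hypothetical.

```python
import math


class PolarGauss:
    """Polar Box-Muller sampler that mirrors the caching in getDouble()."""

    def __init__(self, uniform):
        self.uniform = uniform   # callable returning doubles in [0, 1)
        self.cached = None       # second deviate of the last accepted pair

    def next(self):
        if self.cached is not None:
            value, self.cached = self.cached, None
            return value
        while True:
            x1 = 2.0 * self.uniform() - 1.0
            x2 = 2.0 * self.uniform() - 1.0
            r2 = x1 * x1 + x2 * x2
            if 0.0 < r2 < 1.0:   # accept only points strictly inside the unit circle
                break
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        self.cached = f * x1     # returned by the NEXT call
        return f * x2            # returned now


# Hand-traceable input: (0.75, 0.5) maps to the point (0.5, 0.0), so r2 = 0.25.
feed = iter([0.75, 0.5])
gen = PolarGauss(lambda: next(feed))
first, second = gen.next(), gen.next()
print(first, second)
```

Each accepted pair costs four engine outputs in the C++ version and every rejection costs four more, so the number of engine outputs per normal value is not fixed. That is presumably why the Gaussian discard() loops over getDouble() instead of calling the engine's own discard().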

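After some testing, the values do appear to agree within a tolerance (see @fdermishin's comment below) when the C++ unsigned integer is divided by the maximum value of an unsigned int, like this: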

  #include <limits>
  ...
  std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
  unsigned val1 = generator1();
  std::cout << "Gen 1 random value: " << val1 << std::endl;
  std::cout << "Normalized Gen 1: " << static_cast<double>(val1) /  std::numeric_limits<std::uint32_t>::max() << std::endl;

However, Python's version seems to skip every other value. Given the following two programs:

#!/usr/bin/env python3

import sys

import numpy as np

def main():

    np.random.seed(1)
    
    for i in range(0, 10):
        print(np.random.rand())

###########

# Call main and exit success
if __name__ == "__main__":
    main()
    sys.exit()

#include <cstdlib>
#include <iostream>
#include <random>
#include <limits>

int main()
{
    unsigned seed = 1;

    std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
    for (unsigned i = 0; i < 10; ++i) {
        unsigned val1 = generator1();
        std::cout << "Normalized, #" << i << ": " << (static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max()) << std::endl;
    }

    return EXIT_SUCCESS;
}

The Python program prints:

0.417022004702574
0.7203244934421581
0.00011437481734488664
0.30233257263183977
0.14675589081711304
0.0923385947687978
0.1862602113776709
0.34556072704304774
0.39676747423066994
0.538816734003357

whereas the C++ program prints:

Normalized, #0: 0.417022
Normalized, #1: 0.997185
Normalized, #2: 0.720324
Normalized, #3: 0.932557
Normalized, #4: 0.000114381
Normalized, #5: 0.128124
Normalized, #6: 0.302333
Normalized, #7: 0.999041
Normalized, #8: 0.146756
Normalized, #9: 0.236089

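I could easily skip every other value in the C++ version, which should give me numbers that match the Python version (within a tolerance). But why does Python's implementation seem to skip every other value, or where do these extra values in the C++ version come from?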
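
A likely explanation, consistent with the rand conversion shown in the answers: each call to rand consumes two 32-bit outputs, and the second output of every pair only fills in low-order bits, so it never shows up in the leading digits. The sketch below illustrates this with what I believe to be the first ten raw outputs of std::mt19937 seeded with 1. They are hard-coded assumptions here, not values taken from the question, and the helper name rand_stream is hypothetical.

```python
# Assumed first ten raw outputs of a 32-bit Mersenne Twister seeded with 1.
WORDS = [1791095845, 4282876139, 3093770124, 4005303368, 491263,
         550290313, 1298508491, 4290846341, 630311759, 1013994432]


def rand_stream(words):
    """Yield rand-style doubles, consuming TWO 32-bit words per value."""
    it = iter(words)
    for w1, w2 in zip(it, it):   # pairs up consecutive words
        yield ((w1 >> 5) * 67108864.0 + (w2 >> 6)) / 9007199254740992.0


doubles = list(rand_stream(WORDS))

for i, value in enumerate(doubles):
    naive = WORDS[2 * i] / 2**32   # what dividing a single word would give
    print(f"rand #{i}: {value:.16f}   word #{2 * i} / 2**32: {naive:.6f}")
```

If the assumed words are right, the five values agree with the first five numbers printed by the Python program, and the C++ outputs #1, #3, #5, and so on are the words that rand folded into the low-order bits. Skipping them and dividing by the maximum only matches to about eight decimal places; applying the two-word conversion should reproduce the Python values exactly.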

