np.random.choice error: probabilities don't sum to 1, but print says they do?

I'm having a small problem when using numpy's random.choice function. I'm giving it a list (a) and the probabilities associated with this list (p). (I'm trying to generate random text by implementing a bigram Markov model with probabilities calculated on a training corpus.) The problem is that it crashes mid-program, telling me that the probabilities don't sum to 1. The bigger problem is that sum(p) DOES sum to 1.

Is this a bug? Does random.choice sum differently than the regular sum function? Am I missing something?

Here's the code:

def randomBigram(self):
    doc = open(self.path+"/randomGenBi.txt", "wb")
    lettre = str(np.random.choice(self.letters.index))
    a = [elem for elem in self.probaBigram.index if elem[1] == lettre]
    p = [self.probaBigram[elem] for elem in self.probaBigram.index if elem[1] == lettre]
    random = np.random.choice(a, p=p)
    i = 0
    while i < 5000:
        lettre = str(np.random.choice(self.letters.index))
        print "lettre", lettre
        a = [elem for elem in self.probaBigram.index if elem[1] == lettre]
        p = [self.probaBigram[elem] for elem in self.probaBigram.index if elem[1] == lettre]
        if sum(p) != 1.0:  #debug
            print "somme sur p:", sum(p)
            print "not equal"
        else:
            print "equals one"
        random = np.random.choice(a, p=p)
        doc.write(random)

        i += 1

And here's a sample of my shell output:

lettre a
sum for p: 1.0
not equal

I just don't really get it...

Any help is welcome :)

Thank you!

Jessica

Change:

print "somme sur p:", sum(p)

to

print "somme sur p:", repr(sum(p))
                      ^^^^^      ^

and try again. print implicitly applies str() to items before printing them, and in "old enough" versions of Python, str rounds floats to 12 significant digits. There are many floats not equal to 1 that will then print as 1.0. But the only float whose repr displays as 1.0 is exactly 1.0.

Once you discover that the sum really isn't 1.0, show us what it is and ask a new question about what to do next ;-)
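To make the str/repr difference concrete, here is a minimal sketch. It assumes that the old str() behaviour for floats can be approximated with the '%.12g' format (12 significant digits); the variable names are illustrative and not taken from the question. The snippet runs on current Python versions, where str() and repr() of a float already agree, which is why the old rounding is emulated explicitly.

```python
# Largest float strictly below 1.0 (an illustrative value, not the asker's sum).
x = 1.0 - 2**-53

# Assumption: emulate the 12-significant-digit rounding of old str() with '%.12g'.
old_style = '%.12g' % x

print(old_style)   # '1': the rounding hides the difference
print(repr(x))     # 0.9999999999999999: repr shows the real value
print(x == 1.0)    # False
```

So a sum can fail the != 1.0 test and still be displayed as 1.0 by print.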
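As a possible next step (a sketch under assumptions, not something this answer prescribes), the exact != 1.0 comparison can be replaced by a tolerance check, and the probabilities can be renormalized before they are passed to np.random.choice. The helper names sums_to_one and renormalize and the tolerance 1e-9 are hypothetical choices made for illustration.

```python
import math

def sums_to_one(probs, tol=1e-9):
    """Return True if probs sum to 1 within an absolute tolerance (tol is an assumed value)."""
    return math.isclose(math.fsum(probs), 1.0, rel_tol=0.0, abs_tol=tol)

def renormalize(probs):
    """Divide each weight by the total so the result sums to (almost exactly) 1."""
    total = math.fsum(probs)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [p / total for p in probs]

p = [0.1] * 10
print(sum(p) == 1.0)               # False: naive summation accumulates rounding error
print(sums_to_one(p))              # True: equal within the tolerance
print(renormalize([2.0, 1.0, 1.0]))  # [0.5, 0.25, 0.25]
```

Renormalizing only papers over small rounding errors. If the sum is far from 1, the probabilities themselves are probably computed incorrectly, and that should be fixed first.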
