
np.random.choice error: probabilities don't sum to 1, but print says they do?

I'm having a small problem with NumPy's random.choice function. I pass it a list (a) and the probabilities associated with that list (p); I'm trying to generate random text with a bigram Markov model whose probabilities were computed on a training corpus. The problem is that it crashes mid-program, telling me that the probabilities don't sum to 1. The bigger problem is that sum(p) DOES come out as 1.

Is this a bug? Does random.choice sum differently from the regular sum function? Am I missing something?

Here's the code:

def randomBigram(self):
    # Python 2 code; assumes numpy is imported as np at module level.
    doc = open(self.path+"/randomGenBi.txt", "wb")
    lettre = str(np.random.choice(self.letters.index))
    # Bigrams whose second element is the chosen letter, and their probabilities.
    a = [elem for elem in self.probaBigram.index if elem[1] == lettre]
    p = [self.probaBigram[elem] for elem in self.probaBigram.index if elem[1] == lettre]
    random = np.random.choice(a, p=p)
    i = 0
    while i < 5000:
        lettre = str(np.random.choice(self.letters.index))
        print "lettre", lettre
        a = [elem for elem in self.probaBigram.index if elem[1] == lettre]
        p = [self.probaBigram[elem] for elem in self.probaBigram.index if elem[1] == lettre]
        if sum(p) != 1.0:  # debug: exact floating-point comparison
            print "somme sur p:", sum(p)
            print "not equal"
        else:
            print "equals one"
        random = np.random.choice(a, p=p)
        doc.write(random)
        i += 1
    doc.close()  # release the output file once generation is done

And here's a sample of my shell output:

lettre a
sum for p: 1.0
not equal

I just don't really get it...

Any help is welcome :)

Thank you !

Jessica

Change:

print "somme sur p:", sum(p)

to

print "somme sur p:", repr(sum(p))
                      ^^^^^      ^

and try again. print implicitly applies str() to each item before printing it, and in "old enough" versions of Python (Python 2, and Python 3 before 3.2) str rounds floats to 12 significant digits. Many floats that are not equal to 1 therefore print as 1.0. But the only float whose repr displays as 1.0 is exactly 1.0.
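To make the difference concrete, here is a small sketch in Python 3. Modern str() no longer rounds, so the old 12-digit behavior is imitated with "%.12g"; the list of ten 0.1 weights is an illustrative stand-in, not the asker's data.

```python
import math

# Ten weights that "should" sum to 1 but do not,
# because 0.1 has no exact binary representation.
probs = [0.1] * 10
total = sum(probs)

# 12 significant digits, imitating what old str() showed
# (Python 2 would display this value as "1.0").
rounded = "%.12g" % total

# repr gives the shortest string that round-trips to the same float.
exact = repr(total)

print("12 digits:", rounded)
print("repr     :", exact)
print("== 1.0   :", total == 1.0)
print("isclose  :", math.isclose(total, 1.0))
```

The rounded form hides the discrepancy, while repr and the exact comparison expose it. A tolerance check such as math.isclose is usually a safer debugging test than != 1.0.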

Once you discover that the sum really isn't 1.0, show us what it is and ask a new question about what to do next ;-)
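One common follow-up, if the sum turns out to be slightly off, is to rescale the probabilities before passing them as p. The sketch below is in Python 3 and makes some assumptions: the weights are non-negative, the drift is only rounding or truncation error, and normalize is a hypothetical helper name, not part of NumPy.

```python
import math

def normalize(weights):
    """Rescale weights so they sum to 1 (hypothetical helper)."""
    # fsum tracks partial sums exactly, so the total is as accurate as possible.
    total = math.fsum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

# Illustrative weights whose sum is slightly below 1.
raw = [0.2, 0.3, 0.4999]
p = normalize(raw)

print(repr(math.fsum(raw)))
print(repr(math.fsum(p)))
```

The rescaled list can then be passed as the p argument. If the sum is far from 1, though, rescaling only hides a problem in how the probabilities were computed.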


 