
Getting the bigram probability (python)

I am trying to write a function that calculates the bigram probability.

So, I basically have to count the occurrences of two consecutive words (e.g. "I am") in a corpus and divide that by the number of occurrences of the first of those two words.

As a formula:

P(W_{n-1}, W_n) / P(W_{n-1})

So in my code I am trying to do something like:

def prob(self, prevWord, word):
    word = word.strip()
    prevWord = prevWord.strip()
    for sen in corpus:
        for word in sen:
            if(word occurs after prevWord): #Pseudocode here
                  counter++
    numerator = counter / self.total
    prevWordProb = self.counts[prevWord]/self.total
    return numerator / prevWordProb

First of all, is my approach valid? If so, I am not sure how to code the

if(word occurs after prevWord): #Pseudocode here

part of the code. What would it look like?

There are a few other issues with the code, but once those are resolved, the loop and conditional should look something like:

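for sen in corpus:
    # stop one short of the end so sen[i+1] is always a valid index
    for i, w in enumerate(sen[:-1]):
        if w == prevWord and sen[i+1] == word:
            counter += 1  # Python has no ++ operator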
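
For completeness, here is a minimal sketch of how the whole function could look once the other issues are fixed. It is an illustration under assumptions, not the asker's actual class: `BigramModel`, `unigram_counts` and `bigram_counts` are hypothetical names, and the corpus is assumed to be a list of sentences, each a list of word tokens. If the numerator and the denominator are both divided by the same total, as in the question, that total cancels, so the sketch divides the raw counts directly. There is no smoothing, and a sentence-final word still counts toward the denominator even though nothing follows it.

```python
from collections import Counter


class BigramModel:
    """Toy maximum-likelihood bigram model (hypothetical, for illustration only)."""

    def __init__(self, corpus):
        # corpus is assumed to be a list of sentences, each a list of tokens
        self.unigram_counts = Counter()
        self.bigram_counts = Counter()
        for sen in corpus:
            self.unigram_counts.update(sen)
            # zip(sen, sen[1:]) yields every adjacent pair and never runs past the end
            self.bigram_counts.update(zip(sen, sen[1:]))

    def prob(self, prev_word, word):
        prev_word, word = prev_word.strip(), word.strip()
        prev_count = self.unigram_counts[prev_word]
        if prev_count == 0:
            return 0.0  # previous word never seen; no smoothing applied
        # count(prev_word, word) / count(prev_word)
        return self.bigram_counts[(prev_word, word)] / prev_count


corpus = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "do", "not", "like", "green", "eggs"],
]
model = BigramModel(corpus)
# "I" occurs 3 times and is followed by "am" in 2 of them
print(model.prob("I", "am"))
```

Counting all bigrams once up front, rather than looping over the corpus on every call, keeps each `prob` lookup cheap when many word pairs are queried.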
