
Bigram probability

I have a Moby Dick corpus and I need to calculate the probability of the bigram "ivory leg". I know that this command gives me the list of all bigrams:

bigrams = [w1 + " " + w2 for w1, w2 in zip(words[:-1], words[1:])]
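To see what that command produces, here is a minimal sketch on a made-up token list. It assumes `words` is the corpus already split into tokens; the sentence below is a hypothetical stand-in, not text from Moby Dick. Each bigram is a single string joined by a space, and there is one fewer bigram than there are tokens:

```python
# Hypothetical stand-in for the tokenized corpus.
words = "the ivory leg and the ivory leg".split()

# zip pairs each token with the one after it; words[:-1] drops the last token
# so both slices have the same length (zip would stop at the shorter one anyway).
bigrams = [w1 + " " + w2 for w1, w2 in zip(words[:-1], words[1:])]
print(bigrams)
# ['the ivory', 'ivory leg', 'leg and', 'and the', 'the ivory', 'ivory leg']
```

Because these bigrams are strings, any lookup has to use the same joined form, e.g. `bigrams.count("ivory leg")`.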

But how do I get the probability of just those two words?

You can count all the bigrams and count the specific bigram you are looking for. The probability of the bigram occurring, P(bigram), is just the quotient of those two counts. The conditional probability of word[1] given word[0], P(w[1] | w[0]), is the number of occurrences of the bigram divided by the count of w[0]. For example, looking at the bigram ('some', 'text'):

s = 'this is some text about some text but not some other stuff'.split()

bigrams = [(s1, s2) for s1, s2 in zip(s, s[1:])]

# [('this', 'is'),
#  ('is', 'some'),
#  ('some', 'text'),
#  ('text', 'about'),
#  ...]

number_of_bigrams = len(bigrams)
# 11

# how many times 'some' occurs 
some_count = s.count('some')
# 3

# how many times the bigram occurs
bg_count = bigrams.count(('some', 'text'))
# 2

# probability of 'text' given 'some': P(text | some)
# i.e. given that you found 'some', what's the probability that it starts the bigram ('some', 'text'):
bg_count / some_count
# 0.6666...

# probability of the bigram in the text: P(some text)
# i.e. pick a bigram at random, what's the probability it's your bigram:
bg_count / number_of_bigrams
# 0.1818...
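Calling `list.count` rescans the whole list on every lookup, which gets slow on a corpus the size of Moby Dick. A minimal sketch using `collections.Counter` builds the counts once and then answers any query; `build_counts`, `bigram_prob`, and `conditional_prob` are hypothetical helper names, not from any library. In formula form it computes P(w0 w1) = C(w0 w1) / N, with N the total number of bigrams, and P(w1 | w0) = C(w0 w1) / C(w0). One deliberate difference from the code above: C(w0) counts w0 only where it starts a bigram, so a corpus-final token is not counted; for 'some' in this example that still gives 3.

```python
from collections import Counter

def build_counts(tokens):
    """Return (bigram counts, first-word counts) for a list of tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    bigram_counts = Counter(pairs)
    # Count each word only where it starts a bigram (i.e. the last token is excluded).
    first_word_counts = Counter(w0 for w0, _ in pairs)
    return bigram_counts, first_word_counts

def bigram_prob(bigram_counts, w0, w1):
    """P(w0 w1): share of all bigrams that equal (w0, w1)."""
    total = sum(bigram_counts.values())
    # Counter returns 0 for a missing key, so unseen bigrams give 0.0.
    return bigram_counts[(w0, w1)] / total if total else 0.0

def conditional_prob(bigram_counts, first_word_counts, w0, w1):
    """P(w1 | w0): how often w0 is followed by w1."""
    denom = first_word_counts[w0]
    return bigram_counts[(w0, w1)] / denom if denom else 0.0

s = 'this is some text about some text but not some other stuff'.split()
bg, first = build_counts(s)
print(bigram_prob(bg, 'some', 'text'))              # 2/11
print(conditional_prob(bg, first, 'some', 'text'))  # 2/3
```

For the question's case, tokenize the Moby Dick text the same way and call `bigram_prob(bg, 'ivory', 'leg')`; the exact value depends on how the text is lowercased and how punctuation is handled.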

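Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.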

 
© 2020-2024 STACKOOM.COM