
Get next word from bigram model on max probability

I want to generate sonnets using nltk with bigrams. I have generated the bigrams, computed the probability of each bigram, and stored them in a defaultdict like this:

[('"Let', defaultdict(<function <lambda>.<locals>.<lambda> at0x1a17f98bf8>, 
{'the': 0.2857142857142857, 'dainty': 
0.14285714285714285, 'it': 0.14285714285714285, 'those': 
0.14285714285714285, 'me': 0.14285714285714285, 'us': 
0.14285714285714285}))]
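
(For context, a nested probability dict with this shape can be built from a token list with nltk.bigrams and a Counter; the snippet below is only a rough sketch, and poetry_tokens is a placeholder name, not my actual corpus.)

from collections import Counter, defaultdict
import nltk

# poetry_tokens: placeholder list of tokens from the sonnet corpus
poetry_tokens = ['"Let', 'the', 'dainty', 'flowers', 'bloom', '"Let', 'us', 'go']

# count bigram occurrences, then normalise each row into probabilities
counts = defaultdict(Counter)
for w1, w2 in nltk.bigrams(poetry_tokens):
    counts[w1][w2] += 1

model = defaultdict(lambda: defaultdict(float))
for w1, following in counts.items():
    total = sum(following.values())
    for w2, c in following.items():
        model[w1][w2] = c / total  # probability of w2 given w1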

The probability of each word appearing after "Let" is given. In the same way, I have a bigram model for my whole corpus. Now I want to generate a 4-line sonnet with 15 words in each line. I have tried this code, but it is not working:

def generate_sonnet(word):
    lines = 4
    words = 15
    for i in range(lines):
        line = ()
        for j in range(words):
            # I am selecting the max probability but not the word itself.
            # How can I select the word that has the max probability of following word?
            nword = float(max(model[word].values()))
            word += nword
        
word = random.choice(poetrylist)
generate_sonnet(word)

I select a random word and pass it to my function, where I want to join 15 words using bigrams; when 1 line is complete, the next 3 should be done the same way.

Here is a simple code snippet to show how this task can be achieved (with a very naive approach):

bigram1 = {'Let': {'the': 0.2857142857142857, 'dainty': 0.14285714285714285,
                   'it': 0.14285714285714285, 'those': 0.14285714285714285,
                   'me': 0.14285714285714285, 'us': 0.14285714285714285}}

bigram2 = {'the' : {'dogs' : 0.4, 'it' : 0.2, 'a' : 0.2, 'b': 0.2}}
bigram3 = {'dogs' : {'out' : 0.6, 'it' : 0.2, 'jj' : 0.2}}

model = {}
model.update(bigram1)
model.update(bigram2)
model.update(bigram3)

sentence = []

iterations = 3
word = 'Let'
sentence.append(word)

for _ in range(iterations):
    # greedily pick the successor with the highest probability
    # (.iteritems() is Python 2; use .items() on Python 3)
    max_value = 0
    for k, v in model[word].items():
        if v >= max_value:
            word = k
            max_value = v
    sentence.append(word)


print(" ".join(sentence)) 

Output:

Let the dogs out

The code is written in a very simple way; it is a toy example for understanding purposes.

Keep in mind that the word taken is the first word encountered with the max value, so this model is deterministic. Consider adding a random step that chooses among the set of words which share the same max value, as sketched below.
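
For example, a tie-breaking step could look like this (a minimal sketch, reusing the model dict and word variable from the snippet above):

import random

# collect every candidate that shares the maximum probability, then pick one at random
candidates = model[word]
max_value = max(candidates.values())
best_words = [w for w, p in candidates.items() if p == max_value]
word = random.choice(best_words)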

I suggest sampling the words in proportion to their probabilities, like so:

import numpy

dist = {'the': 0.2857142857142857, 'dainty': 0.14285714285714285,
        'it': 0.14285714285714285, 'those': 0.14285714285714285,
        'me': 0.14285714285714285, 'us': 0.14285714285714285}

# numpy.random.choice expects sequences, so convert the dict views to lists
words = list(dist.keys())
probabilities = list(dist.values())
numpy.random.choice(words, p=probabilities)

This will give you a "random" word every time, according to the given distribution.

Something like this (draft):

for _ in range(iterations):
    # convert the dict views to lists so np.random.choice accepts them
    word = np.random.choice(list(model[word].keys()), p=list(model[word].values()))
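
Putting the sampling idea together with the original 4-line, 15-words-per-line goal, a rough end-to-end sketch might look like this; it assumes every generated word appears as a key in model (i.e. has at least one known successor) and renormalises the probabilities so they sum to 1:

import random
import numpy as np

def generate_sonnet(model, start_word, lines=4, words_per_line=15):
    """Sample a poem line by line from a bigram probability dict."""
    word = start_word
    sonnet = []
    for _ in range(lines):
        line = [word]
        for _ in range(words_per_line - 1):
            successors = model[word]              # assumes word is a key in model
            candidates = list(successors.keys())
            probs = np.array(list(successors.values()), dtype=float)
            probs /= probs.sum()                  # renormalise against rounding error
            word = np.random.choice(candidates, p=probs)
            line.append(word)
        sonnet.append(" ".join(line))
    return "\n".join(sonnet)

# e.g. start from a random word that the model knows about
# print(generate_sonnet(model, random.choice(list(model.keys()))))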
