How to find the perplexity of a bigram model when the probability of a given bigram is 0

Question

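Given the formula for computing the perplexity of a bigram model (with add-1 smoothed probabilities), how should I proceed when the predicted probability of one of the words in the sentence is 0?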
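
For reference, the two formulas mentioned here are usually written as follows, and the code in this question appears to follow them. The notation is an assumption, not taken from the post: c(·) is a corpus count, V is the vocabulary size (`len(word_dict)` in the code), and N is the number of bigrams scored.

```latex
% Add-1 (Laplace) smoothed bigram probability
P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}\, w_i) + 1}{c(w_{i-1}) + V}

% Perplexity of a test sequence W = w_1 \dots w_N
PP(W) = \left( \prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})} \right)^{1/N}
```

A single factor with P = 0 makes the product undefined, which is the situation asked about.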

# just examples, don't mind the counts
corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6, 'as one': 1, 'one mordant': 1, 'mordant </s>': 5}
word_dict = {'<s>': 2, 'now': 1, 'is': 6, 'as': 1, 'one': 1, 'mordant': 5, '</s>': 5}

test_bigram = {'<s> now': 2, 'now <UNK>': 1, '<UNK> as': 6, 'as </s>': 5}

n = 1 # Add one smoothing
probabilities = {}
for bigram in test_bigram:
    if bigram in corpus_bigram:
        value = corpus_bigram[bigram]
        first_word = bigram.split()[0]
        probabilities[bigram] = (value + n) / (word_dict.get(first_word) + (n * len(word_dict)))
    else:
        probabilities[bigram] = 0

For example, if the probabilities for test_bigram are

# Again just dummy probability values
probabilities = {'<s> now': 0.35332322, 'now <UNK>': 0, '<UNK> as': 0, 'as </s>': 0.632782318}

perplexity = 1
for key in probabilities:
    # when probabilities[key] == 0 ????
    perplexity = perplexity * (1 / probabilities[key])

N = len(sentence)
perplexity = pow(perplexity, 1 / N)

ZeroDivisionError: division by zero

Answer 1

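A common solution is to assign a small probability to words that do not occur, for example 1/N, where N is the total number of words. In effect, you pretend that a word that never appeared in the data did appear once; this introduces only a small error but prevents the division by zero.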

So in your case, probabilities[bigram] = 1 / <sum of all bigram frequencies>
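
A minimal sketch of that suggestion, reusing the dummy counts from the question. The names `floor`, `perplexity`, and `ppl` are hypothetical and introduced only for illustration, and N is taken here to be the number of bigrams scored, which is an assumption (the question uses `len(sentence)`).

```python
import math

# Dummy counts copied from the question.
corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6,
                 'as one': 1, 'one mordant': 1, 'mordant </s>': 5}
word_dict = {'<s>': 2, 'now': 1, 'is': 6, 'as': 1,
             'one': 1, 'mordant': 5, '</s>': 5}
test_bigram = {'<s> now': 2, 'now <UNK>': 1, '<UNK> as': 6, 'as </s>': 5}

n = 1  # add-one smoothing, as in the question

# Floor for unseen bigrams: 1 / <sum of all bigram frequencies>.
floor = 1 / sum(corpus_bigram.values())

probabilities = {}
for bigram in test_bigram:
    if bigram in corpus_bigram:
        first_word = bigram.split()[0]
        probabilities[bigram] = (corpus_bigram[bigram] + n) / (
            word_dict[first_word] + n * len(word_dict))
    else:
        # Pretend the unseen bigram occurred once instead of assigning 0.
        probabilities[bigram] = floor


def perplexity(probs):
    """Hypothetical helper: (prod 1/p) ** (1/N), computed in log space."""
    probs = list(probs)
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


ppl = perplexity(probabilities.values())
print(probabilities)
print(ppl)
```

Summing logarithms instead of multiplying 1/p factors gives the same value but avoids overflow or underflow on long sentences. Another option, also only a sketch, is to apply the add-one formula to unseen bigrams as well, with a count of 0 for the bigram, which likewise yields a nonzero probability.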

Solution 1
0 2021-03-31 16:31:36
