NLTK ViterbiParser fails to parse words that are not in the PCFG rules

Question

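import nltk
from nltk.parse import ViterbiParser

def pcfg_chartparser(grammarfile):
    # Read the grammar file and build a PCFG from its contents.
    with open(grammarfile) as f:
        grammar = f.read()
    return nltk.PCFG.fromstring(grammar)

grammarp = pcfg_chartparser("wsjp.cfg")

# `sent` was not defined in the original post; these are the two sentences discussed below.
sent = ["turn off the lights", "please turn off the lights"]

VP = ViterbiParser(grammarp)
print(VP)
for w in sent:
    for tree in VP.parse(nltk.word_tokenize(w)):
        print(tree)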

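When I run the code above, it produces the following output for the sentence "turn off the lights":

(S (VP (VB turn) (PRT (RP off)) (NP (DT the) (NNS lights)))) (p=2.53851e-14)

However, it raises the following error for the sentence "please turn off the lights":

ValueError: Grammar does not cover some of the input words: u"'please'".

I am building a ViterbiParser by supplying it with a probabilistic context-free grammar. It works well for sentences whose words all appear in the grammar rules. It fails on sentences containing a word the parser has not seen in the grammar rules. How can I get around this limitation?
I am referring to this assignment.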

Answer 1

First, try to use (i) namespaces and (ii) unambiguous variable names, e.g.:

>>> from nltk import PCFG
>>> from nltk.parse import ViterbiParser
>>> import urllib.request
>>> response = urllib.request.urlopen('https://raw.githubusercontent.com/salmanahmad/6.863/master/Labs/Assignment5/Code/wsjp.cfg')
>>> wsjp = response.read().decode('utf8')
>>> grammar = PCFG.fromstring(wsjp)
>>> parser = ViterbiParser(grammar)
>>> list(parser.parse('turn off the lights'.split()))
[ProbabilisticTree('S', [ProbabilisticTree('VP', [ProbabilisticTree('VB', ['turn']) (p=0.002082678), ProbabilisticTree('PRT', [ProbabilisticTree('RP', ['off']) (p=0.1089101771)]) (p=0.10768769667270556), ProbabilisticTree('NP', [ProbabilisticTree('DT', ['the']) (p=0.7396712852), ProbabilisticTree('NNS', ['lights']) (p=4.61672e-05)]) (p=4.4236397464693323e-07)]) (p=1.0999324002161311e-13)]) (p=2.5385077255727538e-14)]

If we take a look at the grammar:

>>> grammar.check_coverage('please turn off the lights'.split())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/nltk/grammar.py", line 631, in check_coverage
    "input words: %r." % missing)
ValueError: Grammar does not cover some of the input words: "'please'".

To resolve the unknown-word problem, there are several options:

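- Use wildcard non-terminal nodes to replace the unknown words. Find some way to replace the words that check_coverage() reports as not covered by the grammar with the wildcard, then parse the sentence with the wildcard in place.
  - This will usually decrease the parser's accuracy, unless you have specifically trained the PCFG with a grammar that handles unknown words and the wildcard is a superset of the unknown words.
- Go back to the grammar production file you had before learning the PCFG with learn_pcfg.py, and add all possible words to the terminal productions.
- Add the unknown words to your PCFG grammar and then renormalize the weights, giving the unknown words very small weights (you can also try smarter smoothing/interpolation techniques).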
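The first option can be sketched roughly as follows. This is only an illustration under assumptions, not a full solution: the wildcard terminal "UNK" and the helper names grammar_vocabulary and replace_unknown are hypothetical, and the grammar itself would need a lexical rule for "UNK" (whether wsjp.cfg has one is not something this sketch knows). It relies only on the productions() and rhs() methods of an NLTK grammar, where terminals are plain strings.

```python
UNKNOWN_TOKEN = "UNK"  # hypothetical wildcard terminal; the grammar must define a rule for it


def grammar_vocabulary(grammar):
    """Collect every terminal (plain string) found on the right-hand side of a production."""
    return {
        symbol
        for production in grammar.productions()
        for symbol in production.rhs()
        if isinstance(symbol, str)  # nonterminals are Nonterminal objects, not str
    }


def replace_unknown(tokens, vocabulary, unknown_token=UNKNOWN_TOKEN):
    """Return a copy of tokens with out-of-vocabulary words swapped for the wildcard."""
    return [token if token in vocabulary else unknown_token for token in tokens]
```

With the grammar and parser from the session above, this might be used as parser.parse(replace_unknown(tokens, grammar_vocabulary(grammar))), which only avoids the ValueError if the grammar really does contain the wildcard terminal.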

Since this is a homework question, I will not give the answer with full code. But the hints above should be enough to resolve the problem.
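Without giving a full solution, the arithmetic behind the renormalization hint can be illustrated in plain Python. This is a sketch under assumptions: the helper name add_unknown_word, the epsilon value, and the choice of which nonterminal the new word belongs to are all hypothetical. The input maps every right-hand side of one nonterminal to its probability, and the output keeps that distribution summing to 1, which NLTK's PCFG expects for each left-hand side.

```python
def add_unknown_word(rule_probs, new_word, epsilon=1e-6):
    """Give new_word a small probability and rescale the existing rules to compensate.

    rule_probs maps each right-hand side of ONE nonterminal to its probability
    (the values are assumed to sum to 1). A new dict is returned.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if new_word in rule_probs:
        # Already covered: nothing to add, just return a copy.
        return dict(rule_probs)
    # Shrink every existing probability by the same factor so the total stays 1.
    rescaled = {rhs: prob * (1.0 - epsilon) for rhs, prob in rule_probs.items()}
    rescaled[new_word] = epsilon
    return rescaled
```

Uniform rescaling is the simplest possible choice; the smoothing or interpolation techniques mentioned above would distribute the probability mass differently.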

Solution 1
6 2016-01-30 21:20:14