PCFG generation in NLTK

Question

I am trying to learn a PCFG from a file that contains parse trees, one per line, for example:

(S (DECL_MD (NP_PPSS (PRON_PPSS (ii))) (VERB_MD (pt_verb_md need)) (NP_NN (ADJ_AT (aa)) (NOUN_NN (flight flight)) (PREP_IN (pt_prep_in from))) (AVPNP_NP (NOUN_NP (charlotte charlotte) ))
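Each line is a bracketed tree that nltk.Tree.fromstring can turn into a Tree object. The sketch below uses a hypothetical, simplified line (not the asker's data). Note that fromstring raises ValueError when the brackets do not balance, and the example line as quoted has two more opening than closing parentheses, so it is worth checking the real file.

```python
from nltk import Tree

# Hypothetical, simplified example line (NOT the asker's exact data):
# one bracketed parse tree with balanced parentheses.
line = "(S (NP (PRON i)) (VP (VERB need) (NP (DET a) (NOUN flight))))"

t = Tree.fromstring(line)
print(t.label())   # label of the root node
print(t.leaves())  # the words at the leaves, left to right
```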

Here is my relevant code:

from nltk import tree

def loadData(path):
    with open(path, 'r') as f:
        data = f.read().split('\n')
    return data

def getTreeData(data):
    # Build a list (map() would give a single-use iterator) and skip
    # blank lines, which Tree.fromstring cannot parse
    return [tree.Tree.fromstring(s) for s in data if s.strip()]

# Main script
print("loading data..")
data = loadData('C:\\Users\\Rayyan\\Desktop\\MSc Data\\NLP\\parseTrees.txt')
print("generating trees..")
treeData = getTreeData(data)
print("done!")
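Two details matter when loading trees this way in Python 3. First, map returns a lazy iterator that can be consumed only once, so a treeData built with map is empty the second time it is looped over. Second, f.read().split('\n') leaves an empty string at the end when the file ends with a newline, and Tree.fromstring('') raises ValueError. A small sketch with hypothetical in-memory data:

```python
from nltk import Tree

# Hypothetical file contents after f.read().split('\n'):
# two trees plus the empty string left by a trailing newline.
data = ["(S (NP i) (VP need))", "(S (NP you) (VP go))", ""]

# Skip blank lines: Tree.fromstring("") raises ValueError.
non_blank = [s for s in data if s.strip()]

# map() returns a lazy iterator that can be consumed only once.
lazy = map(Tree.fromstring, non_blank)
first_pass = list(lazy)   # parses both trees
second_pass = list(lazy)  # iterator already exhausted: empty list
print(len(first_pass), len(second_pass))

# A list can be iterated as many times as needed.
trees = [Tree.fromstring(s) for s in non_blank]
print(len(trees))
```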

After that, I tried many things I found online, such as:

grammar = induce_pcfg(S, productions)

But in those examples, productions is always built from a built-in corpus, for example:

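from nltk.corpus import treebank

productions = []
# Collect the productions of every parsed sentence in the first two files
for item in treebank.fileids()[:2]:
    for t in treebank.parsed_sents(item):
        productions += t.productions()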

In my case, I tried replacing productions here with treeData, but it doesn't work. What am I missing or doing wrong?
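The treebank snippet hands induce_pcfg a flat list of Production objects gathered with tree.productions(), whereas treeData holds Tree objects. The difference is easy to see on one hypothetical toy tree standing in for an element of treeData:

```python
from nltk import Tree
from nltk.grammar import Production

# Hypothetical small tree standing in for one element of treeData
t = Tree.fromstring("(S (NP i) (VP need))")

# treeData holds Tree objects like t ...
print(type(t).__name__)

# ... but induce_pcfg needs Production objects, which productions() returns:
# one rule per internal node, here S -> NP VP, NP -> 'i', VP -> 'need'
prods = t.productions()
for p in prods:
    print(p, isinstance(p, Production))
```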

Answer 1

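induce_pcfg expects a list of grammar productions (Production objects), not Tree objects, which is why passing treeData directly does not work. Start from the trees you built and extract their productions: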

from nltk import tree
treeData_rules = []

# Extract the CFG rules (productions) from every sentence's tree
for item in treeData:
    for production in item.productions():
        treeData_rules.append(production)
print(treeData_rules[:10])  # inspect the first few rules
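The nested loop can also be written as a single list comprehension. Either way, duplicate rules must be kept, because induce_pcfg estimates probabilities from how often each production occurs. A sketch with hypothetical toy trees in place of the real treeData:

```python
from nltk import Tree

# Hypothetical stand-in for the asker's treeData
treeData = [Tree.fromstring("(S (NP i) (VP need))"),
            Tree.fromstring("(S (NP you) (VP need))")]

# Nested loop, as in the answer
treeData_rules = []
for item in treeData:
    for production in item.productions():
        treeData_rules.append(production)

# Equivalent one-liner; duplicates such as S -> NP VP are kept on purpose
rules_compact = [p for t in treeData for p in t.productions()]
print(rules_compact == treeData_rules)
```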

Then you can induce a probabilistic CFG (PCFG) from those rules like this:

from nltk import induce_pcfg, Nonterminal

S = Nonterminal('S')  # start symbol: the root label of the trees
grammar_PCFG = induce_pcfg(S, treeData_rules)
print(grammar_PCFG)
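induce_pcfg assigns each rule its relative frequency, P(A -> β) = count(A -> β) / count(A), that is, how often the rule occurs divided by how often its left-hand side A is expanded. A quick sanity check on three hypothetical toy trees:

```python
from nltk import Tree, Nonterminal, induce_pcfg

# Hypothetical toy treebank
trees = [
    Tree.fromstring("(S (NP i) (VP need))"),
    Tree.fromstring("(S (NP i) (VP go))"),
    Tree.fromstring("(S (NP you) (VP need))"),
]
rules = [p for t in trees for p in t.productions()]
grammar = induce_pcfg(Nonterminal("S"), rules)

# NP is expanded 3 times, twice as 'i': P(NP -> 'i') = 2/3, P(NP -> 'you') = 1/3
for prod in grammar.productions(lhs=Nonterminal("NP")):
    print(prod, prod.prob())
```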

Answer 1: accepted, score 5, 2018-03-16 16:56:33
