Python: locate a word in nltk.tree

Question

I am trying to use NLTK to get the context of a word. I have two sentences:

import pandas as pd

sentences = pd.DataFrame({"sentence": ["The weather was good so I went swimming", "Because of the good food we took desert"]})

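I want to find out what the word "good" refers to. My idea is to chunk the sentences (using code from a tutorial) and then check whether the word "good" and a noun are in the same node. If they are not, "good" refers to a noun before or after it.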

First, I build the chunker as described in the tutorial:

import nltk
from nltk.corpus import conll2000

# The CoNLL-2000 corpus must be downloaded once: nltk.download('conll2000')
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
train_sents = conll2000.chunked_sents('train.txt', chunk_types=['NP'])

class ChunkParser(nltk.ChunkParserI):
    def __init__(self, train_sents):
        # Turn each chunk tree into (POS, IOB chunk tag) pairs and train a
        # trigram tagger that predicts the chunk tag from the POS tags
        train_data = [[(t, c) for w, t, c in nltk.chunk.tree2conlltags(sent)]
                      for sent in train_sents]
        self.tagger = nltk.TrigramTagger(train_data)

    def parse(self, sentence):
        # sentence is a list of (word, POS) pairs
        pos_tags = [pos for (word, pos) in sentence]
        tagged_pos_tags = self.tagger.tag(pos_tags)
        chunktags = [chunktag for (pos, chunktag) in tagged_pos_tags]
        # Reattach the words and rebuild a chunk tree from the IOB tags
        conlltags = [(word, pos, chunktag) for ((word, pos), chunktag)
                     in zip(sentence, chunktags)]
        return nltk.chunk.conlltags2tree(conlltags)

NPChunker = ChunkParser(train_sents)
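
To see what the tagger inside ChunkParser is actually trained on, here is a small sketch of the IOB conversion it depends on. It assumes only that NLTK is installed (no corpus download); the three-chunk tree is a hand-built example, not real CoNLL-2000 data.

```python
from nltk import Tree
from nltk.chunk import tree2conlltags, conlltags2tree

# Hand-built chunk tree with (word, POS) leaves, the shape the chunker
# returns (illustrative example, not taken from CoNLL-2000)
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('weather', 'NN')]),
    ('was', 'VBD'),
    Tree('NP', [('good', 'JJ')]),
])

# Flatten into (word, POS, IOB) triples, as ChunkParser.__init__ does
conll = tree2conlltags(tree)
print(conll)

# The trigram tagger only ever sees the (POS, IOB) pairs
train_pairs = [(pos, iob) for word, pos, iob in conll]
print(train_pairs)

# conlltags2tree reverses the conversion; parse() relies on this
rebuilt = conlltags2tree(conll)
print(rebuilt == tree)
```

Because the words are dropped from the training pairs, the chunker decides chunk boundaries from POS tags alone.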

Then I apply it to my sentence:

sentence = sentences["sentence"][0]
tags = nltk.pos_tag(sentence.lower().split())
result = NPChunker.parse(tags)
print(result)

The result looks like this:

(S
  (NP the/DT weather/NN)
  was/VBD
  (NP good/JJ)
  so/RB
  (NP i/JJ)
  went/VBD
  swimming/VBG)

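Now I want to "find" which node the word "good" is in. I have not found a better way than counting the words in the nodes and leaves. The word "good" is word 3 of the sentence (counting from 0).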

stuctured_sentence = []
for n in range(len(result)):
    stuctured_sentence.append(list(result[n]))

structure_length = []
for n in result:
    if isinstance(n, nltk.tree.Tree):
        if n.label() == 'NP':
            print(n)
            structure_length.append(len(n))
    else:
        print(str(n) + " is a leaf")
        structure_length.append(1)
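
The same counting idea can be done in a single pass with a running offset instead of building intermediate lists. This is a minimal sketch: `chunk_of_word` is a hypothetical helper, and the example tree is the printed result above rebuilt by hand so that no tagger or corpus is needed.

```python
from nltk import Tree

def chunk_of_word(tree, word_index):
    """Hypothetical helper: return (child_index, child) for the top-level
    child of `tree` that contains the word at `word_index` (0-based)."""
    offset = 0
    for i, child in enumerate(tree):
        # A chunk covers all of its leaves; a bare (word, POS) leaf covers one word
        width = len(child.leaves()) if isinstance(child, Tree) else 1
        if offset <= word_index < offset + width:
            return i, child
        offset += width
    raise IndexError(f"word index {word_index} is out of range")

# The chunker output shown above, rebuilt by hand
result = Tree('S', [
    Tree('NP', [('the', 'DT'), ('weather', 'NN')]),
    ('was', 'VBD'),
    Tree('NP', [('good', 'JJ')]),
    ('so', 'RB'),
    Tree('NP', [('i', 'JJ')]),
    ('went', 'VBD'),
    ('swimming', 'VBG'),
])

print(chunk_of_word(result, 3))
```

Using `len(child.leaves())` rather than `len(child)` keeps the count right even if a chunk ever contains nested subtrees.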

By summing up the number of words, I know where the word "good" is:

structure_frame=pd.DataFrame({"structure": stuctured_sentence, "length": structure_length})
structure_frame["cumsum"]=structure_frame["length"].cumsum()
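
Reading the answer off the cumulative sum amounts to finding the first row whose running total is strictly greater than the word's 0-based index. The sketch below uses only the standard library, with the lengths hard-coded from the example result; in pandas, `structure_frame["cumsum"].searchsorted(3, side="right")` should give the same row.

```python
from bisect import bisect_right
from itertools import accumulate

# Words covered by each top-level node of the example result:
# (NP the weather), was, (NP good), so, (NP i), went, swimming
structure_length = [2, 1, 1, 1, 1, 1, 1]
cumsum = list(accumulate(structure_length))  # same values as the "cumsum" column

def row_for_word(cumsum, word_index):
    # The node containing word `word_index` (0-based) is the first row
    # whose running total is strictly greater than that index
    return bisect_right(cumsum, word_index)

print(cumsum)
print(row_for_word(cumsum, 3))
```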

Is there an easier way to determine a word's node or leaf and to find out what the word "good" refers to?

Best, Alex

Answer 1

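It is easiest to find the word in the list of leaves. You can then convert the leaf index into a tree position, which is a path down the tree. To see what `good` is grouped with, go up one level and examine the subtree it was picked out of.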

First, find the position of `good` in the flat sentence. (If you still have the untagged sentence as a list of tokens, you can skip this step.)

words = [ w for w, t in result.leaves() ]

Now find the linear position of `good` and convert it to a tree path:

>>> position = words.index("good")
>>> treeposition = result.leaf_treeposition(position)
>>> print(treeposition)
(2, 0)
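
To see the pattern, it can help to print the tree position of every word. This sketch rebuilds the question's result by hand (NLTK installed, no downloads assumed). A path of length 1 means the word hangs directly off the root and is in no chunk; a path of length 2 means it sits inside a chunk.

```python
from nltk import Tree

# The chunker output from the question, rebuilt by hand
result = Tree('S', [
    Tree('NP', [('the', 'DT'), ('weather', 'NN')]),
    ('was', 'VBD'),
    Tree('NP', [('good', 'JJ')]),
    ('so', 'RB'),
    Tree('NP', [('i', 'JJ')]),
    ('went', 'VBD'),
    ('swimming', 'VBG'),
])

words = [w for w, t in result.leaves()]

# Map each linear word position to its path from the root
paths = {i: result.leaf_treeposition(i) for i in range(len(words))}
for i, word in enumerate(words):
    print(i, word, paths[i])

# treepositions('leaves') lists the same paths in left-to-right order
leaf_paths = result.treepositions('leaves')
```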

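A "tree position" is a path down the tree, expressed as a tuple. (NLTK trees can be indexed with tuples as well as integers.) To see the sisters of `good`, stop one step before the end of the path: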

>>> result[treeposition[:-1]]
Tree('NP', [('good', 'JJ')])
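
The question's first check ("is there a noun in the same node as good?") can be built directly on these steps. This is a sketch: `nouns_in_same_chunk` is a hypothetical helper, it treats any tag starting with `NN` as a noun, and the second tree is an assumed chunking of the other example sentence, not real chunker output.

```python
from nltk import Tree

def nouns_in_same_chunk(tree, word):
    """Hypothetical helper: nouns (tags starting with 'NN') that share a
    chunk with the first occurrence of `word`. Leaves are (word, POS) tuples."""
    words = [w for w, t in tree.leaves()]
    path = tree.leaf_treeposition(words.index(word))
    if len(path) < 2:
        return []  # the word is a direct child of the root, not in any chunk
    chunk = tree[path[:-1]]  # one step up the path: the enclosing subtree
    return [w for w, t in chunk.leaves() if t.startswith('NN') and w != word]

# First sentence: "good" is alone in its NP
result = Tree('S', [
    Tree('NP', [('the', 'DT'), ('weather', 'NN')]),
    ('was', 'VBD'),
    Tree('NP', [('good', 'JJ')]),
    ('so', 'RB'),
])

# Assumed chunking of the second sentence (illustrative only)
food = Tree('S', [
    ('because', 'IN'), ('of', 'IN'),
    Tree('NP', [('the', 'DT'), ('good', 'JJ'), ('food', 'NN')]),
])

print(nouns_in_same_chunk(result, 'good'))
print(nouns_in_same_chunk(food, 'good'))
```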

There you are: a subtree with just one leaf, the pair `(good, JJ)`.
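
When the chunk contains no noun, as here, the question's fallback is "the noun before or after it". The sketch below shows one simple way to look that up on the flat leaf list. It is a crude heuristic of nearest neighbours by position (not real attachment or coreference resolution), `nearest_nouns` is a hypothetical name, and it again assumes tags starting with `NN` mark nouns.

```python
from nltk import Tree

def nearest_nouns(tree, word):
    """Hypothetical heuristic: return (noun_before, noun_after), the closest
    NN* leaves on each side of the first occurrence of `word` (None if absent)."""
    leaves = tree.leaves()  # flat list of (word, POS) tuples
    i = [w for w, t in leaves].index(word)
    before = next((w for w, t in reversed(leaves[:i]) if t.startswith('NN')), None)
    after = next((w for w, t in leaves[i + 1:] if t.startswith('NN')), None)
    return before, after

# The chunker output from the question, rebuilt by hand
result = Tree('S', [
    Tree('NP', [('the', 'DT'), ('weather', 'NN')]),
    ('was', 'VBD'),
    Tree('NP', [('good', 'JJ')]),
    ('so', 'RB'),
    Tree('NP', [('i', 'JJ')]),
    ('went', 'VBD'),
    ('swimming', 'VBG'),
])

print(nearest_nouns(result, 'good'))
```

Because `i` was tagged `JJ` and `swimming` `VBG`, only `weather` qualifies, which matches the intended reading of the first sentence.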


Answer 1: score 5, accepted (2016-06-06 22:12:13)