How to convert a spaCy doc into a nested list of tokens

Question

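I am using spaCy together with stanfordnlp for dependency parsing, and I end up with a spaCy Doc. How can I turn that Doc into a nested list in which each sublist contains the child tokens of a head?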

Answer 1

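Below is a general solution to the question as asked, although including your input, expected output, and sample code would help make sure this answer is relevant. Explanations are given in the comments.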

import spacy

# Load relevant language/pipeline: here, the built-in small English web-based
# model.
nlp = spacy.load("en_core_web_sm")

# Run text through pipeline to create annotated doc.
sample_text = "Colorless green ideas sleep furiously."
doc = nlp(sample_text)

# Iterate through each token (t) in the doc object, and create a nested list
# of the children of each token. Keep in mind that like many spaCy attributes,
# token.children returns a generator. To access all of its elements at once,
# you will have to convert this generator into an object of type list.
child_list = [list(t.children) for t in doc]

# Now as an exercise, print out each token and check to see if you get the
# children you expected. Normally you would want to iterate on the objects 
# themselves -- we only use range() here for purposes of illustration.
for i in range(len(doc)):
    print("  token {}: {}".format(i + 1, doc[i]))
    print("    children: {}\n".format(child_list[i]))

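As the question asks, the output is a list of lists of child tokens. Note that although your terminal displays each token as if it were text, these tokens are not just text; they are spaCy Token objects, each carrying linguistic information from the annotations in the doc. The output looks like this.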

$ python example.py
  token 1: Colorless
    children: []
  token 2: green
    children: []
  token 3: ideas
    children: [Colorless, green]
  token 4: sleep
    children: [ideas, furiously, .]
  token 5: furiously
    children: []
  token 6: .
    children: []

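This is exactly what we would expect: "ideas" has the children "Colorless" and "green", "sleep" has the children "ideas", "furiously", and ".", and the remaining tokens have no children.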
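To see where these child lists come from, it can help to look at the head relation directly. The following is a minimal sketch that does not use spaCy at all: it assumes you already have each token's head index (what `token.head.i` would give you, with the root pointing at itself). The `heads` list here is written by hand to match the output above rather than produced by a parser, and `children_by_head` is a hypothetical helper name.

```python
def children_by_head(words, heads):
    """Group each word under its head; heads[i] is the index of word i's head."""
    children = [[] for _ in words]
    for i, head in enumerate(heads):
        if head != i:  # the root is its own head, so it is nobody's child
            children[head].append(words[i])
    return children


words = ["Colorless", "green", "ideas", "sleep", "furiously", "."]
# Hand-written head indices (an assumption, chosen to match the output above).
heads = [2, 2, 3, 3, 3, 3]

for word, kids in zip(words, children_by_head(words, heads)):
    print(word, kids)
```

Because the words are visited in document order, each child list comes out in sentence order.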

Answer 2

Here is an example:

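import spacy

# Assumption: any pipeline with a dependency parser will do; the small
# English model from Answer 1 is used here.
nlp = spacy.load("en_core_web_sm")


class Sent2Struct(object):

    def root(self, doc):
        # Return the token whose dependency label is ROOT.
        for word in doc:
            if word.dep_ == 'ROOT':
                return word

    def lol(self, root):
        # "List of lists": a leaf becomes its text; any other token becomes
        # [its text, nested child 1, nested child 2, ...].
        children = list(root.children)
        if len(children) == 0:
            return root.text
        return [root.text] + [self.lol(child) for child in children]


ss = Sent2Struct()

   In [100]: print( ss.lol(ss.root(nlp('the box is on the table'))) )
   ['is', ['box', 'the'], ['on', ['table', 'the']]]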

i.e.

   is(box(the), on(table(the)))
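The recursion in `lol` only relies on two attributes of a token, `.text` and `.children`, so it can also be written as a plain function and tried out without loading a model. The sketch below is one way to do that, not the answerer's code: `Node` is a hypothetical stand-in for a spaCy Token, `nest` is a hypothetical name, and the tree is built by hand to mirror the parse shown above.

```python
class Node:
    """Hypothetical stand-in for a spaCy Token: only .text and .children."""

    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)


def nest(token):
    """Return token.text for a leaf, else [text, nested child, ...]."""
    children = list(token.children)  # spaCy's Token.children is a generator
    if not children:
        return token.text
    return [token.text] + [nest(child) for child in children]


# Hand-built tree for "the box is on the table" (not parser output).
tree = Node("is", [
    Node("box", [Node("the")]),
    Node("on", [Node("table", [Node("the")])]),
])
print(nest(tree))
```

With a real Doc, the same function should work on each sentence root, for example `nest(sent.root)` for every `sent` in `doc.sents`. Note that a leaf comes back as a bare string rather than a one-element list, which follows the behaviour of `lol`; wrap it in a list if you need a uniform shape.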
