簡體   English   中英

將NLTK“干凈”樹轉換為NLTK Chunker結構

[英]Convert NLTK “clean” tree to NLTK Chunker structure

我是python的新手,並且正在努力解決數據類型概念及其轉換問題。

我有NLTK樹格式的句子(從斯坦福解析器獲得並轉換為NLTK樹)。 我需要應用為NLTK Chunker編寫的函數。 但是,NLTK樹格式與NLTK Chunker格式不同。 兩種格式都是NLTK樹,但元素結構似乎不同(見下文)。

你能幫我把NLTK樹轉換成NLTK Chunker輸出格式嗎?

提前致謝!

這是一個NLTK Chunker輸出:

(S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS old/JJ)
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

現在按元素和每個元素類型打印:

class 'nltk.tree.Tree' (NP Pierre/NNP Vinken/NNP)
type 'tuple' (',', ',')
class 'nltk.tree.Tree' (NP 61/CD years/NNS old/JJ)
type 'tuple' (',', ',')
type 'tuple' ('will', 'MD')
type 'tuple' ('join', 'VB')
class 'nltk.tree.Tree' (NP the/DT board/NN)
type 'tuple' ('as', 'IN')
class 'nltk.tree.Tree' (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
type 'tuple' ('.', '.')

這是一個NLTK“純”樹輸出(與NLTK doc完全相同):

(S
  (NP
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director) (NNP Nov.) (CD 29)))
      ))
  (. .))

現在按元素和每個元素類型打印:

class 'nltk.tree.Tree' (NP
  (NP (NNP Pierre) (NNP Vinken))
  (, ,)
  (ADJP (NP (CD 61) (NNS years)) (JJ old))
  (, ,))
class 'nltk.tree.Tree' (NP (NNP Pierre) (NNP Vinken))
class 'nltk.tree.Tree' (NNP Pierre)
type 'str' Pierre
class 'nltk.tree.Tree' (NNP Vinken)
type 'str' Vinken
class 'nltk.tree.Tree' (, ,)
type 'str' ,
class 'nltk.tree.Tree' (ADJP (NP (CD 61) (NNS years)) (JJ old))
class 'nltk.tree.Tree' (NP (CD 61) (NNS years))
class 'nltk.tree.Tree' (CD 61)
type 'str' 61
class 'nltk.tree.Tree' (NNS years)
type 'str' years
class 'nltk.tree.Tree' (JJ old)
type 'str' old
class 'nltk.tree.Tree' (, ,)
type 'str' ,
class 'nltk.tree.Tree' (VP
  (MD will)
  (VP
    (VB join)
    (NP (DT the) (NN board))
    (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
    (NP (NNP Nov.) (CD 29))))
class 'nltk.tree.Tree' (MD will)
type 'str' will
class 'nltk.tree.Tree' (VP
  (VB join)
  (NP (DT the) (NN board))
  (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
  (NP (NNP Nov.) (CD 29)))
class 'nltk.tree.Tree' (VB join)
type 'str' join
class 'nltk.tree.Tree' (NP (DT the) (NN board))
class 'nltk.tree.Tree' (DT the)
type 'str' the
class 'nltk.tree.Tree' (NN board)
type 'str' board
class 'nltk.tree.Tree' (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
class 'nltk.tree.Tree' (IN as)
type 'str' as
class 'nltk.tree.Tree' (NP (DT a) (JJ nonexecutive) (NN director))
class 'nltk.tree.Tree' (DT a)
type 'str' a
class 'nltk.tree.Tree' (JJ nonexecutive)
type 'str' nonexecutive
class 'nltk.tree.Tree' (NN director)
type 'str' director
class 'nltk.tree.Tree' (NP (NNP Nov.) (CD 29))
class 'nltk.tree.Tree' (NNP Nov.)
type 'str' Nov.
class 'nltk.tree.Tree' (CD 29)
type 'str' 29
class 'nltk.tree.Tree' (. .)
type 'str' .

部分答案(即沒有代碼):

NLTK使用Tree類表示分塊數據, Tree類實際上是為任意語法樹設計的。 一個分塊的句子是一個只有一個分組級別的樹,所以要從完整的解析到一個分塊結構,你需要丟棄除了一種非遞歸組之外的所有組。 哪個組? 這取決於您的應用程序,因為存在不同類型的“塊”(例如,命名實體)。

你的例子顯示了NP塊,所以你可以走樹並省略除NP頂​​層之外的所有結構(如果你想將復雜的NP塊分解成小塊,則省略最低層)。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM