
NLTK: from string to Tree with "slash-token" word/POS leaves?

The pretty-printer of the nltk.Tree class renders (word, POS) tuple leaves as word/POS:

print spacy2tree(nlp(u'Williams is a defensive coach') )
(S
  (SUBJ Williams/NNP)
  (PRED is/VBZ test/VBN)
  a/DT
  defensive/JJ
  coach/NN)

The same tree as a Tree object:

 spacy2tree(nlp(u'Williams is a defensive coach') )
 Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]), 
    Tree('PRED', [(u'is', u'VBZ'), ('test', 'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])

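But Tree.fromstring doesn't read that string back correctly; the leaves come back as plain 'word/POS' strings instead of (word, POS) tuples: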

tfs =  spacy2tree(nlp(u'Williams is a defensive coach') ).pformat()

Tree.fromstring(tfs)
Tree('S', [Tree('SUBJ', ['Williams/NNP']), 
   Tree('PRED', ['is/VBZ', 'test/VBN']), 'a/DT', 'defensive/JJ', 'coach/NN'])

Example:

 correct                                        incorrect
 ('SUBJ', [(u'Williams', u'NNP')])              =vs=>  ('SUBJ', ['Williams/NNP'])
 ('PRED', [(u'is', u'VBZ'), ('test', 'VBN')])   =vs=>  ('PRED', ['is/VBZ', 'test/VBN'])

Is there a utility that reads a Tree back from a string correctly?

It seems I figured it out: pass a read_leaf function that splits each leaf on '/':

 Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/')))
 Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]), 
    Tree('PRED', [(u'is', u'VBZ'), (u'test', u'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])
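
One caveat with splitting on every '/': a token that itself contains a slash (for example 1/2/CD) would come back as a three-element tuple, which tree2conlltags would not be able to unpack as (word, tag). Below is a minimal sketch of a more defensive leaf reader that splits only on the last slash. The names split_leaf and parse_tagged_tree are made up for this illustration and are not part of NLTK; NLTK's own nltk.tag.str2tuple does a similar last-separator split, but it also upper-cases the tag.

```python
def split_leaf(leaf, sep="/"):
    """Split 'word/POS' into (word, POS) on the LAST separator only.

    Hypothetical helper, not an NLTK function. A leaf without any
    separator is returned as (leaf, None).
    """
    word, found, tag = leaf.rpartition(sep)
    if not found:
        # No separator at all: keep the token and leave the tag unset.
        return (leaf, None)
    return (word, tag)


def parse_tagged_tree(text):
    """Read a bracketed tree whose leaves are 'word/POS' tokens.

    NLTK is imported here rather than at module level, so split_leaf
    can be used on its own without NLTK installed.
    """
    from nltk import Tree
    return Tree.fromstring(text, read_leaf=split_leaf)
```

With this in place, parse_tagged_tree(tfs) should be usable wherever the lambda version above is used, assuming every leaf in the string carries a tag.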

With the leaves restored as tuples, tree2conlltags now works correctly too:

 tree2conlltags(Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/'))))
 [(u'Williams', u'NNP', u'B-SUBJ'),
  (u'is', u'VBZ', u'B-PRED'),
  (u'test', u'VBN', u'I-PRED'),
  (u'a', u'DT', u'O'),
  (u'defensive', u'JJ', u'O'),
  (u'coach', u'NN', u'O')]


 