NLTK: From string to Tree with “slash-tokens” word/POS?

Question

Pretty-printing an nltk.Tree whose leaves are (word, POS) tuples produces the following format:

print spacy2tree(nlp(u'Williams is a defensive coach') )
(S
  (SUBJ Williams/NNP)
  (PRED is/VBZ test/VBN)
  a/DT
  defensive/JJ
  coach/NN)

As a Tree object:

 spacy2tree(nlp(u'Williams is a defensive coach') )
 Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]), 
    Tree('PRED', [(u'is', u'VBZ'), ('test', 'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])

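But Tree.fromstring doesn't read that string back correctly; the leaves come back as plain 'word/POS' strings instead of (word, POS) tuples: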

tfs =  spacy2tree(nlp(u'Williams is a defensive coach') ).pformat()

Tree.fromstring(tfs)
Tree('S', [Tree('SUBJ', ['Williams/NNP']), 
   Tree('PRED', ['is/VBZ', 'test/VBN']), 'a/DT', 'defensive/JJ', 'coach/NN'])

Example:

correct                                            incorrect
('SUBJ', [(u'Williams', u'NNP')])                  ('SUBJ', ['Williams/NNP'])
('PRED', [(u'is', u'VBZ'), ('test', 'VBN')])       ('PRED', ['is/VBZ', 'test/VBN'])

Is there a utility that reads a Tree back from such a string correctly?

Answer 1

It seems I figured it out: pass Tree.fromstring a read_leaf function that splits each leaf on the slash.

Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/')))
Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]),
    Tree('PRED', [(u'is', u'VBZ'), (u'test', u'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])

With the leaves parsed as tuples, this now works correctly too:

from nltk.chunk import tree2conlltags
tree2conlltags(Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/'))))
 [(u'Williams', u'NNP', u'B-SUBJ'),
  (u'is', u'VBZ', u'B-PRED'),
  (u'test', u'VBN', u'I-PRED'),
  (u'a', u'DT', u'O'),
  (u'defensive', u'JJ', u'O'),
  (u'coach', u'NN', u'O')]
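
One caveat with the lambda above: s.split('/') splits on every slash, so a token such as 1/2/CD would become a three-element tuple. The sketch below is one possible way around that, splitting on the last slash only. The names read_tagged_leaf and parse_tagged_tree are made up for this illustration and are not part of NLTK; only Tree.fromstring and its read_leaf parameter come from the library. NLTK also ships nltk.tag.str2tuple, which splits on the last separator as well (and upper-cases the tag), so it may serve as a ready-made read_leaf.

```python
def read_tagged_leaf(token, sep="/"):
    """Turn 'word/TAG' into ('word', 'TAG'), splitting on the LAST separator only.

    Hypothetical helper for illustration; not an NLTK function.
    """
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator in the token: keep it as the word, with no tag.
        return (token, None)
    return (word, tag)


def parse_tagged_tree(text):
    """Parse a bracketed tree string whose leaves are 'word/TAG' tokens."""
    # Imported lazily so read_tagged_leaf stays usable without NLTK installed.
    from nltk import Tree

    return Tree.fromstring(text, read_leaf=read_tagged_leaf)


# The leaf reader on its own:
print(read_tagged_leaf("Williams/NNP"))  # ('Williams', 'NNP')
print(read_tagged_leaf("1/2/CD"))        # ('1/2', 'CD')
```

Calling parse_tagged_tree(tfs) should then give the same tuple leaves as the lambda version for the example sentence, while keeping words that themselves contain a slash in one piece.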
