I successfully ran the Stanford English tagger, like below. Input: "This picture is clear". Output:
[[(u'This', u'DT'), (u'picture', u'NN'), (u'is', u'VBZ'), (u'clear', u'JJ')]]
But I want to read a whole file, and I would like the output to look like this:
This_DT picture_NN is_VBZ clear_JJ
That is, like a sentence, not a bracketed list. But I don't know how to change it in Python.
My original code:
import nltk
from nltk.tag.stanford import POSTagger
st = POSTagger('/Users/apple/Desktop/package/stanford-postagger/models/english-left3words-distsim.tagger', '/Users/apple/Desktop/package/stanford-postagger/stanford-postagger.jar')
print st.tag('This picture is clear'.split())
Fairly straightforward list/tuple/string manipulation:
inp = [[(u'This', u'DT'), (u'picture', u'NN'), (u'is', u'VBZ'), (u'clear', u'JJ')]]
out = []
for t in inp[0]:
    # join each (word, tag) tuple into "word_tag"
    out.append("_".join(t))
# join the "word_tag" items with spaces
outs = " ".join(out)
print(outs)
The data you have is a list of lists of tuples. We are only interested in the first element, hence the inp[0]. We iterate through this list (I could have used a list comprehension), joining the two elements of each tuple (t) with an underscore and collecting the results in another list (out). It is then a simple task to join the elements of out together with spaces to produce a string.
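Since the question also asks about reading a whole file, here is a minimal sketch of how that could look. The helper names format_tagged and tag_lines and the file name input.txt are made up for illustration; tag_fn is assumed to be any callable that takes a list of tokens and returns (word, tag) pairs, such as the st.tag method from the question.

```python
def format_tagged(tagged_sentence, sep="_"):
    """Turn [(word, tag), ...] into 'word_tag word_tag ...'."""
    return " ".join(sep.join(pair) for pair in tagged_sentence)


def tag_lines(lines, tag_fn):
    """Tag each non-empty line and return one formatted string per line.

    tag_fn is a hypothetical stand-in for a tagger method such as st.tag.
    """
    results = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        tagged = tag_fn(tokens)
        # The output shown in the question is wrapped in an extra list;
        # unwrap it when that is the case.
        if tagged and isinstance(tagged[0], list):
            tagged = tagged[0]
        results.append(format_tagged(tagged))
    return results


# Possible usage (assumes `st` from the question and a file named input.txt):
# with open("input.txt") as f:
#     for sentence in tag_lines(f, st.tag):
#         print(sentence)
```

This treats every line of the file as one sentence and splits on whitespace, as the original code does; a real sentence splitter or tokenizer may be a better fit for free-running text.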