I was trying to use NLTK in Python to do part-of-speech tagging on a text file. This is the code I used:
import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
saveFile.write(str(paosted))
saveFile.close()
The code worked, but it saved all the text on one single line, as shown in the attached picture. I know I should be using "\n" somehow, but I am a novice in Python and have no idea how to do it, so any help would be appreciated :)
-------- UPDATE -----------
Well, people have been really helpful and offered some solutions, e.g. this code:
import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
saveFile.write(str(paosted).replace('),' , '),\n'))
saveFile.close()
But I still need the output in the form of a paragraph, because I am going to use it later in concordance software. Please have a look at this screenshot: https://i.stack.imgur.com/tU1NW.png
paosted is a list of tuples; you can iterate over it and write each tuple on its own line.
Example:
paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
for line in paosted:
    saveFile.write(str(line) + "\n")
saveFile.close()
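If the tuple syntax (parentheses and quotes) is not wanted in the output file, a variation on the loop above is to write the word and the tag separated by a tab. This is only a sketch: the function names and the sample list are made up for illustration, and the sample stands in for whatever nltk.pos_tag(text) returns.

```python
def tagged_to_rows(tagged):
    """Return one 'word<TAB>tag' row per (word, tag) tuple."""
    return "".join(f"{word}\t{tag}\n" for word, tag in tagged)


def save_rows(tagged, path):
    # 'with' closes the file automatically, even if writing fails
    with open(path, "w", encoding="utf-8") as out:
        out.write(tagged_to_rows(tagged))


# Hypothetical sample standing in for the result of nltk.pos_tag(text)
sample = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")]
print(tagged_to_rows(sample), end="")
```

With the real data, the call would be something like save_rows(paosted, 'ol.txt').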
Updating my answer according to your update:
temp = []
for i in paosted:
    temp.append("_".join(i))  # ("word", "TAG") -> "word_TAG"
" ".join(temp)  # one space-separated string, ready to be written to the file
Thank you all! I followed some of your instructions and the best result I got was with this code:
import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('output.txt', 'w')
saveFile.write(str(paosted).replace("('.', '.')" , "\n"))
saveFile.close()
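One caveat with the replace approach: it works on the printed form of the list, so the brackets, quotes and commas stay in the file, and only a literal "." token starts a new line. A possible alternative is to build the lines from the tuples directly and break after every token whose tag marks the end of a sentence. This is a sketch only: it assumes sentence-final punctuation carries the Penn Treebank tag ".", and the function name and sample data are made up for illustration.

```python
def tagged_to_lines(tagged, end_tag="."):
    """Format pairs as word_TAG and start a new line after each end_tag token."""
    lines, current = [], []
    for word, tag in tagged:
        current.append(f"{word}_{tag}")
        if tag == end_tag:  # assumed tag for sentence-final punctuation
            lines.append(" ".join(current))
            current = []
    if current:  # keep trailing tokens that have no closing punctuation
        lines.append(" ".join(current))
    return "\n".join(lines)


# Hypothetical sample standing in for the result of nltk.pos_tag(text)
sample = [("I", "PRP"), ("run", "VBP"), (".", "."),
          ("You", "PRP"), ("walk", "VBP"), ("?", ".")]
print(tagged_to_lines(sample))
```

The returned string can then be written with saveFile.write(...) exactly as in the code above.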