
All text is saved in one line

I am trying to use NLTK in Python to part-of-speech tag a text file. This is the code I used:

import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
saveFile.write(str(paosted))
saveFile.close()

The code works, but it saves all the tagged text on one single line, as shown in the attached picture. I know I should be using the newline character "\n", but I am a novice in Python and have no idea how to do it, so any help would be appreciated :)

                      -------- UPDATE -----------

Well, people have been really helpful and offered some solutions, e.g. this code:

import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
saveFile.write(str(paosted).replace('),', '),\n'))
saveFile.close()

But I still need the output in the form of a paragraph, because I am going to use it later in concordance software. Please have a look at this screenshot: https://i.stack.imgur.com/tU1NW.png

paosted is a list of tuples, so you can iterate over it and write each tuple on its own line.

Ex:

paosted = nltk.pos_tag(text)
saveFile = open('ol.txt', 'w')
for line in paosted:
    saveFile.write(str(line)+ "\n")
saveFile.close()

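Updating my answer according to the update in the question: join each (word, tag) tuple with an underscore, then join the results with spaces to get a single paragraph.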

temp = []
for i in paosted:
    temp.append("_".join(i))

" ".join(temp)
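
The join above builds the paragraph string but does not save it anywhere. Here is a minimal sketch of the full step, writing the word_TAG paragraph to a file. The helper name format_tagged, the sample list, and the output path are made up for illustration; in the real script the list would be the paosted returned by nltk.pos_tag(text).

```python
import os
import tempfile


def format_tagged(tagged):
    """Join (word, tag) pairs as word_TAG tokens separated by spaces."""
    return " ".join("_".join(pair) for pair in tagged)


# Hypothetical sample standing in for nltk.pos_tag(text)
sample = [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), (".", ".")]
paragraph = format_tagged(sample)

# Illustrative output path; the with-block closes the file automatically
out_path = os.path.join(tempfile.gettempdir(), "ol_example.txt")
with open(out_path, "w", encoding="utf-8") as save_file:
    save_file.write(paragraph + "\n")
```

Using a with-block instead of a separate close() call means the file is flushed and closed even if the write raises an error.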

Thank you all! I followed some of your instructions and the best result I got was with this code:

import nltk
from nltk import word_tokenize, pos_tag
f = open('all.txt')
raw = f.read()
text = word_tokenize(raw)
paosted = nltk.pos_tag(text)
saveFile = open('output.txt', 'w')
saveFile.write(str(paosted).replace("('.', '.')" ,  "\n"))
saveFile.close()
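
Note that calling replace on str(paosted) still leaves the list punctuation (brackets, quotes, commas) in the file. A possible alternative, sketched below, works on the tuples directly and puts one sentence per line in word_TAG form. It assumes the tagger marks sentence-final punctuation with the Penn Treebank tag "."; the function name tagged_to_lines and the sample data are hypothetical.

```python
def tagged_to_lines(tagged, end_tag="."):
    """Group (word, tag) pairs into one word_TAG string per sentence.

    A sentence is assumed to end at any token whose tag equals end_tag.
    """
    lines, current = [], []
    for word, tag in tagged:
        current.append(f"{word}_{tag}")
        if tag == end_tag:
            lines.append(" ".join(current))
            current = []
    if current:  # keep trailing tokens that lack a final period
        lines.append(" ".join(current))
    return lines


# Hypothetical sample standing in for nltk.pos_tag(text)
sample = [("I", "PRP"), ("ran", "VBD"), (".", "."),
          ("She", "PRP"), ("slept", "VBD"), (".", "."),
          ("Done", "VBN")]
result = "\n".join(tagged_to_lines(sample))
```

The result string can then be written with saveFile.write(result) in place of the replace call.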
