I want to stem words, so I imported PorterStemmer from nltk,
but an error occurred at run time.
The error is:
TypeError: coercing to Unicode: need string or buffer, file found
My Python code is
import nltk;
from nltk.stem import PorterStemmer
stemmer=PorterStemmer()
file = open('C:/Python26/test.txt','r')
f=open("root.txt",'w')
with open(file,'r',-1) as rf:
    lines = rf.readlines()
    for word in lines:
        root = stemmer.stem(word)
        f.write(root+"\n")
f.close()
Yes, I tried it and got an error that I couldn't understand. The output was:
1.6.2
Traceback (most recent call last):
  File "C:\Python26\check.py", line 10, in <module>
    with open(file,'r',-1) as rf:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 6: ordinal not in range(128)
My code after your recommended change is:
import nltk;
import numpy;
import numpy as np
from StringIO import StringIO
print numpy.__version__
from nltk.stem import PorterStemmer
stemmer=PorterStemmer()
file = np.genfromtxt('C:/Python26/test.txt', delimiter=" ")
f=open("root.txt",'w')
with open(file,'r',-1) as rf:
    lines = rf.readlines()
    for word in lines:
        root = stemmer.stem(word)
        f.write(root+"\n")
f.close()
My dummy file looks like this:
walking
talked
oranges
books
You have already opened the file, and you are then passing that file object, rather than a filename, to with open... Remove the file = open('C:/... line.
PS You will be iterating over lines, not words.
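To illustrate the point about lines versus words, here is a minimal sketch of one way to get individual words out of the lines before stemming. The helper name iter_words and the sample lines are made up for this example and are not part of the original code; each yielded word could then be passed to stemmer.stem.

```python
def iter_words(lines):
    """Yield each whitespace-separated word from an iterable of text lines."""
    for line in lines:
        # str.split() with no arguments drops the trailing newline and
        # skips blank lines, so only real words are yielded.
        for word in line.split():
            yield word


# Hypothetical sample input, shaped like what readlines() would return.
sample_lines = ["walking\n", "talked oranges\n", "\n", "books\n"]
words = list(iter_words(sample_lines))
print(words)
```

With a file that has one word per line, as in the dummy file, this mainly removes the trailing newline from each word; it starts to matter more once a line holds several words.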
You are opening the file in line 4 and then using the resulting file object as the filename for another open() in line 6. Just do:
import nltk;
from nltk.stem import PorterStemmer
stemmer=PorterStemmer()
with open("root.txt",'w') as f:
    with open('C:/Python26/test.txt','r',-1) as rf:
        lines = rf.readlines()
        for word in lines:
            root = stemmer.stem(word)
            f.write(root+"\n")
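To see why the original version failed, the following small sketch reproduces the mistake in isolation: it opens a file and then passes the resulting file object to open() again. The temporary file is only an assumption made so the example runs anywhere; the exact wording of the TypeError message differs between Python versions.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any real path.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

handle = open(path, "r")  # this already returns an open file object
try:
    open(handle, "r")  # wrong: open() expects a path, not a file object
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
finally:
    handle.close()
    os.remove(path)
```

Passing the path string straight to a single open() call, as in the code above, avoids the error.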
It seems that the problem is with the parameters passed to a function, and I'm guessing it's in the line root = stemmer.stem(word).
Try using the function genfromtxt instead of open():
>>> import numpy as np
>>> from StringIO import StringIO
>>> np.genfromtxt('C:/Python26/test.txt', delimiter=",") #Whatever delimiter your file has.
That should fix the problem.