First of all, thanks for the help. I've been trying to fix this problem for days.
File myStopWords.txt:
è
ad
più
a
b
c
17
My code:
stopWord = set(open("<...>/myStopwords.txt").read().split("\n"))
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )
Results:
{'horse', 'ad', 'più', 'è'}
Why aren't "ad", "è", and "più" subtracted from the set? The result should be {'horse'}.
Thank you. As suggested in previous comments, this is the solution:
1) Convert the text file to UTF-8.
2)
fname = '<...>/myStopwords.txt'
with open(fname, encoding='utf-8') as f:
    content = f.readlines()
stopWord = [x.strip() for x in content]
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )
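Both steps can also be done from Python. The following is only a minimal sketch: the helper names `convert_to_utf8` and `load_stop_words` are made up for illustration, and `cp1252` is a guess at the original encoding (the post does not say what the file was saved in). Reading with `utf-8-sig` is an extra precaution, since it also drops a byte-order mark if the editor wrote one at the start of the file.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file to UTF-8 (hypothetical helper).

    src_encoding must be the encoding the file was actually saved in.
    """
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


def load_stop_words(path):
    """Read one stop word per line into a set (hypothetical helper)."""
    # 'utf-8-sig' decodes UTF-8 and strips a leading BOM if present.
    with open(path, encoding="utf-8-sig") as f:
        return {line.strip() for line in f if line.strip()}


# Demo on a throwaway file; cp1252 as the source encoding is an assumption.
tmp_dir = tempfile.mkdtemp()
legacy_path = os.path.join(tmp_dir, "stopwords_legacy.txt")
utf8_path = os.path.join(tmp_dir, "stopwords_utf8.txt")

with open(legacy_path, "w", encoding="cp1252") as f:
    f.write("è\nad\npiù\na\nb\nc\n17\n")

convert_to_utf8(legacy_path, utf8_path, "cp1252")

stop_words = load_stop_words(utf8_path)
old_words = {"a", "b", "ad", "è", "più", "17", "horse"}
remaining = old_words.difference(stop_words)
print(remaining)  # {'horse'}
```

Building a set instead of a list is optional here, because `difference` accepts any iterable, but it also removes duplicate and blank lines from the stop word list.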