Difference between sets of strings doesn't work
First of all, thanks for the help. I have been trying to fix this problem for days.
File myStopWords.txt:
è
ad
più
a
b
c
17
My code:
stopWord = set(open("<...>/myStopwords.txt").read().split("\n"))
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )
Results:
{'horse', 'ad', 'più', 'è'}
Why aren't "ad", "è", and "più" subtracted from the set?
The result should be {'horse'}.
Thank you.
As suggested in previous comments, this is the solution:
1) Convert the text file to UTF-8.
2) Open the file with an explicit encoding and strip whitespace from each line:
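fname = '<...>/myStopwords.txt'
# Decode the file as UTF-8 explicitly instead of relying on the platform default.
with open(fname, encoding='utf-8') as f:
    content = f.readlines()
# Strip the newline and any surrounding whitespace from every entry.
stopWord = [x.strip() for x in content]
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )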
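For anyone who wants to reproduce the fix without the original file, here is a minimal, self-contained sketch. It assumes the file holds one stop word per line and that stray whitespace may follow a word. The helper name load_stop_words, the temporary file, and the trailing space after "ad" are made up for illustration and are not part of the original post.

```python
import os
import tempfile


def load_stop_words(path, encoding="utf-8"):
    """Return the stop words in `path` as a set, one word per line.

    Hypothetical helper: strips surrounding whitespace and skips blank lines.
    """
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


# Build a throwaway stop-word file so the example does not depend on a real path.
# The trailing space after "ad" is an illustrative assumption.
lines = ["è", "ad ", "più", "a", "b", "c", "17"]
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("\n".join(lines) + "\n")
    tmp_path = tmp.name

try:
    stop_words = load_stop_words(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

old_words = {"a", "b", "ad", "è", "più", "17", "horse"}
remaining = old_words - stop_words  # same as old_words.difference(stop_words)
print(remaining)
```

Two notes on this sketch. Printing repr() of each entry read from the real file is a quick way to spot hidden characters that prevent a match. If the file starts with a byte order mark, the standard 'utf-8-sig' codec may be a better fit than 'utf-8'.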