How to find the same words in two CSV files using Python 3
I'm totally new to Python, but I'm working on a small project. I have a file A and a file B like below:
I want to compare A and B and get the words that appear in both files. I've tried several methods, but I couldn't solve it.
Can anyone help me with it? Thanks!
You could just read each file into a list and compare the two lists as sets.
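list1 = []
list2 = []
# Read each file line by line; strip the trailing newline so that
# the last line of a file still matches the same word elsewhere
with open('file1', 'r') as myfile1:
    for line in myfile1:
        list1.append(line.strip())
with open('file2', 'r') as myfile2:
    for line in myfile2:
        list2.append(line.strip())
# Set intersection keeps only the lines present in both files
compare = set(list1) & set(list2)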
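Since the question mentions CSV files, the same set-intersection idea can be sketched with the standard `csv` module, which splits each row into cells for you. This is only a sketch under assumptions: the function names `words_in_csv` and `common_words` are hypothetical, the files are assumed to be UTF-8, and each cell is assumed to hold one word.

```python
import csv


def words_in_csv(path):
    """Collect every non-empty cell of a CSV file into a set."""
    words = set()
    # newline='' is what the csv module recommends when opening files
    with open(path, newline='', encoding='utf-8') as handle:
        for row in csv.reader(handle):
            for cell in row:
                cell = cell.strip()  # drop padding around the cell text
                if cell:
                    words.add(cell)
    return words


def common_words(path_a, path_b):
    """Return the cells that appear in both CSV files."""
    return words_in_csv(path_a) & words_in_csv(path_b)
```

Calling `common_words('file1', 'file2')` would then return a set like `compare` above, but built from individual cells rather than whole lines.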
Rthomas529 has the right idea, but the approach has a few pitfalls: it misses matches when the text contains punctuation, inconsistent capitalization, or lines with multiple words.
# Prep some empty sets to throw words into
words_1 = set()
words_2 = set()

# Load the first file and collect its cleaned words
with open('f1.txt') as file_1:
    for word in file_1.read().split():
        # Lowercase the word and keep only letters and apostrophes
        cleaned_word = ''.join([
            i for i in word.lower()
            if i.isalpha() or i == "'"
        ])
        if cleaned_word != '':  # Just in case!
            words_1.add(cleaned_word)

# Do the same for the second file
with open('f2.txt') as file_2:
    for word in file_2.read().split():
        cleaned_word = ''.join([
            i for i in word.lower()
            if i.isalpha() or i == "'"
        ])
        if cleaned_word != '':  # Just in case!
            words_2.add(cleaned_word)

# Words that appear in both files
similar_words = words_1 & words_2
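The two loops above are identical, so the cleaning step could be pulled into a small helper. The sketch below is one possible refactoring, not the answerer's code: the names `clean_word`, `word_set`, and `shared_words` are hypothetical. It also adds one assumption for CSV input: commas are replaced with spaces before splitting, because splitting on whitespace alone would merge `apple,banana` into the single token `applebanana`.

```python
def clean_word(word):
    """Lowercase a token and keep only letters and apostrophes."""
    return ''.join(ch for ch in word.lower() if ch.isalpha() or ch == "'")


def word_set(text):
    """Return the set of non-empty cleaned words in a block of text."""
    # Assumption for CSV input: treat commas as word separators
    tokens = text.replace(',', ' ').split()
    return {cleaned for cleaned in map(clean_word, tokens) if cleaned}


def shared_words(text_a, text_b):
    """Return the cleaned words that occur in both texts."""
    return word_set(text_a) & word_set(text_b)
```

For example, `shared_words("Hello, world! It's fine.", "hello WORLD it's")` gives the set containing `hello`, `world`, and `it's`, since punctuation and capitalization no longer prevent a match. To use it on files, pass in the result of each file's `read()`.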