How to find the same words in two CSV files using Python 3

I'm totally new to Python, but I'm working on a small project. I have an A file and a B file like below:

[Image: contents of file A and file B]

I want to compare A and B and get the words that appear in both files. I've tried several methods, but I still couldn't solve it.

Can anyone help me with it? Thanks!
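Whatever the file format, the core operation is a set intersection. A minimal sketch, using two hypothetical hard-coded word lists in place of the real contents of A and B:

```python
# Two hypothetical word collections standing in for the contents of A and B
words_a = ["apple", "banana", "cherry", "apple"]
words_b = ["cherry", "date", "apple"]

# Converting to sets drops duplicates; & keeps only elements present in both
common = set(words_a) & set(words_b)

# Sets are unordered, so sort for stable, readable output
print(sorted(common))  # prints ['apple', 'cherry']
```

The remaining work is turning each file into such a collection of words.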

You could just create two lists and compare them.

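list1 = []
list2 = []

# Read each file line by line. strip() removes the trailing newline,
# so "apple\n" and a final "apple" without a newline compare equal.
with open('file1', 'r') as myfile1:
    for line in myfile1:
        list1.append(line.strip())

with open('file2', 'r') as myfile2:
    for line in myfile2:
        list2.append(line.strip())

# Set intersection keeps only the lines present in both files
compare = set(list1) & set(list2)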
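A small sketch of the same line-based idea wrapped in a function (common_lines is a hypothetical name), using io.StringIO objects in place of real files, shows what it actually matches: only lines that are identical after stripping surrounding whitespace.

```python
import io

def common_lines(file1, file2):
    """Return the set of lines (surrounding whitespace stripped) found in both file objects."""
    lines1 = {line.strip() for line in file1}
    lines2 = {line.strip() for line in file2}
    return lines1 & lines2

# io.StringIO stands in for files opened with open(...)
a = io.StringIO("apple\nBanana\ncherry pie\n")
b = io.StringIO("banana\ncherry\napple")

result = common_lines(a, b)
print(result)  # prints {'apple'}
```

Here "Banana"/"banana" and "cherry pie"/"cherry" do not match, because the comparison is on whole, case-sensitive lines.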

Rthomas529 has the right idea, but the approach has a few pitfalls: it misses matches when there is punctuation, inconsistent capitalization, or more than one word on a line.

# Reduce a token to lowercase letters and apostrophes,
# e.g. "Hello," -> "hello" and "Don't" -> "don't"
def clean(word):
    return ''.join(c for c in word.lower() if c.isalpha() or c == "'")

# Collect the cleaned words of one file into a set.
# "with" ensures the file is closed after reading.
def load_words(path):
    words = set()
    with open(path) as f:
        for word in f.read().split():
            cleaned_word = clean(word)
            if cleaned_word != '':  # skip tokens that were only punctuation
                words.add(cleaned_word)
    return words

words_1 = load_words('f1.txt')
words_2 = load_words('f2.txt')

similar_words = words_1 & words_2
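Since the question is about CSV files, splitting on whitespace alone can glue comma-separated values together (for example, the token a,b becomes ab after cleaning). A hedged sketch that reads cells with Python's csv module and extracts words with a regular expression; the function name csv_words, the sample data, and the ASCII-only definition of a word (letters plus apostrophes) are assumptions for illustration:

```python
import csv
import io
import re

# Assumption: a word is a run of ASCII letters and apostrophes; anything
# else (commas, periods, spaces, digits) acts as a separator.
WORD_RE = re.compile(r"[a-z']+")

def csv_words(f):
    """Collect lowercase words from every cell of a CSV file object."""
    words = set()
    for row in csv.reader(f):
        for cell in row:
            words.update(WORD_RE.findall(cell.lower()))
    return words

# In-memory stand-ins for two CSV files; with real files you would pass
# open('A.csv', newline='') and open('B.csv', newline='') instead
file_a = io.StringIO('1,"Hello, world!"\n2,Don\'t panic\n')
file_b = io.StringIO('1,WORLD peace\n2,"don\'t, stop"\n')

common = csv_words(file_a) & csv_words(file_b)
print(sorted(common))  # prints ["don't", 'world']
```

Using csv.reader means quoted cells containing commas are handled correctly, and the regex treats the commas inside a cell as word separators rather than deleting them.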
