How to find the same words in two CSV files using Python 3

Question

I'm totally new to Python, but I'm working on a small project. I have a file A and a file B like the ones below:

I want to compare A and B and get the words that appear in both files. I've tried several methods, but I couldn't solve it.

Can anyone help me with it? Thanks!

Answer 1

You could just create two lists and compare them.

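list1 = []
list2 = []

# Open read-only, and strip the trailing newline so that the last line
# of a file (which may lack one) still matches the same word elsewhere
with open('file1', 'r') as myfile1:
    for line in myfile1:
        list1.append(line.strip())

with open('file2', 'r') as myfile2:
    for line in myfile2:
        list2.append(line.strip())

# Entries that appear in both files
compare = set(list1) & set(list2)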
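
Since the title mentions CSV files, the same set intersection can be applied to individual cells rather than whole lines. This is only a minimal sketch using the standard csv module. It assumes each cell holds a single word, and the helper names cells_in_csv and common_cells are made up for illustration, not part of the original answer.

```python
import csv


def cells_in_csv(path):
    """Return the set of non-empty, whitespace-stripped cell values in a CSV file."""
    cells = set()
    # newline='' is the mode the csv module documentation recommends for reading
    with open(path, newline='') as f:
        for row in csv.reader(f):
            for cell in row:
                cell = cell.strip()
                if cell:
                    cells.add(cell)
    return cells


def common_cells(path_a, path_b):
    """Return the cell values that appear in both CSV files."""
    return cells_in_csv(path_a) & cells_in_csv(path_b)


# Hypothetical usage, assuming 'file1' and 'file2' exist:
# compare = common_cells('file1', 'file2')
```

The comparison is still exact, so "Apple" and "apple" count as different entries.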

Answer 2

Rthomas529 has the right idea, but that approach runs into a few pitfalls. It misses cases where there is punctuation, inconsistent capitalization, or lines with multiple words.

# Prep some empty sets to throw words into
words_1 = set()
words_2 = set()

# Load the first file, lowercase each word, and keep only letters and apostrophes
with open('f1.txt') as file_1:
    for word in file_1.read().split():
        cleaned_word = ''.join(
            i for i in word.lower()
            if i.isalpha() or i == "'"
        )
        if cleaned_word != '':  # Just in case!
            words_1.add(cleaned_word)

# Same cleaning for the second file
with open('f2.txt') as file_2:
    for word in file_2.read().split():
        cleaned_word = ''.join(
            i for i in word.lower()
            if i.isalpha() or i == "'"
        )
        if cleaned_word != '':  # Just in case!
            words_2.add(cleaned_word)

# Words that appear in both files
similar_words = words_1 & words_2
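
The cleaning step above is written out once per file. As a rough sketch, it could be pulled into a helper so the behavior is easier to check on small strings. The name clean_words and the sample strings are hypothetical, chosen only to show the three pitfalls (punctuation, capitalization, several words per line).

```python
def clean_words(text):
    """Lowercase the text, split on whitespace, and keep only letters and apostrophes."""
    words = set()
    for word in text.split():
        cleaned = ''.join(ch for ch in word.lower() if ch.isalpha() or ch == "'")
        if cleaned:  # skip tokens that were pure punctuation or digits
            words.add(cleaned)
    return words


# Made-up sample inputs standing in for the contents of the two files
text_a = "Hello, world! It's a Test."
text_b = "hello\nTEST it's done"

similar_words = clean_words(text_a) & clean_words(text_b)
print(sorted(similar_words))  # ['hello', "it's", 'test']
```

For real files, the text would come from something like open('f1.txt').read(). Note that characters are dropped rather than treated as separators, so a token like "well-known" becomes "wellknown".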

Answer 2: accepted, 2017-08-18 20:13:15
