
Compare two CSV files and look for matches in Python

I have two CSV files that look like this:

csv1.csv

H1,H2,H3
arm,biopsy,forearm
heart,leg biopsy,biopsy

organs.csv

arm
leg
forearm
heart
skin

I need to compare the two files and get an output list like [arm, forearm, heart, leg], but the script I'm currently working on doesn't find any matches. I also want leg in the output, even though it shares a cell with biopsy. Here's the code so far. How can I get all the matched words?

import csv
import io

alist, blist = [], []

with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist.append(row)
with open("organs.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist.append(row)

first_set = set(map(tuple, alist))
secnd_set = set(map(tuple, blist))

matches = set(first_set).intersection(secnd_set)
print matches
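To see why the code above finds nothing, here is a small Python 3 sketch. It hard-codes the rows that csv.reader would return for the two sample files (an assumption, in place of reading the files). map(tuple, ...) turns each whole row into one tuple, so ('arm',) from organs.csv is compared against ('arm', 'biopsy', 'forearm') and never matches.

```python
# Rows as csv.reader would return them for the sample files in the question
rows_a = [["H1", "H2", "H3"],
          ["arm", "biopsy", "forearm"],
          ["heart", "leg biopsy", "biopsy"]]
rows_b = [["arm"], ["leg"], ["forearm"], ["heart"], ["skin"]]

# Same conversion as the question: one tuple per whole row
first_set = set(map(tuple, rows_a))
secnd_set = set(map(tuple, rows_b))

# A one-cell tuple never equals a three-cell tuple, so no row is shared
print(("arm",) == ("arm", "biopsy", "forearm"))  # False
matches = first_set.intersection(secnd_set)
print(matches)  # set()
```

The fix is therefore to compare individual words rather than whole rows.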

Try this:

import csv

alist, blist = [], []

with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            # split each cell on whitespace, so "leg biopsy" gives "leg" and "biopsy"
            alist += row_str.strip().split()

with open("organs.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        # each row holds a single organ name; add it to the flat list
        blist += row

# sets drop duplicates such as the repeated "biopsy"
first_set = set(alist)
second_set = set(blist)

print first_set.intersection(second_set)

Iterating over a csv.reader yields each row as a list of strings, such as ['arm', 'biopsy', 'forearm'], so you have to concatenate those lists to collect all of the items. Splitting each cell with split() also breaks "leg biopsy" into "leg" and "biopsy", which is what lets leg match.
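A quick way to see what csv.reader yields and what the split step does is to feed it the sample data through io.StringIO, which stands in for the file here (an assumption for illustration; this sketch is Python 3).

```python
import csv
import io

# Sample data copied from csv1.csv in the question
SAMPLE = "H1,H2,H3\narm,biopsy,forearm\nheart,leg biopsy,biopsy\n"

rows = list(csv.reader(io.StringIO(SAMPLE)))
# Each row is a list of cell strings; "leg biopsy" is still a single cell here
print(rows[2])  # ['heart', 'leg biopsy', 'biopsy']

words = []
for row in rows:
    for cell in row:
        # str.split() with no argument splits on runs of whitespace
        words += cell.split()
print(words)
```

After the loop, "leg" is a separate entry in words, so it can match the "leg" row of organs.csv.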

To remove duplicates, a single set() conversion per list is enough, and the intersection() method returns a new set containing only the elements present in both.
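The set step on its own, as a small Python sketch with the word lists written out by hand (an assumption, mirroring what the loops collect from the sample files). set() keeps one copy of the repeated "biopsy", and sorting the intersection turns it into the list the question asks for.

```python
# Words collected from csv1.csv after splitting every cell
words_a = ["H1", "H2", "H3", "arm", "biopsy", "forearm",
           "heart", "leg", "biopsy", "biopsy"]
organs = ["arm", "leg", "forearm", "heart", "skin"]

first_set = set(words_a)                 # "biopsy" is kept only once
common = first_set.intersection(organs)  # intersection() accepts any iterable
result = sorted(common)                  # sets are unordered; sort for a stable list
print(result)  # ['arm', 'forearm', 'heart', 'leg']
```

Sorting is only there to get a predictable order; skip it if order does not matter.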

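Change the part reading from csv1.csv so that every cell is split into words. Since alist then holds plain strings instead of rows, also build the sets without map(tuple, ...), for example first_set = set(alist) and secnd_set = set(word for row in blist for word in row):

with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        # append all words in each cell; split() turns "leg biopsy" into "leg" and "biopsy"
        for cell in row:
            alist.extend(cell.split())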
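On Python 3, opening the files in "rb" mode for csv.reader and the print statement no longer work. A minimal Python 3 sketch of the whole approach (read both files, split every cell into words, intersect the sets) could look like this. read_words and find_matches are hypothetical helper names; the file names in the usage note are the ones from the question.

```python
import csv


def read_words(lines):
    """Collect every whitespace-separated word from every cell.

    `lines` can be an open text file or any iterable of strings,
    since csv.reader accepts either.
    """
    words = []
    for row in csv.reader(lines):
        for cell in row:
            # str.split() splits on runs of whitespace: "leg biopsy" -> ["leg", "biopsy"]
            words.extend(cell.split())
    return words


def find_matches(path_a, path_b):
    """Return the sorted list of words that occur in both CSV files."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader in Python 3 (text mode instead of Python 2's "rb")
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        return sorted(set(read_words(fa)) & set(read_words(fb)))
```

Usage would be print(find_matches("csv1.csv", "organs.csv")). Both files go through the same word splitting, which is harmless for organs.csv since each row holds one word.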

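I would treat the CSV files as plain text files, get a list of all the words in the first and in the second file, and then iterate through the first list to check whether each word exactly matches one in the second list.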
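The plain-text idea (collect all words from each file, then walk the first list and keep words that exactly match one in the second) can be sketched in Python 3 as follows. It assumes words are separated by commas or whitespace; words_in_text and exact_matches are hypothetical names, and in practice the strings would come from something like open("csv1.csv").read().

```python
import re


def words_in_text(text):
    """Split raw file text into words, treating commas and any whitespace as separators."""
    # re.split can leave empty strings at the edges (e.g. after a trailing newline)
    return [w for w in re.split(r"[,\s]+", text) if w]


def exact_matches(text_a, text_b):
    """Return words from text_a that also appear in text_b, in first-seen order."""
    second = set(words_in_text(text_b))  # set makes each membership check fast
    matches, seen = [], set()
    for word in words_in_text(text_a):
        if word in second and word not in seen:
            seen.add(word)
            matches.append(word)
    return matches


csv1_text = "H1,H2,H3\narm,biopsy,forearm\nheart,leg biopsy,biopsy\n"
organs_text = "arm\nleg\nforearm\nheart\nskin\n"
print(exact_matches(csv1_text, organs_text))  # ['arm', 'forearm', 'heart', 'leg']
```

Unlike a set intersection, this keeps the words in the order they first appear in the first file.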
