
Comparing two CSV Files Based on Specific Data in two Columns

I was encouraged to step out of my comfort zone and use Python, with little to no experience, and now I'm stuck. I'm trying to compare two CSV files (fileA.csv and fileB.csv) and append to fileA.csv any user rows that are missing from it but present in fileB.csv. The only fields I can compare on are the user's first and last names (in this case, row[0] and row[2] in each file).

From my understanding, you cannot append information to a file that you currently have open, so I'm open to suggestions that don't require creating a third file (if possible). The code below has me on the right track, but there's a lot of data, so I'll need a loop. Please help.

import csv
reader1 = csv.reader(open('fileA', 'rb'), delimiter=',', quotechar='|')
row1 = reader1.next()
reader2 = csv.reader(open('fileB', 'rb'), delimiter=',', quotechar='|')
row2 = reader2.next()


##For Loop...

        if (row1[0] == row2[0]) and (row1[2] == row2[2]):
                ## Compare next 
        else:
                ## Append entire row to fileA.csv

Example FileA.csv:

John,Thomas,Doe,some,other,stuff
Jane, ,Smith,some,other,stuff

Example FileB.csv:

John, ,Doe,other,personal,data
Jane,Elizabeth,Smith,other,personal,data
Robin,T,Williams,other,personal,data

The only row that should be appended from FileB to FileA is Robin's complete row, so that FileA looks like:

DesiredResult_FileA:

John,Thomas,Doe,some,other,stuff
Jane, ,Smith,some,other,stuff
Robin,T,Williams,other,personal,data

Store the information found in file A in memory first, in a set.

Then reopen file A in append mode and loop over file B. Any name from B that is not found in the set can then be added to file A:

import csv

csv_dialect = dict(delimiter=',', quotechar='|')
names = set()
# Binary modes ('rb'/'ab') are what the Python 2 csv module expects.
with open('fileA', 'rb') as file_a:
    reader1 = csv.reader(file_a, **csv_dialect)
    next(reader1)  # skip the header row; drop this line if there is no header
    for row in reader1:
        names.add((row[0], row[2]))

# `names` is now a set of all names (taken from columns 0 and 2) found in file A.

with open('fileA', 'ab') as file_a, open('fileB', 'rb') as file_b:
    writer = csv.writer(file_a, **csv_dialect)
    reader2 = csv.reader(file_b, **csv_dialect)
    next(reader2)  # skip the header row; drop this line if there is no header
    for row in reader2:
        if (row[0], row[2]) not in names:
            # This row was not present in file A, add it.
            writer.writerow(row)

The combined with statement (two open() calls on one line) requires Python 2.7 or newer. In earlier Python versions, simply nest the two statements:

with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        # etc.
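
On Python 3, the csv module expects files opened in text mode with newline='' rather than the binary 'rb'/'ab' modes used above. The following is a minimal sketch of the same set-based approach under that assumption, not a definitive implementation. The function name append_missing_rows and its parameters are illustrative and not part of the original answer. It uses the default csv dialect, assumes the files have no header row (as in the example data), and adds a check for a missing trailing newline in file A so the first appended row does not get glued onto the last existing line.

```python
import csv
import os


def append_missing_rows(path_a, path_b, key_columns=(0, 2)):
    """Append rows of path_b whose key is absent from path_a; return them."""

    def key(row):
        # Hypothetical helper: the columns that identify a user.
        return tuple(row[i] for i in key_columns)

    # Pass 1: remember every (first name, last name) pair already in file A.
    with open(path_a, newline='') as file_a:
        seen = {key(row) for row in csv.reader(file_a) if row}

    # If file A does not end with a line break, a separator must be
    # written before appending, so inspect its final byte.
    needs_newline = False
    if os.path.getsize(path_a) > 0:
        with open(path_a, 'rb') as raw:
            raw.seek(-1, os.SEEK_END)
            needs_newline = raw.read(1) not in (b'\n', b'\r')

    appended = []
    # Pass 2: append the rows of file B whose key was not seen in file A.
    with open(path_a, 'a', newline='') as file_a, \
            open(path_b, newline='') as file_b:
        if needs_newline:
            file_a.write('\n')
        writer = csv.writer(file_a, lineterminator='\n')
        for row in csv.reader(file_b):
            if row and key(row) not in seen:
                writer.writerow(row)
                seen.add(key(row))  # also skips repeats within file B
                appended.append(row)
    return appended
```

Calling append_missing_rows('fileA.csv', 'fileB.csv') on the example files would append only Robin's row. The file is never open for reading and appending at the same time, so no third file is needed.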

You can try pandas, which might make handling CSV files easier and seems more readable:

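import pandas as pd

df1 = pd.read_csv('FileA.csv', header=None)
df2 = pd.read_csv('FileB.csv', header=None)

# Collect the (first name, last name) pairs already present in FileA.
# Comparing by name rather than by row position keeps this correct
# even when the two files list users in a different order.
names_a = set(zip(df1[0], df1[2]))

# Flag the FileB rows whose name pair does not appear in FileA.
is_missing = [pair not in names_a for pair in zip(df2[0], df2[2])]

# .ix and DataFrame.append were removed from recent pandas releases,
# so the missing rows are added with pd.concat instead.
df1 = pd.concat([df1, df2[is_missing]], ignore_index=True)

df1.to_csv('FileA.csv', index=False, header=False)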
