
Python CSV Comparison

This script compares two CSV files that each have two columns. Please help me modify it so that it also works when sample1.csv and sample2.csv have more than two columns, or only one column.

f1_in = open("sample1.csv","r")
next(f1_in,None)
f1_dict = {}
for line in f1_in:
  l = line.split(',')
  f1_dict[l[0].strip()] = l[1].strip()
  l.sort()
f1_in.close()

f2_in = open("sample2.csv","r")
next(f2_in,None)
f2_dict = {}
for line in f2_in:
  l = line.split(',')
  f2_dict[l[0].strip()] = l[1].strip()
  l.sort()
f2_in.close()


f_same = open("same.txt","w")
f_different = open("different.txt","w")

for k1 in f1_dict.keys():
  if k1 in f2_dict.keys() \
      and f2_dict[k1] == f1_dict[k1]:
    f_same.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                     str(k1)+" "+str(f2_dict[k1])))

  elif not k1 in f2_dict.keys():
    f_different.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                          "------"))
  elif not f2_dict[k1] == f1_dict[k1]:
    f_different.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                          str(k1)+" "+str(f2_dict[k1])))

f_same.close()
f_different.close()

For example, suppose my source file has the headers Name and Salary with the rows A 20000, B 15000, C 10000, D 10000, and my target file has the same headers with the rows A 40000, D 10000, B 15000, C 10000, E 8000. My output should then be:

Different lines:
A 20000, A 40000
D 10000, ----- (not in target)
----- (not in source), E 8000

Common lines:
B 15000, B 15000
C 10000, C 10000

It is no wonder you cannot extend the code to more than two columns while you treat the columns as key/value pairs in a dictionary.

You have to see the lines as elements of a set instead. I assume this is why you are not using the csv module or the difflib module: you do not care whether the lines appear in (nearly) the same order in both files, only whether they appear at all.

Here is an example:

import itertools


def compare(first_filename, second_filename):
    lines1 = set()
    lines2 = set()
    with open(first_filename, 'r') as file1, \
            open(second_filename, 'r') as file2:
        # zip_longest pads the shorter file with None, hence the checks below
        for line1, line2 in itertools.zip_longest(file1, file2):
            if line1:
                lines1.add(line1)
            if line2:
                lines2.add(line2)
    print("Different lines")
    # Symmetric difference: lines present in exactly one of the files
    for line in lines1 ^ lines2:
        print(line, end="")
    print("---")
    print("Common lines")
    # Intersection: lines present in both files
    for line in lines1 & lines2:
        print(line, end="")

Notice that this code finds differences in both files, not just lines that exist in f1 but not in f2, as your example does. However, it cannot tell which file a differing line came from (this did not seem to be a requirement of the question).
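
If the origin of each differing line does matter, one possible variation is to replace the symmetric difference with two one-sided set differences. The sketch below is only an illustration under a few assumptions: the name split_differences is hypothetical, the inputs are any iterables of lines (open file objects work), and line endings are stripped so that a missing final newline is not reported as a difference.

```python
def split_differences(first_lines, second_lines):
    """Return (only_in_first, only_in_second, common) as sets of lines.

    Both arguments are iterables of strings, e.g. open file objects.
    """
    # Drop line endings so a missing final newline is not reported as a
    # difference, and skip blank lines.
    lines1 = {line.rstrip("\r\n") for line in first_lines if line.strip()}
    lines2 = {line.rstrip("\r\n") for line in second_lines if line.strip()}
    # One-sided differences keep track of which input a line came from.
    return lines1 - lines2, lines2 - lines1, lines1 & lines2
```

With real files, the two arguments could be the file objects opened on sample1.csv and sample2.csv inside a with statement. Like the original, this treats each whole line as one element, so it works for any number of columns.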

Check that it works:

In [40]: !cat sample1.csv
bacon, eggs, mortar
whatever, however, whenever
spam, spam, spam

In [41]: !cat sample2.csv
guido, van, rossum
spam, spam, spam

In [42]: compare("sample1.csv", "sample2.csv")
Different lines
whatever, however, whenever
guido, van, rossum
bacon, eggs, mortar
---
Common lines
spam, spam, spam
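
The question also asks for rows to be paired by their first column, with "------" standing in for a row that is missing from one file. That pairing is not what the set-based code above does, so here is a separate sketch of it, assuming the first column is a unique key and the first line of each file is a header. The names read_rows and compare_by_key and the missing parameter are hypothetical; the csv module is used so that any number of columns is handled.

```python
import csv


def read_rows(lines, skip_header=True):
    """Map the first column of each CSV row to a tuple of the other columns."""
    reader = csv.reader(lines, skipinitialspace=True)
    if skip_header:
        next(reader, None)
    rows = {}
    for row in reader:
        if row:  # ignore blank lines
            rows[row[0].strip()] = tuple(cell.strip() for cell in row[1:])
    return rows


def compare_by_key(source_lines, target_lines, missing="------"):
    """Return (same, different) lists of (source_text, target_text) pairs."""
    source = read_rows(source_lines)
    target = read_rows(target_lines)
    same, different = [], []
    # Visit every key that occurs in either file, in sorted order.
    for key in sorted(set(source) | set(target)):
        left = " ".join((key,) + source[key]) if key in source else missing
        right = " ".join((key,) + target[key]) if key in target else missing
        if key in source and key in target and source[key] == target[key]:
            same.append((left, right))
        else:
            different.append((left, right))
    return same, different
```

Both functions accept any iterable of lines, so open file objects can be passed directly, and the two returned lists can then be written to same.txt and different.txt as in the question. Because the remaining columns are stored as a tuple, the same code covers files with one, two, or more columns.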
