Compare two CSV files in Python and skip a given row number

I am new to Python.

I am trying to read two CSV files and check them for differences, while skipping the second row of both files.

I started with something like this:

  import sys
  def csv_diff(file_f,file_g):
      #file_f = sys.argv[1]
      #file_g = sys.argv[2]
      set_f = set()
      set_g = set()
      with open(file_f) as f:
          line = f.readline().strip()
          while line:
              set_f.add(line)
              line = f.readline().strip()
      with open(file_g) as g:
          line = g.readline().strip()
          while line:
              set_g.add(line)
              line = g.readline().strip()
      diff = set_f - set_g

      # print(set_f)
      # print(set_g)
      # print(diff)
      if diff:
          # print("Data mismatch between the files")
          return False
      else:
          # print("Data matches")
          return True

But this code does not read the first line.

My CSV file:

File Name : man.csv
Start Time : 2017-02-17T09:46:50
Read Count : 1
Write Count : 0
Filter Count : 0
Skip Count : 1

I want to skip the line "Start Time : 2017-02-17T09:46:50".

Is there an easier or better approach?
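The sample "CSV" is really a list of "Key : value" lines rather than comma-separated data. Assuming every non-blank line has that shape, one alternative (a sketch, not taken from any of the answers below) is to parse each file into a dictionary and drop the "Start Time" key, so the comparison no longer depends on which row the timestamp sits in. The helper names parse_report and reports_match are hypothetical.

```python
def parse_report(path, ignore_keys=("Start Time",)):
    """Parse 'Key : value' lines into a dict, leaving out the keys in ignore_keys."""
    result = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            # partition splits on the first " : " only, so the colons
            # inside the timestamp (09:46:50) stay part of the value
            key, _, value = line.partition(" : ")
            if key not in ignore_keys:
                result[key] = value
    return result


def reports_match(path_a, path_b):
    """True if both files have the same keys and values, ignoring 'Start Time'."""
    return parse_report(path_a) == parse_report(path_b)
```

Usage would be something like reports_match("man.csv", "other.csv"), where "other.csv" stands in for the second file. Dictionary comparison also ignores the order of the keys, which may or may not be what you want.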

If your CSV has many entries and you always want to skip the Start Time line, you can try the following. It also works if your CSV has only one entry.

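from itertools import zip_longest

def csv_diff(file_1, file_2):
    with open(file_1, "r") as f1, open(file_2, "r") as f2:
        # zip_longest pads the shorter file with "", so extra non-blank
        # lines in either file count as a mismatch (plain zip would stop early)
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            # Ignore the timestamp line, which differs between runs
            if line1.startswith("Start Time"):
                continue
            if line1.strip() != line2.strip():
                print(f"The two files '{file_1}' and '{file_2}' do not match!")
                return False
    print(f"The two files '{file_1}' and '{file_2}' are a match!")
    return True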
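This answer skips the row by its content. Since the question title asks about skipping a given row number, here is a sketch of a variant that skips by position instead; the function name csv_diff_skip_row and its skip_row parameter (0-based, so 1 means the second line) are assumptions for illustration.

```python
from itertools import zip_longest


def csv_diff_skip_row(file_1, file_2, skip_row=1):
    """Return True if the files match line by line, ignoring row skip_row (0-based)."""
    with open(file_1) as f1, open(file_2) as f2:
        # Pad the shorter file with "" so a length difference is detected
        pairs = zip_longest(f1, f2, fillvalue="")
        for index, (line1, line2) in enumerate(pairs):
            if index == skip_row:
                continue  # ignore this row in both files
            if line1.strip() != line2.strip():
                return False
    return True
```

For the files in the question, csv_diff_skip_row("man.csv", "other.csv", skip_row=1) would ignore the Start Time row wherever its content happens to be ("other.csv" is a placeholder name).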

Why not just add something simple like:

while line:
    if "Start Time" not in line:
        set_g.add(line)
    # readline() must run for skipped lines too, otherwise the loop never ends
    line = g.readline().strip()
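In the question's function the same check has to go into both reading loops. A sketch that moves it into one helper (read_lines is a hypothetical name) and compares with the symmetric difference ^, since set_f - set_g on its own only reports lines missing from file_g and misses lines that appear only in file_g. Iterating over the file object also avoids the while line: loop stopping early at the first blank line.

```python
def read_lines(path, skip_prefix="Start Time"):
    """Return the set of stripped, non-empty lines, minus any starting with skip_prefix."""
    with open(path) as f:
        stripped = (line.strip() for line in f)
        # The set is fully built before the file is closed
        return {line for line in stripped if line and not line.startswith(skip_prefix)}


def csv_diff(file_f, file_g):
    # ^ keeps lines that appear in either file but not in both
    diff = read_lines(file_f) ^ read_lines(file_g)
    return not diff
```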
    

For each file, you can use readlines() to read all lines, remove the line at index 1 with pop(1), convert the rest to a set, and then check whether the two sets are equal.

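def csv_diff(file_f, file_g):
    with open(file_f) as f:
        # strip() so a missing newline on the last line doesn't cause a false mismatch
        textf = [line.strip() for line in f.readlines()]
    textf.pop(1)  # drop the second line ("Start Time ...")
    with open(file_g) as g:
        textg = [line.strip() for line in g.readlines()]
    textg.pop(1)
    # Note: sets ignore line order and duplicate lines
    return set(textf) == set(textg)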
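Converting to sets discards line order and duplicate lines, so two files containing the same lines in a different order compare equal. If order matters, or you want to see what actually differs, one option is to keep the lists and run them through the standard-library difflib module after deleting the skipped row. This is a sketch; the function name show_diff and its skip_row parameter are hypothetical.

```python
import difflib


def show_diff(file_f, file_g, skip_row=1):
    """Return unified-diff lines between two files, ignoring row skip_row (0-based)."""
    with open(file_f) as f:
        lines_f = f.read().splitlines()
    with open(file_g) as g:
        lines_g = g.read().splitlines()
    # Delete the skipped row only if it exists (avoids IndexError on short files)
    for lines in (lines_f, lines_g):
        if len(lines) > skip_row:
            del lines[skip_row]
    # lineterm="" because splitlines() already removed the newlines
    return list(difflib.unified_diff(lines_f, lines_g,
                                     fromfile=file_f, tofile=file_g, lineterm=""))
```

An empty list means the files match apart from the skipped row; otherwise the lines starting with "-" and "+" show what changed, e.g. for line in show_diff("man.csv", "other.csv"): print(line) ("other.csv" is a placeholder).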

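Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.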

 