
Using readlines and somehow skip the third column from comparison in two csv files

Old.csv:

name,department
leona,IT

New.csv:

name,department
leona,IT
lewis,Tax

With the same two columns in both files, I find the new values in New.csv and update Old.csv with the code below:

feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    # Read the field names from the first line of the old file
    for header in t1.readline().split(','):
        headers.append(header.rstrip())

    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

    for line in filetwo:
        # Whole-line comparison: a line counts as new if it is not in the old file
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip())  # For old file update

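New problem:

1/ Add a third column to store a timestamp value

2/ Skip the third column (timestamp) in both files; the two files still need to be compared on columns 1 and 2

3/ The old file will be updated with the new values for all 3 columns

I tried the slicing approach split(',')[0:2], but it doesn't seem to work at all. I feel the existing code only needs a few small changes, but I'm not sure how to make them.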

Expected result:

Old.csv:

name,department,timestamp
leona,IT,07/20/2020       <--- Existing value
lewis,Tax,08/25/2020      <--- New value from New.csv

New.csv:

name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020

You could do all of this yourself, but why not use the tools built into Python?

from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)

Result:

['name', 'department']
[['lewis', 'Tax']]

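Note that this is the result for the two-column data you provided. If you add a third column, the code still works as expected and includes that column's data in the feed result.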
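To check that claim without touching files on disk, here is a minimal sketch that runs the same comparison on the three-column data from the question. The use of `io.StringIO` in place of real files and the names `old_text` and `new_text` are assumptions made for illustration; Old.csv is assumed to already carry the timestamp column.

```python
from csv import reader
from io import StringIO

# In-memory stand-ins for Old.csv and New.csv (assumption: Old.csv already
# has the timestamp column, as in the expected result of the question)
old_text = "name,department,timestamp\nleona,IT,07/20/2020\n"
new_text = (
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

feed = []

old = reader(StringIO(old_text))
new = reader(StringIO(new_text))
headers = next(old)
next(new)  # skip header in new

# compare on the first two columns only, ignoring the timestamp
old_data = [rec[:2] for rec in old]

for rec in new:
    if rec[:2] not in old_data:
        feed.append(rec)  # keep the full row, timestamp included

print(headers)
print(feed)
```

The second leona row is skipped even though its timestamp differs, because only name and department take part in the comparison.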

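To get feed as a list of dictionaries, which is easy to convert to JSON, append a dictionary instead of the raw record. Because headers is read from Old.csv, that file's header row must already include timestamp for the column to show up in each dictionary: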

feed.append(dict(zip(headers, rec)))

Turning feed into JSON is then simple:

import json

print(json.dumps(feed))

The whole solution:

import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))

The output looks like this:

[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
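The question also asks for Old.csv to be updated with the new rows, a step the code above stops short of. One way to do it is sketched below under several assumptions: the helper name `append_new_rows` is hypothetical, both files are assumed to already share the same header row, and Old.csv is assumed to end with a newline so that appended rows start on a fresh line. The set of tuples used for the membership test is a swapped-in variation, not part of the original answer, and it also skips keys that repeat within New.csv.

```python
from csv import reader, writer


def append_new_rows(old_path, new_path):
    """Append rows of new_path whose first two columns are absent from old_path.

    Hypothetical helper; returns the list of rows that were appended.
    """
    with open(old_path, newline='') as t1, open(new_path, newline='') as t2:
        old = reader(t1)
        new = reader(t2)
        next(old)  # skip header in old
        next(new)  # skip header in new

        # keys already present: (name, department); the timestamp is ignored
        seen = {tuple(rec[:2]) for rec in old}

        feed = []
        for rec in new:
            key = tuple(rec[:2])
            if key not in seen:
                seen.add(key)  # also skips repeated keys within the new file
                feed.append(rec)

    # append the full rows (all columns) to the old file;
    # lineterminator='\n' is an assumption about the file's line endings
    with open(old_path, 'a', newline='') as out:
        writer(out, lineterminator='\n').writerows(feed)

    return feed
```

Calling `append_new_rows('Old.csv', 'New.csv')` on the three-column files from the question would add the lewis row to Old.csv and leave the existing leona row untouched.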

