Using readlines and skipping the third column when comparing two CSV files

Question

Old.csv:

name,department
leona,IT

New.csv:

name,department
leona,IT
lewis,Tax

With the same two columns in both files, finding the new rows in New.csv and updating Old.csv with them works fine with the code below:

feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    # Read the field names from the first line of Old.csv
    for header in t1.readline().split(','):
        headers.append(header.rstrip())

    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

for line in filetwo:
    # Keep only the lines of New.csv that are not already in Old.csv
    if line not in fileone:
        lineItems = {}
        feed.append(line.strip())  # For old file update

New problem:

1/ Add a 3rd column to store timestamp values

2/ Skip the 3rd column (timestamp) in both files, but still compare the two files for differences based on the 1st and 2nd columns

3/ Update the old file with the new rows, including all 3 columns

I tried slicing with split(',')[0:2], but it didn't seem to work at all. I suspect only a small change to the existing code is needed, but I'm not sure how to make it.

Expected outcome:

Old.csv:

name,department,timestamp
leona,IT,07/20/2020       <--- Existing value
lewis,Tax,08/25/2020      <--- New value from New.csv

New.csv:

name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020

Answer 1

You can do it all yourself, but why not use the tools built in to Python?

from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)

Result:

['name', 'department']
[['lewis', 'Tax']]

Note that this is the result for the two-column data you provided. If you add a third column, the code still works as expected and includes that column's data in feed.

To make feed a list of dictionaries, which you can easily turn into JSON, replace the append call with something like:
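To check that claim without touching any files, here is a minimal sketch that runs the same comparison on in-memory copies of the three-column data from the question. The io.StringIO objects and the names old_file and new_file are stand-ins introduced for illustration only; they are not part of the original code.

```python
import io
from csv import reader

# In-memory stand-ins for Old.csv and New.csv with the timestamp column added
old_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
)
new_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

old = reader(old_file)
new = reader(new_file)
headers = next(old)
next(new)  # skip header in new

# Compare on name and department only; the timestamp is ignored
old_data = [rec[:2] for rec in old]
feed = [rec for rec in new if rec[:2] not in old_data]

print(headers)  # ['name', 'department', 'timestamp']
print(feed)     # [['lewis', 'Tax', '08/25/2020']]
```

The second leona row is skipped even though its timestamp differs, because only rec[:2] takes part in the comparison.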

feed.append(dict(zip(headers, rec)))

Turning feed into JSON is then as simple as:

import json

print(json.dumps(feed))

The whole solution:

import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))

With a timestamp column added to both files, this prints output like:

[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
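The code above only collects the new rows; it does not write them back to Old.csv, which was point 3 of the question. One possible way to do that is sketched below. The function name update_old is hypothetical, and the sketch makes two assumptions that are not in the original: it rewrites the whole old file rather than appending to it, and it writes "\n" line endings instead of the csv module's default "\r\n".

```python
import csv

def update_old(old_path, new_path):
    """Add rows from new_path whose first two columns are not yet in old_path."""
    with open(old_path, newline='') as f:
        old_rows = list(csv.reader(f))       # header plus existing rows
    with open(new_path, newline='') as f:
        new_rows = list(csv.reader(f))[1:]   # skip header in new

    # Keys already present: (name, department) of every old data row
    seen = {tuple(rec[:2]) for rec in old_rows[1:]}
    added = []
    for rec in new_rows:
        key = tuple(rec[:2])
        if key not in seen:
            seen.add(key)  # also skips repeats of the same key within new_path
            added.append(rec)

    # Rewrite the old file: header, existing rows, then the new rows.
    # lineterminator='\n' is an assumption; the csv default is '\r\n'.
    with open(old_path, 'w', newline='') as f:
        csv.writer(f, lineterminator='\n').writerows(old_rows + added)
    return added
```

Calling update_old('Old.csv', 'New.csv') returns the rows that were added. Keeping the seen keys in a set of tuples, rather than a list of lists, keeps each membership test cheap when the files grow.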

Accepted answer, 2020-08-17 02:21:37
