Comparing two csv files using given columns and build a third one using specific columns from the matching lines
Using readlines and somehow skip the third column from comparison in two csv files
Old.csv:
name,department
leona,IT
New.csv:
name,department
leona,IT
lewis,Tax
Using the same two columns, I find the new values in New.csv and update Old.csv with the following code:
feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    for header in t1.readline().split(','):
        headers.append(header.rstrip())
    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

    for line in filetwo:
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip())  # For old file update
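New issues:
1/ A third column is added to store a timestamp value
2/ The third column (timestamp) must be skipped in both files; the comparison still has to be based on columns 1 and 2
3/ The old file will be updated with the new values for all 3 columns
I tried the slicing approach split(',')[0:2], but it did not seem to work at all. I think the existing code needs only minor changes, but I am not sure how to make them.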
Expected result:
Old.csv:
name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
New.csv:
name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
You could do all of this yourself, but why not use the tools built into Python?
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)
Result:
['name', 'department']
[['lewis', 'Tax']]
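Note that this is the result for the data you provided. If you add a third column, the code still works as expected and that column's data ends up in the feed result as well.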
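As a rough illustration of the point about the third column, here is a minimal sketch that runs the same comparison on the three-column data from the question. It reads from in-memory strings (`io.StringIO`) so it runs without any files on disk, and it swaps the list lookup for a set of tuples, which is an assumption of this sketch rather than part of the code above. The function name `find_new_rows` and the `key_columns` parameter are hypothetical. One behavioral difference to be aware of: a key that occurs several times in New.csv is added only once here.

```python
import csv
import io

# Sample data from the question, held in memory so the sketch runs without files.
OLD_CSV = "name,department,timestamp\nleona,IT,07/20/2020\n"
NEW_CSV = (
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)


def find_new_rows(old_file, new_file, key_columns=2):
    """Return the headers and the rows of new_file whose first
    key_columns fields do not appear in old_file (hypothetical helper)."""
    old = csv.reader(old_file)
    new = csv.reader(new_file)
    headers = next(old)
    next(new)  # skip header in new

    # Only the first key_columns fields take part in the comparison,
    # so the timestamp column is ignored.
    seen = {tuple(rec[:key_columns]) for rec in old}

    feed = []
    for rec in new:
        key = tuple(rec[:key_columns])
        if key not in seen:
            seen.add(key)  # a repeated new key is added only once
            feed.append(rec)  # the full row, timestamp included
    return headers, feed


headers, feed = find_new_rows(io.StringIO(OLD_CSV), io.StringIO(NEW_CSV))
print(headers)  # ['name', 'department', 'timestamp']
print(feed)     # [['lewis', 'Tax', '08/25/2020']]
```

Both `leona,IT` rows are skipped even though one of them carries a different timestamp, because only the first two columns form the key.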
To make feed a list of dictionaries, which you can easily turn into JSON, replace feed.append(rec) with:
feed.append(dict(zip(headers, rec)))
Turning feed into JSON is then straightforward:
import json
print(json.dumps(feed))
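If dictionaries are the goal anyway, `csv.DictReader` from the standard library may be worth considering, since it yields one dictionary per row keyed by the header. The following is only a sketch of that alternative, not the approach used in this answer; it assumes both files carry the `name` and `department` headers shown in the question and uses in-memory strings in place of the real files.

```python
import csv
import io
import json

# Sample data from the question, held in memory so the sketch runs without files.
OLD_CSV = "name,department,timestamp\nleona,IT,07/20/2020\n"
NEW_CSV = (
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

# Keys already present in the old data: (name, department) pairs only.
old_keys = {
    (row["name"], row["department"])
    for row in csv.DictReader(io.StringIO(OLD_CSV))
}

# Keep every row of the new data whose key is not in the old data.
feed = [
    row
    for row in csv.DictReader(io.StringIO(NEW_CSV))
    if (row["name"], row["department"]) not in old_keys
]

print(json.dumps(feed))
```

Selecting the key columns by name rather than by position keeps the comparison correct even if the column order in the two files differs.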
The whole solution:
import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))
The output, with the timestamp column present in both files:
[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
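The question also asks for Old.csv to be updated with the new rows, which the code above does not do. A minimal sketch of that last step follows, assuming it is acceptable to rewrite Old.csv in full rather than append to it. The helper name `update_old_file` and its `key_columns` parameter are hypothetical, and the demo writes the question's sample data into a temporary directory so that no real files are touched.

```python
import csv
import os
import tempfile


def update_old_file(old_path, new_path, key_columns=2):
    """Add rows from new_path to old_path when their first key_columns
    fields are not yet present there (hypothetical helper)."""
    with open(old_path, newline="") as f:
        rows = list(csv.reader(f))  # header plus existing records

    seen = {tuple(rec[:key_columns]) for rec in rows[1:]}
    added = []
    with open(new_path, newline="") as f:
        new = csv.reader(f)
        next(new)  # skip header in new
        for rec in new:
            key = tuple(rec[:key_columns])
            if key not in seen:
                seen.add(key)
                added.append(rec)

    # Rewrite the whole file so a missing trailing newline in the
    # original cannot glue two records together.
    with open(old_path, "w", newline="") as f:
        csv.writer(f).writerows(rows + added)
    return added


# Demo with the data from the question, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "Old.csv")
    new_path = os.path.join(tmp, "New.csv")
    with open(old_path, "w", newline="") as f:
        f.write("name,department,timestamp\nleona,IT,07/20/2020\n")
    with open(new_path, "w", newline="") as f:
        f.write(
            "name,department,timestamp\n"
            "leona,IT,07/20/2020\n"
            "leona,IT,07/25/2020\n"
            "lewis,Tax,08/25/2020\n"
        )

    added = update_old_file(old_path, new_path)
    with open(old_path, newline="") as f:
        updated = list(csv.reader(f))

print(added)
print(updated)
```

With the sample data, `updated` holds the header, the existing `leona` row with its original timestamp, and the new `lewis` row, which matches the expected Old.csv in the question.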