Using readlines and somehow skip the third column from comparison in two csv files
Old.csv:
name,department
leona,IT
New.csv:
name,department
leona,IT
lewis,Tax
With the same two columns, finding the new values in New.csv and updating Old.csv with them works fine with the code below:
feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    for header in t1.readline().split(','):
        headers.append(header.rstrip())
    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

    for line in filetwo:
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip())  # For old file update
New problem:
1/ Add a 3rd column to store timestamp values
2/ Skip the 3rd column (timestamp) in both files, but still compare the two files for differences based on the 1st and 2nd columns
3/ Update the old file with the new values in all 3 columns
I tried the slicing method split(',')[0:2], but it didn't seem to work at all. I feel that only a few small changes to the existing code are needed, but I'm not sure how to make them.
Expected outcome:
Old.csv:
name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
New.csv:
name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
You can do it all yourself, but why not use the tools built into Python?
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)
Result:
['name', 'department']
[['lewis', 'Tax']]
Note that this is the result for the two-column data you provided. If you add a third column, the code still works as expected and includes that column in the feed result.
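As a quick check of that claim, here is a minimal sketch that runs the same comparison on the three-column data from the question. The io.StringIO objects are stand-ins for the real files, and old_keys (a set of tuples instead of a list of lists) is my own variation for faster lookups, not something the original code requires.

```python
import io
from csv import reader

# In-memory stand-ins for Old.csv and New.csv (assumption: same layout as in the question)
old_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
)
new_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

old = reader(old_file)
new = reader(new_file)
headers = next(old)
next(new)  # skip header in new

# Keys are (name, department) tuples; the timestamp is left out of the comparison
old_keys = {tuple(rec[:2]) for rec in old}

# Keep the full record (all three columns) for rows whose key is not in the old file
feed = [rec for rec in new if tuple(rec[:2]) not in old_keys]

print(headers)  # ['name', 'department', 'timestamp']
print(feed)     # [['lewis', 'Tax', '08/25/2020']]
```

The second leona row is dropped even though its timestamp differs, because only the first two columns take part in the comparison.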
To make feed a list of dictionaries, which you can easily turn into JSON, replace the append call with something like:
feed.append(dict(zip(headers, rec)))
Turning feed into JSON is as simple as:
import json
print(json.dumps(feed))
The whole solution:
import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))
With the three-column files, this outputs:
[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
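The code above only builds feed; it does not write anything back to Old.csv, which was point 3/ of the question. One possible way to do that, as a sketch rather than part of the original answer, is to append the new rows with csv.writer. The function name update_old is hypothetical, and the sketch assumes both files have a header row and that Old.csv ends with a newline. It also skips repeated keys within New.csv, which is my own assumption about the desired behaviour.

```python
import csv
import os
import tempfile


def update_old(old_path, new_path):
    """Append rows from new_path whose first two columns are not yet in old_path."""
    with open(old_path, newline='') as t1, open(new_path, newline='') as t2:
        old = csv.reader(t1)
        new = csv.reader(t2)
        next(old)  # skip header in old
        next(new)  # skip header in new
        seen = {tuple(rec[:2]) for rec in old}
        feed = []
        for rec in new:
            key = tuple(rec[:2])
            if key not in seen:
                seen.add(key)  # also ignore repeats within the new file
                feed.append(rec)
    # Append mode leaves the header and the existing rows untouched
    with open(old_path, 'a', newline='') as out:
        csv.writer(out, lineterminator='\n').writerows(feed)
    return feed


# Demo on throwaway copies of the example data from the question
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'Old.csv')
    new_path = os.path.join(tmp, 'New.csv')
    with open(old_path, 'w', newline='') as f:
        f.write("name,department,timestamp\nleona,IT,07/20/2020\n")
    with open(new_path, 'w', newline='') as f:
        f.write("name,department,timestamp\nleona,IT,07/20/2020\n"
                "leona,IT,07/25/2020\nlewis,Tax,08/25/2020\n")
    added = update_old(old_path, new_path)
    with open(old_path, newline='') as f:
        result = f.read()

print(added)
print(result)
```

In the demo, Old.csv ends up with the header, the existing leona row, and the appended lewis row, which matches the expected outcome in the question.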