Using readlines and skipping the third column when comparing two csv files

Old.csv:

name,department
leona,IT

New.csv:

name,department
leona,IT
lewis,Tax

With the same two columns in both files, finding the new values in New.csv and updating Old.csv with them works fine with the code below:

feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    # Read the field names from the first line of the old file
    for header in t1.readline().split(','):
        headers.append(header.rstrip())

    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

    for line in filetwo:
        # Whole lines are compared, so every column has to match
        if line not in fileone:
            lineItems = {}  # Not used in this excerpt
            feed.append(line.strip())  # For old file update

New problem:

1/ Add a 3rd column to store timestamp values

2/ Skip the 3rd column (timestamp) in both files while still comparing the two files for differences based on the 1st and 2nd columns

3/ Update the old file with the new values in all 3 columns

I tried the slicing method split(',')[0:2], but it didn't seem to work at all. I feel only a few small changes to the existing code are needed, but I'm not sure how to make them.

Expected outcome:

Old.csv:

name,department,timestamp
leona,IT,07/20/2020       <--- Existing value
lewis,Tax,08/25/2020      <--- New value from New.csv

New.csv:

name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020

You can do it all yourself, but why not use the tools built into Python?

from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)

Result:

['name', 'department']
[['lewis', 'Tax']]

Note that this is the result for the two-column data you provided. If you add a third column, the code still works as expected and includes that column in the feed result.
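To illustrate that point with the three-column data from the question, here is a minimal sketch. It reads from in-memory strings through io.StringIO, which is only a stand-in for the two files, and the function name new_rows is hypothetical. As a small variation on the code above, it stores the first two columns as tuples in a set, so the membership test does not have to scan a list, and it assumes each new name/department pair should be reported only once.

```python
from csv import reader
from io import StringIO

# Sample data copied from the question's expected outcome
OLD_CSV = """name,department,timestamp
leona,IT,07/20/2020
"""

NEW_CSV = """name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
"""


def new_rows(old_file, new_file):
    """Return the headers and the rows of new_file whose first two
    columns do not appear in old_file."""
    old = reader(old_file)
    new = reader(new_file)
    headers = next(old)
    next(new)  # skip header in new

    # Tuples are hashable, so the first two columns can be kept in a set
    seen = {tuple(rec[:2]) for rec in old}

    feed = []
    for rec in new:
        key = tuple(rec[:2])
        if key not in seen:
            seen.add(key)  # assumption: report each new pair only once
            feed.append(rec)
    return headers, feed


headers, feed = new_rows(StringIO(OLD_CSV), StringIO(NEW_CSV))
print(headers)
print(feed)
```

The second leona row is skipped even though its timestamp differs, because only the name and department take part in the comparison.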

To make feed a list of dictionaries, which you can easily turn into JSON, you could do something like:

feed.append(dict(zip(headers, rec)))

Turning feed into JSON is as simple as:

import json

print(json.dumps(feed))
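One detail worth keeping in mind with dict(zip(headers, rec)): zip stops at the shorter of its two inputs. The sketch below uses made-up variable names (two_headers, three_headers) to show what happens if the headers are still read from an Old.csv that has only two columns while the record comes from a three-column New.csv.

```python
import json

rec = ['lewis', 'Tax', '08/25/2020']

# Headers read from an old file that still has only two columns
two_headers = ['name', 'department']
# Headers read from a file that already has the timestamp column
three_headers = ['name', 'department', 'timestamp']

# zip pairs items by position and stops at the shorter input,
# so the timestamp is silently dropped in the first case
short_row = dict(zip(two_headers, rec))
full_row = dict(zip(three_headers, rec))

print(json.dumps(short_row))
print(json.dumps(full_row))
```

So for the timestamp to reach the JSON, the header row that feeds zip needs to contain all three field names.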

The whole solution:

import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))

With output like:

[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
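The question also asks for Old.csv to be updated with the new rows, which the code above does not do, since it only prints them. A minimal sketch of that step could append the rows with csv.writer. The function name update_old is hypothetical, and the sketch assumes that Old.csv already has the three-column header and ends with a newline.

```python
from csv import reader, writer


def update_old(old_path, new_path):
    """Append to old_path every row of new_path whose first two
    columns are not yet present in old_path. Returns the appended rows."""
    with open(old_path, newline='') as t1, open(new_path, newline='') as t2:
        old = reader(t1)
        new = reader(t2)
        next(old)  # skip header in old
        next(new)  # skip header in new

        # relevant data is only the first two columns
        old_data = [rec[:2] for rec in old]
        feed = [rec for rec in new if rec[:2] not in old_data]

    # Assumption: the old file ends with a newline, so the appended
    # rows start on a fresh line
    with open(old_path, 'a', newline='') as out:
        writer(out, lineterminator='\n').writerows(feed)

    return feed
```

Calling update_old('Old.csv', 'New.csv') on the three-column files from the question should leave Old.csv with the leona row it already had plus the lewis row, including its timestamp.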

