
PYTHON - Comparing two different files; if a line containing an ID already exists then update else append the line

I have a master file that is constantly updated, and a new file that is created every minute. I want to compare each newly created file against the existing master file. So far I've got:

with open("jobs") as a:
   new = a.readlines()

count=0
for item in new:
   new[count]=new[count].split(",")
   count+=1

This will allow me to compare the first field (index [0]) of each line against my master file. At this point I start to confuse myself. I'm guessing it would be something along the lines of:

counter=0
for item in new:
    if new[counter][0] not in master:
        end = open("end","a")
        end.write(str(new[counter]) + "\n")
        counter+=1
        end.close()
    else:
        # TODO: replace the line that already exists in the master file with the new line
        pass

The IDs won't necessarily be in the same order every time a new file comes in, and at some point the new file may contain more entries than the master file.

If I haven't made sense or have left some information out, please let me know and I'll try to clarify. Thanks.

Sounds like a csv problem to me.

Unfortunately, it is not clear from your question whether you want to modify the master file itself, an out-file, or both. This does the second: it takes a master file and an update file, both in csv format, and prints the merged result, unsorted, to an out-file. If this is not what you want, or if your data is comma-separated but has no field names on top, change it as you need; that should be easy enough.

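import csv

# newline="" is what the csv module recommends for files it reads or writes
with open("master.csv", newline="") as m, open("update.csv", newline="") as u, open("out.csv", "w", newline="") as o:
    master_reader = csv.DictReader(m)
    fields = master_reader.fieldnames  # header row, read once instead of reopening the file
    master = { line['ID']: line for line in master_reader }
    update = { line['ID']: line for line in csv.DictReader(u) }
    master.update(update)  # rows from update.csv overwrite or extend master rows by ID
    out = csv.DictWriter(o, fields)
    out.writeheader()
    out.writerows(master.values())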

With master.csv as such:

ID,Name,Foo,Bar,Baz,Description
1000001,Name here:1,1001,1,description here
1000002,Name here:2,1002,2,description here
1000003,Name here:3,1003,3,description here
1000004,Name here:4,1004,4,description here
1000005,Name here:5,1005,5,description here
1000006,Name here:6,1006,6,description here
1000007,Name here:7,1007,7,description here
1000008,Name here:8,1008,8,description here
1000009,Name here:9,1009,9,description here

and update.csv as such:

ID,Name,Foo,Bar,Baz,Description
1000003,UPDATED Name here:3,1003,3, UPDATED description here
1000010,NEW ITEM Name here:9,1009,9,NEW ITEM description here 

it outputs to out.csv:

ID,Name,Foo,Bar,Baz,Description
1000010,NEW ITEM Name here:9,1009,9,NEW ITEM description here ,
1000008,Name here:8,1008,8,description here,
1000009,Name here:9,1009,9,description here,
1000006,Name here:6,1006,6,description here,
1000007,Name here:7,1007,7,description here,
1000004,Name here:4,1004,4,description here,
1000005,Name here:5,1005,5,description here,
1000002,Name here:2,1002,2,description here,
1000003,UPDATED Name here:3,1003,3, UPDATED description here,
1000001,Name here:1,1001,1,description here,

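Note that the order is not preserved in the output above (it is not clear from the question whether that is necessary). On Python 3.7 and later, dicts keep insertion order, so the rows would come out in master order with new IDs last. Either way, it is fast and clean.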
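If the files have no header row, the same dictionary-merge idea can be sketched with csv.reader instead of csv.DictReader. This is only a sketch under assumptions, not the answer's code: the function name merge_rows is hypothetical, the ID is assumed to be the first column, and the demonstration uses in-memory data rather than the real files.

```python
import csv
import io


def merge_rows(master_rows, update_rows):
    """Merge two iterables of CSV rows keyed by their first column (assumed to be the ID)."""
    merged = {row[0]: row for row in master_rows if row}
    for row in update_rows:
        if row:  # skip blank lines, which csv.reader yields as empty lists
            merged[row[0]] = row  # replaces an existing ID or adds a new one
    return list(merged.values())


# Small in-memory demonstration; real code would pass open file objects instead.
master = io.StringIO("1000001,Name 1,1001\n1000002,Name 2,1002\n")
update = io.StringIO("1000002,UPDATED Name 2,1002\n1000003,NEW Name 3,1003\n")
rows = merge_rows(csv.reader(master), csv.reader(update))
for row in rows:
    print(",".join(row))
```

Because the rows are stored in a dict, an updated ID keeps the position it had in the master data on Python 3.7 and later, and unseen IDs end up at the end.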

Maybe something like this will work:

# First, collect every ID already present in the master file
master_set = set()
with open('masterfile.txt') as mf:
    for ele in mf:
        master_set.add(ele.split(',')[0])

# If an ID is not yet in the master file (or set), append its line to the master file.
# Note: lines whose ID already exists are skipped here, not updated.
with open('tempfile.txt') as temp, open('masterfile.txt', 'a') as mf:
    for line in temp:
        index = line.split(',')[0]
        if index not in master_set:
            master_set.add(index)
            mf.write(line)

I have not tested it.
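The snippet above only appends unseen IDs. To also replace lines whose ID already exists, one option is to rebuild the master file from a dict keyed by ID. The following is a minimal sketch, assuming plain comma-separated lines with the ID in the first field; the function name update_master is hypothetical, and writing through a temporary file is a design choice of this sketch, not something the question asks for.

```python
import os
import tempfile


def update_master(master_path, new_path):
    """Replace lines whose ID already exists in the master file; append the rest."""
    lines = {}  # ID -> full line; later files overwrite earlier entries with the same ID
    for path in (master_path, new_path):
        with open(path) as f:
            for line in f:
                line = line.rstrip("\n")
                if line:  # ignore empty lines
                    lines[line.split(",")[0]] = line
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run does not leave a half-written master file.
    directory = os.path.dirname(os.path.abspath(master_path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        for line in lines.values():
            tmp.write(line + "\n")
    os.replace(tmp.name, master_path)
```

Calling update_master('masterfile.txt', 'tempfile.txt') once per incoming file would then keep the master file current. Since the whole master file is read and rewritten each time, this fits files that comfortably fit in memory.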
