Update records in file2 using data from file1

Question

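I have a large fixed-format file, file1. Another CSV file, file2, contains IDs and values that must be used to update a specific part of the records with the same ID in file1. Here is my attempt. I would really appreciate any help in getting this to work.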

file2 (comma-separated)

clr,code,type
Red,1001,1
Red,2001,2
Red,3001,3
blu,1002,1
blu,2002,2
blu,3002,3

file1 (fixed-width format)

clrtyp1typ2typ3notes
red110121013101helloworld
blu110221023102helloworld2

file1 needs to be updated to the following

clrtyp1typ2typ3notes
red100120013001helloworld
blu100220023002helloworld2

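Please note that both files are quite large (several GB each). I am a Python noob, so please excuse any gross mistakes. I would really appreciate any help you can provide.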

import shutil
#read both input files
file1=open("file1.txt",'r').read()
file2='file2.txt'

#make a copy of the input file to make edits to it. 
file2Edit=file2+'.EDIT'
shutil.copy(file2, baseEdit)
baseEditFile = open(baseEdit,'w').read()

#go thru eachline, pick clr from file1 and look for it in file2, if found, form a string to be replaced and replace the original line. 
with open('file2.txt','w') as f:
    for line in f:
        base_clr = line[:3]
        findindex = file1.find(base_recid)
        if findindex != -1:
            for line2 in file1:
                #print(line)
                clr = line2.split(",")[0]
                code = line2.split(",")[1]
                type = line2.split(",")[2]
                if keytype = 1:
                    finalline=line[:15]+string.rjust(keyid, 15)+line[30:]
                    baseEditFile.write( replace(line,finalline)
                    baseEditFile.replace(line,finalline)

Answer 1

If I understood you correctly, you need something like this:

# declare file names and necessary lists
file1 = "file1.txt"
file2 = "file2.txt"
file1_new = "file1.txt.EDIT"
clrs = {}

# read clrs to update
with open(file1, "r") as f:
    # skip header line (next(f) works in Python 2.6+ and Python 3)
    next(f)
    for line in f:
        clrs[line[:3]] = []

# read the new codes
with open(file2, "r") as f:
    # skip header
    next(f)
    for line in f:
        current = line.strip().split(",")
        key = current[0].lower()
        if key in clrs:
            clrs[key].append(current[1])

# write the new lines (old codes replaced with the new ones) to new file
with open(file1, "r") as f_in:
    with open(file1_new, "w") as f_out:
        # writes header
        f_out.write(next(f_in))
        for line in f_in:
            line_new = list(line)
            key = line[:3]
            # only replace when new codes were actually found for that key;
            # an empty list would otherwise wipe out columns 3-14
            if clrs.get(key):
                # replaces the old codes with the new codes
                line_new[3:15] = "".join(clrs[key])
            f_out.write("".join(line_new))

This only works for the example given. If the files have a different format in actual use, the indices used must be adjusted.

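This small script first opens your file1, iterates over it, and adds each clr as a key to a dictionary. The value for that key is an empty list. It then opens file2 and iterates over every clr there. If the clr is in the dictionary, it appends the code to the list. So after this part has run, the dictionary contains key-value pairs in which the key is the clr and the value is a list of codes (in the order given by the file).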
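To make that intermediate state concrete, here is a minimal sketch of the two reading passes run on the sample data from the question. The data is held in in-memory lists rather than files, and the names `collect_codes`, `file1_lines` and `file2_lines` are made up for this illustration.

```python
def collect_codes(file1_lines, file2_lines):
    """Return {clr: [codes...]} for every clr that appears in file1."""
    clrs = {}
    # first pass: every clr in file1 becomes a key with an empty list
    for line in file1_lines[1:]:              # [1:] skips the header
        clrs[line[:3]] = []
    # second pass: append each code from file2 to the matching key
    for line in file2_lines[1:]:
        clr, code, _type = line.strip().split(",")
        key = clr.lower()
        if key in clrs:
            clrs[key].append(code)
    return clrs


file1_lines = [
    "clrtyp1typ2typ3notes",
    "red110121013101helloworld",
    "blu110221023102helloworld2",
]
file2_lines = [
    "clr,code,type",
    "Red,1001,1", "Red,2001,2", "Red,3001,3",
    "blu,1002,1", "blu,2002,2", "blu,3002,3",
]

codes = collect_codes(file1_lines, file2_lines)
print(codes["red"])   # ['1001', '2001', '3001']
print(codes["blu"])   # ['1002', '2002', '3002']
```

Joining each list gives exactly the 12 characters that replace columns 3 to 14 of the matching line.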

In the last part of the script, every line of file1.txt is written to file1.txt.EDIT. Before writing, the old codes are replaced by the new ones.

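The codes saved in file2.txt must be in the same order as the codes saved in file1.txt. If the order can differ, or if file2.txt may contain more codes than need to be replaced in file1.txt, you need to add a lookup that checks for the correct code. That is not a big deal, but this script will solve the problem for the files you gave us as an example.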
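One possible shape for such a lookup, as a sketch only: key the codes by the pair (clr, type) instead of relying on row order, and derive each field's position from the type number. It assumes the layout of the sample (a 3-character clr followed by three 4-character fields for types 1 to 3) and codes of at most 4 characters. The names `load_codes`, `update_line`, `FIELD_WIDTH` and `FIRST_FIELD` are hypothetical.

```python
import csv
import io

FIELD_WIDTH = 4   # assumed width of each typN field
FIRST_FIELD = 3   # assumed offset of typ1, right after the 3-character clr


def load_codes(csv_file):
    """Map (clr, type) -> code, so the row order in file2 does not matter."""
    codes = {}
    for row in csv.DictReader(csv_file):
        codes[(row["clr"].lower(), int(row["type"]))] = row["code"]
    return codes


def update_line(line, codes):
    """Replace only those typN fields for which file2 supplied a code."""
    chars = list(line)
    clr = line[:3].lower()
    for typ in (1, 2, 3):
        code = codes.get((clr, typ))
        if code is not None:
            start = FIRST_FIELD + (typ - 1) * FIELD_WIDTH
            # pad short codes so the fixed-width layout is preserved
            chars[start:start + FIELD_WIDTH] = code.rjust(FIELD_WIDTH)
    return "".join(chars)


# rows deliberately shuffled, and no type 3 code for "red"
file2 = io.StringIO("clr,code,type\nblu,3002,3\nRed,2001,2\nRed,1001,1\n")
codes = load_codes(file2)
print(update_line("red110121013101helloworld\n", codes), end="")
# red100120013101helloworld
```

Fields without a matching entry are left untouched, so extra or missing rows in file2 no longer shift the rest of the line. The trade-off is that every code from file2 is held in the dictionary at once.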

If you have any questions or need more help, feel free to ask.

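EDIT: Besides some syntax errors and wrong method calls in the code in your question, you should not read all the data saved in a file at once, especially if you know the files will become very large. That consumes a lot of memory and can make the program run very slowly. This is why iterating line by line is much better. The example I provided reads only one line of the file at a time and writes it directly to the new file, instead of holding both the old and the new file in memory and writing everything out as the last step.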
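The streaming pattern on its own can be sketched as follows. The helper name `copy_with_edits` and the temporary files are made up for the demonstration, and `edit` stands in for whatever per-line replacement is needed.

```python
import os
import tempfile


def copy_with_edits(src_path, dst_path, edit):
    """Stream src to dst one line at a time, applying edit() to each line."""
    with open(src_path, "r") as f_in, open(dst_path, "w") as f_out:
        for line in f_in:            # the file object yields one line at a time
            f_out.write(edit(line))  # nothing but the current line is kept


# small demonstration on a temporary file
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "file1.txt")
dst = src + ".EDIT"
with open(src, "w") as f:
    f.write("clrtyp1typ2typ3notes\nred110121013101helloworld\n")

copy_with_edits(src, dst, lambda line: line.replace("1101", "1001"))

with open(dst) as f:
    result = f.read()
print(result, end="")
```

By contrast, `open(path).read()` returns the entire file as a single string, which for a file of several GB means holding all of it in memory before any work starts.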

Solution 1
2 Accepted 2016-06-06 12:06:24
