Set of lines to replace in file - python

I'm new to Python. I'm trying to use a file with new data (newprops) to replace the old data in a second file. Both files are over 3MB.

The file with the new data looks like this:

PROD    850 30003   0.096043  
PROD    851 30003   0.096043  
PROD    853 30003   0.096043  
PROD    852 30003   0.096043  
....

The original file with the old data looks something like:

CROD    850     123456 123457 123458 123459  
PROD    850     30003   0.08  
CROD    851     123456 123457 123458 123459  
PROD    851     30003   0.07  
CROD    852     123456 123457 123458 123459  
PROD    852     30003   0.095  
CROD    853     123456 123457 123458 123459  
PROD    853     30003   0.095  
....

The output should be:

CROD    850     123456 123457 123458 123459  
PROD    850     30003   0.096043  
CROD    851     123456 123457 123458 123459  
PROD    851     30003   0.096043  
CROD    852     123456 123457 123458 123459  
PROD    852     30003   0.096043  
CROD    853     123456 123457 123458 123459  
PROD    853     30003   0.096043  

This is what I have so far:

import fileinput

def prop_update(newprops,bdffile):

    fnewprops=open(newprops,'r')
    fbdf=open(bdffile,'r+')
    newpropsline=fnewprops.readline()
    fbdfline=fbdf.readline()


    while len(newpropsline)>0:
        fbdf.seek(0)
        propname=newpropsline.split()[1]
        propID=newpropsline.split()[2]
            while len(fbdfline)>0:
                if propID and propname in fbdfline:
                    bdffile.write(newpropsline) #i'm stuck here... I want to delete the old line and use updated value                   
                else:                    
                    fbdfline=fbdfline.readline()

        newpropsline=fnewprops.readline()

    fnewprops.close()

Please help!

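You can take every second line from the original file (the CROD lines) and zip them with the new lines, then reopen the original file and write the updated lines. This assumes the new file has half as many lines as the original and lists them in the same order: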

with open("new.txt") as f, open("orig.txt") as f2:
    lines = f2.readlines()
    # pair each CROD line (every second line of the original) with a new PROD line
    zipped = zip(lines[::2], f)  # on Python 2, use itertools.izip instead
    with open("orig.txt", "w") as out:
        for pair in zipped:
            out.writelines(pair)

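If you want the lines sorted on the second column, you also need to strip the newlines and re-insert them manually so that the last lines stay separated: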

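from itertools import islice

with open("new.txt") as f, open("orig.txt") as f2:
    # every second line of the original (the CROD lines), sorted numerically on column two
    orig = sorted((x.strip() for x in islice(f2, 0, None, 2)), key=lambda x: int(x.split(None, 2)[1]))
    # the new PROD lines, sorted the same way so the pairs line up
    new = sorted((x.strip() for x in f), key=lambda x: int(x.split(None, 2)[1]))
    zipped = zip(orig, new)  # on Python 2, use itertools.izip instead
    with open("orig.txt", "w") as out:
        for pair in zipped:
            out.write("{}\n{}\n".format(*pair))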

Output:

CROD 850 123456 123457 123458 123459
PROD 850 30003 0.096043
CROD 851 123456 123457 123458 123459
PROD 851 30003 0.096043
CROD 852 123456 123457 123458 123459
PROD 852 30003 0.096043
CROD 853 123456 123457 123458 123459
PROD 853 30003 0.096043

If the lengths are not the same, you can use itertools.izip_longest (zip_longest on Python 3) with a fillvalue of "" so that no data is lost.
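A minimal sketch of that idea, assuming Python 3 (where the function is named zip_longest) and working on lists of lines rather than open files. The helper name merge_lines and the sample data are hypothetical, chosen only for illustration:

```python
from itertools import zip_longest


def merge_lines(orig_lines, new_lines):
    """Interleave every second original line with the new lines, keeping leftovers."""
    # Strip newlines so a missing trailing newline cannot glue two records together.
    kept = [line.rstrip("\n") for line in orig_lines[::2]]
    new = [line.rstrip("\n") for line in new_lines]
    merged = []
    for old, upd in zip_longest(kept, new, fillvalue=""):
        # Skip the "" placeholders that zip_longest supplies for the shorter input.
        merged.extend(part for part in (old, upd) if part)
    return merged


# Hypothetical sample: two CROD/PROD pairs in the original, only one update.
orig = [
    "CROD 850 1 2\n",
    "PROD 850 30003 0.08\n",
    "CROD 851 1 2\n",
    "PROD 851 30003 0.07\n",
]
new = ["PROD 850 30003 0.096043\n"]

for line in merge_lines(orig, new):
    print(line)
```

With plain zip the second CROD line would be dropped silently, because zip stops at the shorter input. Note that the unmatched CROD line is kept here without a PROD line after it, so this only avoids losing lines; it does not restore the old PROD value.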

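If the old file is already in order, just skip the sort on f2 and use f2.readlines()[::2]. If it is not in order, sorting both sides ensures that all lines are ordered by the second column, regardless of the original order.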

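You can use a dictionary to index the new data. Then write the original file line by line to a new file, updating from the index as you go. It looks like the first three items should be the key ("PROD 850 30003"), and they can be pulled out with a regular expression such as (PROD\s+\d+\s+\d+).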

import re
_split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")

# create an index for the PROD items to be updated

# this might be a bit more understandable...
#with open('updates.txt') as updates:
#    new_data = {}
#    for line in updates:
#        match = _split_new.match(line)
#        if match:
#            key, value = match.groups()
#            new_data[key] = value

# ... but this is fancier (and likely faster)
with open('updates.txt') as updates:
    new_data = dict(match.groups() 
        for match in (_split_new.search(line) for line in updates)
        if match)

# then process the updates
with open('origstuff.txt') as orig, open('newstuff.txt', 'w') as newstuff:
    # for each line in the original...
    for line in orig:
        match = _split_new.match(line)
        # ... see if it's a PROD line
        if match:
            key, value = match.groups()
            # ... and rewrite with value from indexing dict (defaulting to current value)
            newstuff.write("%s%s\n" % (key, new_data.get(key, value)))
        else:
            # ... or just the original line
            newstuff.write(line)
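One caveat with the sample data in the question: the regex captures whitespace verbatim, and the two files space their columns differently ("PROD    850 30003" in the new file versus "PROD    850     30003" in the original), so the raw keys would not compare equal. A possible variant, sketched here under the assumption that collapsing runs of whitespace is acceptable for the lookup key only, is shown below. The helper names normalize, build_index and apply_updates are hypothetical and operate on lists of lines to keep the example self-contained:

```python
import re

# Same pattern as above: group 1 is the "PROD <id> <pid>" key, group 2 the rest of the line.
_split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")


def normalize(key):
    """Collapse runs of whitespace so differently spaced keys compare equal."""
    return " ".join(key.split())


def build_index(update_lines):
    """Map each normalized PROD key to the remainder of its update line."""
    index = {}
    for line in update_lines:
        match = _split_new.match(line)
        if match:
            key, value = match.groups()
            index[normalize(key)] = value
    return index


def apply_updates(orig_lines, index):
    """Return the original lines with PROD values replaced where an update exists."""
    out = []
    for line in orig_lines:
        match = _split_new.match(line)
        if match:
            key, value = match.groups()
            # Keep the original key spacing; fall back to the current value if no update.
            out.append(key + index.get(normalize(key), value) + "\n")
        else:
            out.append(line)
    return out


updates = ["PROD    850 30003   0.096043  \n"]
original = [
    "CROD    850     123456 123457 123458 123459  \n",
    "PROD    850     30003   0.08  \n",
    "PROD    999     30003   0.5\n",
]
print("".join(apply_updates(original, build_index(updates))), end="")
```

The original column spacing of the key is preserved in the output; only the dictionary lookup uses the normalized form.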
