
How to grep lines between two repeating patterns in a big file with Python

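I am trying to extract data from a txt file between the constant patterns "beginPattern" and "endPattern". Within each block, only the values of the keys on line1 and line2 serve as the index; the other values I want to extract (key=value;) can appear on any line: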

[BEGIN_PATTERN]
    line1=abd;
    line2=ZXY;
    ...
    line43=454; 
    ...
    ...
[END_PATTERN]
[BEGIN_PATTERN]
    line1=abc;
    line2=ZXC;
    ...
    line72=847;
    ...
[END_PATTERN]
[BEGIN_PATTERN]
    line1=abe;
    line2=ZXV;
    ...
    line33=135;
    ...
[END_PATTERN]
[BEGIN_PATTERN]
    line1=abt;
    line2=ZXF;
    ...
    line54=734;
    ...
[END_PATTERN]

The expected result is:

abd,ZXY,aaa,454,ggg,ggs
abc,ZXC,mgf,847,jde,g3e
abe,ZXV,ytd,135,dfs,jhf
abt,ZXF,ytf,734,ytd,hge
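Each line inside a block is a key=value; pair, usually indented. A minimal sketch of parsing one such line into its key and value, assuming at most one pair per line (`parse_pair` is a hypothetical helper, not from the question):

```python
def parse_pair(line):
    """Split one 'key=value;' line into (key, value), or return None if it has no '='."""
    text = line.strip()                  # drop indentation, trailing spaces and the line break
    if '=' not in text:
        return None                      # e.g. blank lines or the '...' placeholders
    key, value = text.split('=', 1)      # split on the first '=' only
    return key.strip(), value.strip().rstrip(';')  # drop the trailing ';'


print(parse_pair('    line43=454; \n'))  # ('line43', '454')
print(parse_pair('    ...\n'))           # None
```

Checking for `=` first and splitting only on the first one avoids the ValueError that `(key, val) = valor.split('=')` raises on a line with no `=` or with more than one.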

I tried a Python script with re.match, but it only reads the values abd,ZXY from the first beginPattern/endPattern block and writes them to the output file:

import re

START_PATTERN = '<BEGIN'
END_PATTERN = '<BEND'

with open('DB_example.txt') as file:
    match = False
    newfile = None

    for line in file:
        if re.match(START_PATTERN, line):
            match = True
            newfile = open('my_new_file.txt', 'w')
            continue
        elif re.match(END_PATTERN, line):
            match = False
            newfile.close()
            continue
        elif match:
            #remove TAB and BreakLine
            valor=line.rstrip().replace('\t','')
            #split Key and value
            (key, val) = valor.split('=')
            if re.match('line1',key):
                match = True
                #before write into file remove ";"
                newfile.write(val.replace(';',''))
                continue
            elif re.match('line2',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue
            elif re.match('lineXX',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue
            elif re.match('lineYY',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue

It does not continue with the second, third, and later blocks. My file has at least 300,000 matches. I appreciate any help.

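You open the output file again at every start pattern and close it at every end pattern. Because mode 'w' truncates the file each time it is opened, the newfile.write calls for each block overwrite whatever the previous block wrote.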
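The truncation is easy to reproduce. This small sketch (the helper `write_blocks` and the temporary file are illustrative, not from the question) writes the same three records both ways: reopening with mode 'w' per block leaves only the last record, while opening once keeps all of them.

```python
import os
import tempfile


def write_blocks(path, blocks, reopen_each_time):
    """Write each block's text to `path`, either reopening per block or opening once."""
    if reopen_each_time:
        for text in blocks:
            # mode 'w' truncates the file every time it is opened
            with open(path, 'w') as f:
                f.write(text + '\n')
    else:
        # open once, write every block, close at the end
        with open(path, 'w') as f:
            for text in blocks:
                f.write(text + '\n')
    with open(path) as f:
        return f.read()


blocks = ['abd,ZXY', 'abc,ZXC', 'abe,ZXV']
path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # scratch dir, not cleaned up here
print(repr(write_blocks(path, blocks, reopen_each_time=True)))   # 'abe,ZXV\n'
print(repr(write_blocks(path, blocks, reopen_each_time=False)))  # 'abd,ZXY\nabc,ZXC\nabe,ZXV\n'
```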

If you want each new val to be added to the file, open it only once before writing anything, and close it after all values have been written.

import re

START_PATTERN = '<BEGIN'
END_PATTERN = '<BEND'

# Open the output file once, before any block is processed;
# the with statement closes both files when the loop ends
with open('DB_example.txt') as file, open('my_new_file.txt', 'w') as newfile:
    match = False
    for line in file:
        if re.match(START_PATTERN, line):
            match = True
            continue
        elif re.match(END_PATTERN, line):
            match = False
            # end the current record so each block gets its own output line
            newfile.write('\n')
            continue
        elif match:
            # remove indentation (tabs/spaces) and the line break
            valor = line.strip()
            # skip lines that are not key=value pairs
            if '=' not in valor:
                continue
            # split key and value on the first '=' only
            (key, val) = valor.split('=', 1)
            # before writing into the file, remove ";"
            val = val.replace(';', '')
            # compare keys exactly: re.match('line1', key) would also match 'line10'
            if key == 'line1':
                newfile.write(val)
            elif key in ('line2', 'lineXX', 'lineYY'):
                newfile.write(',' + val)
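With around 300,000 blocks, it may be cleaner to separate parsing from writing. A minimal sketch, assuming the '<BEGIN' / '<BEND' markers from the code above, one key=value; pair per line, and key names that stay the same across blocks; `iter_records` and the sample data are hypothetical. Each block is collected into a dict, values are emitted in a fixed key order with exact key comparison (re.match('line1', key) would also accept line10, line11, ...), and the csv module writes one line per block.

```python
import csv
import io
import sys


def iter_records(lines, keys, start='<BEGIN', end='<BEND'):
    """Yield one list of values per start/end block, ordered like `keys`.

    Reads line by line, so only the current block is held in memory.
    A key missing from a block yields ''.
    """
    current = None  # dict for the block being read, or None outside a block
    for line in lines:
        text = line.strip()
        if text.startswith(start):
            current = {}
        elif text.startswith(end):
            if current is not None:
                yield [current.get(k, '') for k in keys]
            current = None
        elif current is not None and '=' in text:
            key, value = text.split('=', 1)
            # store under the exact key name, so 'line1' never matches 'line10'
            current[key.strip()] = value.strip().rstrip(';')


# Hypothetical sample: 'lineXX' sits at a different position in each block
SAMPLE = (
    "<BEGIN\n\tline1=abd;\n\tline2=ZXY;\n\tother=1;\n\tlineXX=454;\n<BEND\n"
    "<BEGIN\n\tline1=abc;\n\tlineXX=847;\n\tline2=ZXC;\n<BEND\n"
)

writer = csv.writer(sys.stdout, lineterminator='\n')
writer.writerows(iter_records(io.StringIO(SAMPLE), ['line1', 'line2', 'lineXX']))
# abd,ZXY,454
# abc,ZXC,847

# Real use (file names from the question; 'lineXX' and 'lineYY' are placeholders):
# with open('DB_example.txt') as src, open('my_new_file.txt', 'w', newline='') as dst:
#     csv.writer(dst, lineterminator='\n').writerows(
#         iter_records(src, ['line1', 'line2', 'lineXX', 'lineYY']))
```

Because the generator yields one record at a time, memory use does not grow with the number of blocks, and the output order of fields no longer depends on where each key appears inside a block.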


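Statement: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.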
