
Python: search for a key and replace the next word before "," with a constant value in a very large Windows file

I started learning Python fairly recently and have a requirement I need help with. I have a mainframe background, and this is a simple task with DFSORT, but I have searched the forum and Google and couldn't find any clue on how to do it in Python.

I have a large Windows file, 3 GB to 5 GB or even more. I need to search each line of this file for a key and, if the key is found, replace the word that follows it (up to the next ",") with XXXXXXXXXX. The key is always "name:", and the value to replace always ends at the "," that follows the key. Not every line will contain the key. If the value is NULL, it must be left unchanged.

Sample input file:

this is the name: roger,who won australian open
yes name: rafael nadal,who won french open
name: novak, is injured for this season
propably greatest of all time name: roger, had won wimbledon again.
this is the name: NULL,who will win US open !!!

Output file

this is the name: XXXXXXXXXX,who won australian open
yes name: XXXXXXXXXX,who won french open
name: XXXXXXXXXX, is injured for this season
propably greatest of all time name: XXXXXXXXXX, had won wimbledon again.
this is the name: NULL,who will win US open !!!

You can use a regex to match the string name: <any sequence of characters>, and replace it with name: XXXXXXXXXX, while skipping values that start with NULL:

import re

with open('in', 'rt') as fin:
    with open('out', 'wt') as fout:
        for line in fin:
            # Mask the value after "name:" unless it starts with " NULL".
            fout.write(re.sub(r'name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', line))
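As a hedged sketch of the same line-by-line approach, the substitution can be wrapped in small functions so it is easy to check against the sample lines from the question. The names mask_line and mask_file are hypothetical, not part of any library, and the pattern assumes a value counts as NULL only when it is exactly " NULL" followed by the comma.

```python
import re

# Assumption: the key is always "name:" and the value ends at the next comma.
# The lookahead leaves "name: NULL," untouched.
NAME_RE = re.compile(r'name:(?! NULL,)[^,\n]+,')
MASK = 'name: XXXXXXXXXX,'


def mask_line(line):
    """Return the line with every non-NULL name value masked."""
    return NAME_RE.sub(MASK, line)


def mask_file(src_path, dst_path):
    """Stream src_path to dst_path one line at a time (hypothetical helper)."""
    with open(src_path, 'rt') as fin, open(dst_path, 'wt') as fout:
        for line in fin:
            fout.write(mask_line(line))
```

Because the file is read one line at a time, memory use should stay roughly constant regardless of the file size.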

I can't comment, so here's an answer building on @aoiee's that processes the whole file in one pass (note that f.read() loads the entire file into memory):

import re

with open('filename.txt', 'r') as f:
    lines = f.read()  # reads the entire file into memory
text = re.sub(r'name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', lines)

with open('out.txt', 'w') as out:
    out.write(text)

In addition to aoiee's answer, you can read the whole text and write it back to the same file.

This may take longer if there is a lot of data, since the entire file is held in memory.

import re

with open('path to file.txt or whatever', 'r') as file:
    filedata = file.read()
# The (?! NULL) lookahead keeps NULL values unchanged, as the question requires.
new_data = re.sub(r'name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', filedata)
with open('path to file.txt or whatever', 'w') as file:
    file.write(new_data)
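For a file of several gigabytes, one possible alternative to reading everything at once is to stream into a temporary file and then swap it in for the original. The following is only a sketch under that assumption; rewrite_in_place is a hypothetical helper name, and the temporary file is created next to the original so that os.replace stays on the same volume.

```python
import os
import re
import tempfile

NAME_RE = re.compile(r'name:(?! NULL,)[^,\n]+,')


def rewrite_in_place(path):
    """Mask name values in `path` without loading the whole file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the masked copy to a temporary file in the same directory.
    with tempfile.NamedTemporaryFile('wt', dir=directory, delete=False) as fout:
        tmp_path = fout.name
        with open(path, 'rt') as fin:
            for line in fin:
                fout.write(NAME_RE.sub('name: XXXXXXXXXX,', line))
    # Both files are closed here, so the swap also works on Windows.
    os.replace(tmp_path, path)
```

The original file is only replaced after the masked copy has been written completely, so an interrupted run should leave the original intact (a leftover temporary file may need to be deleted by hand).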

If each record is on its own line, you can do the following without a for loop:

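import re

def repel(mo):
    # Keep the match unchanged when the value is NULL; otherwise mask it.
    if mo.group(2).strip() == 'NULL':
        return mo.group(0)
    return mo.group(1) + 'XXXXXXXXXX' + mo.group(3)

# group 1: the key, group 2: the value up to the comma, group 3: the comma
pattern = re.compile(r'(name: )([^,\n]+)(,)')
result = pattern.sub(repel, _in)  # _in holds the text to process

This calls the repel function on every match; it replaces the value after name: with XXXXXXXXXX unless that value is NULL. (The fourth positional argument of re.sub is count, not flags, so re.DOTALL must not be passed there.)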
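To try the callback idea on the question's sample data, here is a small self-contained sketch. The names mask_exact and _mask are hypothetical. It also illustrates one difference from the lookahead patterns in the earlier answers: (?! NULL) skips any value that merely begins with NULL, while a callback can compare the value exactly.

```python
import re

# group 1: the key, group 2: the value up to the comma, group 3: the comma
NAME_RE = re.compile(r'(name:)([^,\n]+)(,)')


def _mask(mo):
    # Leave the match untouched only when the value is exactly NULL.
    if mo.group(2).strip() == 'NULL':
        return mo.group(0)
    return mo.group(1) + ' XXXXXXXXXX' + mo.group(3)


def mask_exact(text):
    """Mask every name value in a multi-line string, without an explicit loop."""
    return NAME_RE.sub(_mask, text)


sample = (
    'yes name: rafael nadal,who won french open\n'
    'this is the name: NULL,who will win US open !!!\n'
    'name: NULLA, begins with NULL but is a real value\n'
)
masked = mask_exact(sample)
```

Here only the second line of sample is left as it was; the third line is masked because its value is NULLA rather than NULL.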

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 