繁体   English   中英

Python:搜索键,并在很大的Windows文件中用恒定值替换“,”之前的下一个单词

[英]Python : Search on a Key and replace the next word before “,” with a constant value in a very large windows file

我最近才开始学习Python,并提出了需要您帮助的要求。 我有大型机背景,这是一个非常简单的要求,可以使用DFSORT来完成,但是在python中,我搜索了论坛和Google,但找不到此问题的任何线索。

我有一个很大的Windows文件,它可以是3GB到5GB甚至更大。 我的要求是在每行中用一个关键字搜索该文件,如果找到该关键字,则用XXXXXXXXXX替换(结束)“,”之前的下一个单词,关键字始终为“ name:”,要替换的值始终位于( ,)之后的键。 可能并非所有的行都有键。 如果要替换的值为NULL,则必须从替换中忽略该值

样本输入文件:-

this is the name: roger,who won australian open
yes name: rafael nadal,who won french open
name: novak, is injured for this season
propably greatest of all time name: roger, had won wimbledon again.
this is the name: NULL,who will win US open !!!

输出文件

this is the name: XXXXXXXXXX,who won australian open
yes name: XXXXXXXXXX,who won french open
name: XXXXXXXXXX, is injured for this season
propably greatest of all time name: XXXXXXXXXX, had won wimbledon again.
this is the name: NULL,who will win US open !!!

您可以使用正则表达式捕获name: anysequenceofcharacters,字符串,并将其替换为name: XXXXXXXXXX, name: anysequenceofcharacters,

import re
with open('in', "rt") as fin:
with open('out', "wt") as fout:
    for line in fin:
        fout.write(re.sub('name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', line))

无法发表评论,因此这是在@aoiee的基础上建立的答案,该答案将遍历文件:

with open('filename.txt', 'r') as f:
    lines = file.read()
    text = re.sub('name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', lines)

with open('out.txt', 'w') as out:
    out.write(text)

除了aoiee的答案,您还可以阅读并重写文字,

如果数据很多,可能需要更长的时间

import fileinput
import re
with open('path to file.txt or whatever', 'r') as file :
  filedata = file.read()
new_data = re.sub('name:([^,]+),', 'name: XXXXXXXXXX,', filedata)
with open('path to file.txt or whatever', 'w') as file:
  file.write(new_data)

如果每一行都在换行符上,则可以执行以下操作而无需for循环:

def repel(mo):
    if mo.group(3) == 'NULL':
        return '{}{}{}{}'.format(mo.group(1), mo.group(2), mo.group(3), mo.group(4))
    return '{}{}{}{}'.format(mo.group(1), mo.group(2), 'XXXX,', mo.group(4))

pattern = re.compile('(.*)(name: )(\w+,)?(.*)')
re.sub(pattern, repel, _in, re.DOTALL)

基本上在每次比赛时都调用排斥功能,该功能将name:后的部分替换为XXXX

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM