Python or Shell assistance: How to search for a word and replace only the value next to it
Python: Search on a key and replace the next word before "," with a constant value in a very large Windows file
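I only recently started learning Python and have a requirement I need your help with. I come from a mainframe background, where this is a very simple task that DFSORT can handle. For Python, though, I searched the forums and Google and could not find any leads on this problem.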
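I have a very large Windows file, which can be 3GB to 5GB or even larger. I need to search every line of this file for a keyword and, if the keyword is found, replace the word that follows it, up to the terminating ",", with XXXXXXXXXX. The keyword is always "name:", and the value to replace always sits right after the key and ends at the ",". Not every line contains the key. If the value to replace is NULL, it must be left unchanged.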
Sample input:

```
this is the name: roger,who won australian open
yes name: rafael nadal,who won french open
name: novak, is injured for this season
propably greatest of all time name: roger, had won wimbledon again.
this is the name: NULL,who will win US open !!!
```
Expected output:

```
this is the name: XXXXXXXXXX,who won australian open
yes name: XXXXXXXXXX,who won french open
name: XXXXXXXXXX, is injured for this season
propably greatest of all time name: XXXXXXXXXX, had won wimbledon again.
this is the name: NULL,who will win US open !!!
```
You can use a regular expression to capture the string `name: anysequenceofcharacters,` and replace it with `name: XXXXXXXXXX,`:
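```python
import re

# Compile once; the (?! NULL) lookahead skips values that are NULL
pattern = re.compile(r'name:(?! NULL)([^,]+),')

# Stream line by line so a multi-gigabyte file never has to fit in memory
with open('in', "rt") as fin, open('out', "wt") as fout:
    for line in fin:
        fout.write(pattern.sub('name: XXXXXXXXXX,', line))
```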
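The pattern can be checked against the sample lines from the question before running it on the real file. This is a minimal sketch with the sample lines inlined as a list instead of read from disk:

```python
import re

# Same pattern as in the answer; (?! NULL) skips values that are NULL
NAME_VALUE = re.compile(r'name:(?! NULL)([^,]+),')

sample_in = [
    "this is the name: roger,who won australian open",
    "yes name: rafael nadal,who won french open",
    "name: novak, is injured for this season",
    "propably greatest of all time name: roger, had won wimbledon again.",
    "this is the name: NULL,who will win US open !!!",
]

# Apply the substitution to each sample line independently
masked = [NAME_VALUE.sub('name: XXXXXXXXXX,', line) for line in sample_in]
for line in masked:
    print(line)
```

Note that the lookahead skips any value that merely begins with NULL (for example NULLABLE), not only an exact NULL.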
I can't comment yet, so here is an answer building on @aoiee's that applies the same substitution to the whole file:
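```python
import re

with open('filename.txt', 'r') as f:
    lines = f.read()  # reads the entire file into memory

# [^,\n] keeps each match on a single line
text = re.sub(r'name:(?! NULL)([^,\n]+),', 'name: XXXXXXXXXX,', lines)

with open('out.txt', 'w') as out:
    out.write(text)
```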
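Because this answer applies the regex to the whole file at once rather than line by line, the character class needs to exclude newlines as well as commas; otherwise a line that has `name:` but no comma can swallow text up to a comma on a later line. A small check of the difference, using a made-up two-line string:

```python
import re

text = "name: foo\nbar, baz\n"  # a keyed line with no comma, followed by another line

# [^,]+ can cross the newline and mask text from the following line
loose = re.sub(r'name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', text)
# [^,\n]+ keeps each match on its own line, so this text is left alone
strict = re.sub(r'name:(?! NULL)([^,\n]+),', 'name: XXXXXXXXXX,', text)

print(repr(loose))   # 'name: XXXXXXXXXX, baz\n'
print(repr(strict))  # 'name: foo\nbar, baz\n'
```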
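As an alternative to aoiee's answer, you can also read the whole file and write the text back to the same file, although this may take longer if there is a lot of data: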
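```python
import re

with open('path to file.txt or whatever', 'r') as file:
    filedata = file.read()

# (?! NULL) skips NULL values; [^,\n] keeps each match on one line
new_data = re.sub(r'name:(?! NULL)([^,\n]+),', 'name: XXXXXXXXXX,', filedata)

# Overwrites the original file with the masked text
with open('path to file.txt or whatever', 'w') as file:
    file.write(new_data)
```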
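Reading and rewriting the whole file holds all of it in memory, which may not be practical for a 3GB to 5GB file. A minimal sketch of an in-place rewrite that streams instead, assuming it is acceptable to create a temporary file in the same directory as the original: each line is masked and written to the temporary file, which then replaces the original via `os.replace`. `mask_names_in_place` is a hypothetical helper name, not part of any library.

```python
import os
import re
import tempfile

NAME_VALUE = re.compile(r'name:(?! NULL)([^,\n]+),')

def mask_names_in_place(path):
    """Hypothetical helper: mask name: values line by line, then swap the result in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        # Wrap fd first so it is closed even if opening the input fails
        with os.fdopen(fd, 'wt') as fout, open(path, 'rt') as fin:
            for line in fin:
                fout.write(NAME_VALUE.sub('name: XXXXXXXXXX,', line))
        os.replace(tmp_path, path)  # the masked copy takes the original's place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise
```

Usage would be `mask_names_in_place('path to file.txt or whatever')`. Memory use stays roughly constant regardless of file size, at the cost of needing enough free disk space for a second copy of the file.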
If each record is on its own line, you can do the following without a for loop:
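```python
import re
# Groups: 1 = text before the key, 2 = the key, 3 = value up to the comma, 4 = the comma and the rest
pattern = re.compile(r'(.*)(name: )([^,\n]+)(,.*)')

def repel(mo):
    if mo.group(3) == 'NULL':
        return mo.group(0)  # leave NULL values untouched
    return '{}{}{}{}'.format(mo.group(1), mo.group(2), 'XXXXXXXXXX', mo.group(4))

with open('in', 'rt') as f:  # hypothetical input file; '_in' holds its whole text
    _in = f.read()
result = re.sub(pattern, repel, _in)  # no re.DOTALL: '.' stops at newlines, so each line matches on its own
```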
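Basically, `re.sub` calls the `repel` function on every match; it replaces the value after `name: ` with X's and leaves NULL values as they are.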
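Coming from DFSORT-style field processing, the same rule can also be expressed without regular expressions, using plain string methods. This is a minimal sketch, not taken from any of the answers: `mask_line` is a hypothetical helper, it assumes the key is always written as `name: ` with one trailing space (as in the sample data), and it only handles the first key on each line. Unlike the `(?! NULL)` lookahead, it skips only a value that is exactly NULL.

```python
def mask_line(line, key='name: ', mask='XXXXXXXXXX'):
    """Hypothetical helper: mask the value between `key` and the next comma."""
    start = line.find(key)
    if start == -1:
        return line  # no key on this line
    value_start = start + len(key)
    end = line.find(',', value_start)
    if end == -1:
        return line  # key present but no terminating comma
    if line[value_start:end] == 'NULL':
        return line  # NULL values are left as they are
    return line[:value_start] + mask + line[end:]
```

It drops into the same streaming loop as the first answer, e.g. `fout.write(mask_line(line))` for each `line` in the input file.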