簡體   English   中英

Python:搜索鍵,並在很大的Windows文件中用恆定值替換“,”之前的下一個單詞

[英]Python : Search on a Key and replace the next word before “,” with a constant value in a very large windows file

我最近才開始學習Python,並提出了需要您幫助的要求。 我有大型機背景,這是一個非常簡單的要求,可以使用DFSORT來完成,但是在python中,我搜索了論壇和Google,但找不到此問題的任何線索。

我有一個很大的Windows文件,它可以是3GB到5GB甚至更大。 我的要求是在每行中用一個關鍵字搜索該文件,如果找到該關鍵字,則用XXXXXXXXXX替換(結束)“,”之前的下一個單詞,關鍵字始終為“ name:”,要替換的值始終位於( ,)之后的鍵。 可能並非所有的行都有鍵。 如果要替換的值為NULL,則必須從替換中忽略該值

樣本輸入文件:-

this is the name: roger,who won australian open
yes name: rafael nadal,who won french open
name: novak, is injured for this season
propably greatest of all time name: roger, had won wimbledon again.
this is the name: NULL,who will win US open !!!

輸出文件

this is the name: XXXXXXXXXX,who won australian open
yes name: XXXXXXXXXX,who won french open
name: XXXXXXXXXX, is injured for this season
propably greatest of all time name: XXXXXXXXXX, had won wimbledon again.
this is the name: NULL,who will win US open !!!

您可以使用正則表達式捕獲name: anysequenceofcharacters,字符串,並將其替換為name: XXXXXXXXXX, name: anysequenceofcharacters,

import re
with open('in', "rt") as fin:
with open('out', "wt") as fout:
    for line in fin:
        fout.write(re.sub('name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', line))

無法發表評論,因此這是在@aoiee的基礎上建立的答案,該答案將遍歷文件:

with open('filename.txt', 'r') as f:
    lines = file.read()
    text = re.sub('name:(?! NULL)([^,]+),', 'name: XXXXXXXXXX,', lines)

with open('out.txt', 'w') as out:
    out.write(text)

除了aoiee的答案,您還可以閱讀並重寫文字,

如果數據很多,可能需要更長的時間

import fileinput
import re
with open('path to file.txt or whatever', 'r') as file :
  filedata = file.read()
new_data = re.sub('name:([^,]+),', 'name: XXXXXXXXXX,', filedata)
with open('path to file.txt or whatever', 'w') as file:
  file.write(new_data)

如果每一行都在換行符上,則可以執行以下操作而無需for循環:

def repel(mo):
    if mo.group(3) == 'NULL':
        return '{}{}{}{}'.format(mo.group(1), mo.group(2), mo.group(3), mo.group(4))
    return '{}{}{}{}'.format(mo.group(1), mo.group(2), 'XXXX,', mo.group(4))

pattern = re.compile('(.*)(name: )(\w+,)?(.*)')
re.sub(pattern, repel, _in, re.DOTALL)

基本上在每次比賽時都調用排斥功能,該功能將name:后的部分替換為XXXX

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM