Remove all lines between the pattern
I want to extract from this text:
CHEXA* 99001088 99001001 99001143 99001179*00072A1
*00072A1 99001047 99001104 99001144 99001180*00072A2
*00072A2 99001048 99001105
RBE3* 99001089 99001001 123*00072A5
*00072A50.11263443595303 123 6001515.041507658257159*00072A6
*00072A6 60016620.61808377914687 123 6001542
CHEXA* 99001086 99001001 99001128 99001095*0007299
*0007299 99001081 99001171 *000729B
*000729B
this part:
RBE3* 99001089 99001001 123*00072A5
*00072A50.11263443595303 123 6001515.041507658257159*00072A6
*00072A6 60016620.61808377914687 123 6001542
put it in a separate file, and then delete it from the original file, which should afterwards look like this:
CHEXA* 99001088 99001001 99001143 99001179*00072A1
*00072A1 99001047 99001104 99001144 99001180*00072A2
*00072A2 99001048 99001105
CHEXA* 99001086 99001001 99001128 99001095*0007299
*0007299 99001081 99001171 *000729B
*000729B
What I tried is:
sed '/RBE3\*/,/\*/d'
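Unfortunately, it stops at the first occurrence of *. The goal is to delete the RBE3 line and every following line that starts with *, but this command deletes only one of those lines. Thanks.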
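import os

# pathToInput, pathToOutput and pathToSave are file paths that you supply.
keep = True
with open(pathToInput) as infile, \
     open(pathToOutput, 'w') as outfile, \
     open(pathToSave, 'w') as savefile:  # must be opened for writing
    for line in infile:
        if line.startswith("RBE3"):
            keep = False   # an RBE3 line starts a block to move out
        elif not line.startswith("*"):
            keep = True    # any other non-continuation line ends the block
        if keep:
            outfile.write(line)
        else:
            savefile.write(line)

# Replace the original file with the filtered copy.
os.remove(pathToInput)
os.rename(pathToOutput, pathToInput)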
RBE3\*[^\n]*\n(?:\*[^\n]*\n)*
Try this. Replace the match with an empty string. See the demo.
https://regex101.com/r/vN3sH3/3
With Python's re module:
import re
print(re.sub(r"RBE3\*[^\n]*\n(?:\*[^\n]*\n)*", "", text))
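As a minimal sketch of how this pattern behaves on the sample from the question, the snippet below removes the block and also collects it so that it can be written to another file. The names text, BLOCK and split_blocks are illustrative only, and the ^ anchor with re.M is an assumption added here so that only lines beginning with RBE3* start a match.

```python
import re

# Sample lines copied from the question.
text = (
    "CHEXA* 99001088 99001001 99001143 99001179*00072A1\n"
    "*00072A1 99001047 99001104 99001144 99001180*00072A2\n"
    "*00072A2 99001048 99001105\n"
    "RBE3* 99001089 99001001 123*00072A5\n"
    "*00072A50.11263443595303 123 6001515.041507658257159*00072A6\n"
    "*00072A6 60016620.61808377914687 123 6001542\n"
    "CHEXA* 99001086 99001001 99001128 99001095*0007299\n"
    "*0007299 99001081 99001171 *000729B\n"
    "*000729B\n"
)

# An RBE3* line plus every immediately following line that starts with '*'.
# The '^' anchor (with re.M) is an addition made for this sketch.
BLOCK = re.compile(r"^RBE3\*[^\n]*\n(?:\*[^\n]*\n)*", re.M)


def split_blocks(s):
    """Return (text without RBE3 blocks, the removed blocks joined together)."""
    removed = "".join(BLOCK.findall(s))  # no capture groups, so whole matches
    kept = BLOCK.sub("", s)
    return kept, removed


kept, removed = split_blocks(text)
print(kept, end="")      # the CHEXA* entries only
print("--- removed ---")
print(removed, end="")   # the RBE3* entry and its continuation lines
```

Note that the pattern expects every line, including the last one, to end with a newline.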
import re
with open('/path/to/the/infile') as infile, open('/path/to/the/outfile', 'w+') as out:
    foo = infile.read()
    out.write(re.sub(r'(?s)RBE3\*.*?\n(?!\*)', r'', foo))
Update:
import re
with open('/path/to/the/infile') as infile, \
     open('/path/to/the/outfile', 'w+') as out, \
     open('/path/to/the/file/to/save/deleted/lines', 'w+') as save:
    foo = infile.read()
    # Group 2 is the RBE3* block; groups 1 and 3 are the text before and after it.
    # Note: this pattern handles only the first RBE3* block in the file.
    out.write(re.sub(r'(?s)(.*?\n)(RBE3\*.*?\n(?!\*))(.*)', r'\1\3', foo))
    save.write(re.sub(r'(?s)(.*?\n)(RBE3\*.*?\n(?!\*))(.*)', r'\2', foo))
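Here is a regular expression that works with Python or PCRE:
/(RBE3\*).+(?=CHEXA\*)/s
The ( ) group captures the RBE3* header, (?= ) is a lookahead that ends the match just before CHEXA*, and the trailing s lets . match newlines.
(Note that the s modifier is required for this to work.)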
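To illustrate why the s modifier matters, here is a small sketch on a made-up four-line string (not the real data). In Python the s modifier is spelled re.S; without it the dot cannot cross a newline, so the lookahead for CHEXA* is never reached.

```python
import re

# Hypothetical miniature input: one RBE3* entry followed by one CHEXA* entry.
text = "RBE3* 1\n*a\nCHEXA* 2\n*b\n"
pattern = r"(RBE3\*).+(?=CHEXA\*)"

no_flag = re.search(pattern, text)          # '.' stops at the first newline
with_flag = re.search(pattern, text, re.S)  # '.' may span several lines

print(no_flag)             # no match at all
print(with_flag.group(0))  # the RBE3* line and its continuation line
```

Because .+ is greedy, the match runs up to the last CHEXA* in the input, so this pattern suits files where all RBE3* entries sit together, as in the sample.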
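A simple Python implementation:
import re
import os

inPut = "list"
outPut = "tmp"
savePath = "removed"  # hypothetical name for the file that receives the extracted block
regexp = re.compile(r"(RBE3\*).+(?=CHEXA\*)", re.S)

with open(inPut, 'r') as f:
    fileStr = f.read()

match = regexp.search(fileStr).group(0)  # the block to extract
ret = regexp.sub("", fileStr)            # the text with the block removed

with open(savePath, 'w') as saveFile:
    saveFile.write(match)
with open(outPut, 'w') as tmpFile:
    tmpFile.write(ret)

# Replace the original file with the version that no longer contains the block.
os.remove(inPut)
os.rename(outPut, inPut)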
Using awk:
awk -v flag=0 '
/^[^\*]/ { flag = 0 } # clear flag if the line does not start with a *
/^RBE3\*/ { flag = 1 } # except if it is the starting line of an ignored block
flag == 0 { print } # print if ignore flag is not set.
' foo.txt
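The advantage of this is that it is easy to extend for the inverse operation. If you write
awk -v flag=0 -v ignore=1 '
/^[^\*]/ { flag = 0 }
/^RBE3\*/ { flag = 1 }
flag != ignore { print }
' foo.txt
then by replacing ignore=1 with ignore=0, you can extract the block instead of ignoring it.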
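For comparison, the same flag idea can be sketched in Python. The function name filter_blocks and its extract parameter are hypothetical, and the sample list is a shortened version of the data from the question. Unlike the awk script, a blank line also ends a block in this sketch.

```python
def filter_blocks(lines, key="RBE3*", extract=False):
    """Yield the lines outside key-blocks, or only the lines inside them
    when extract=True."""
    in_block = False
    for line in lines:
        if line.startswith(key):
            in_block = True    # the header line opens a block
        elif not line.startswith("*"):
            in_block = False   # any other non-continuation line closes it
        if in_block == extract:
            yield line


# Shortened sample based on the question's data.
sample = [
    "CHEXA* 99001088 99001001 99001143 99001179*00072A1",
    "*00072A1 99001047 99001104 99001144 99001180*00072A2",
    "RBE3* 99001089 99001001 123*00072A5",
    "*00072A6 60016620.61808377914687 123 6001542",
    "CHEXA* 99001086 99001001 99001128 99001095*0007299",
    "*000729B",
]

kept = list(filter_blocks(sample))                 # everything except the block
block = list(filter_blocks(sample, extract=True))  # only the block
print("\n".join(kept))
print("\n".join(block))
```

A single boolean switches between the two modes, which mirrors how the awk version compares flag against ignore.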
Using awk:
awk '{if(match($0,"RBE3")>0)flag=0}{if(match($0,"CHEXA")>0)flag=1}{if(flag==1) print $0}' File
Output:
CHEXA* 99001088 99001001 99001143 99001179*00072A1
*00072A1 99001047 99001104 99001144 99001180*00072A2
*00072A2 99001048 99001105
CHEXA* 99001086 99001001 99001128 99001095*0007299
*0007299 99001081 99001171 *000729B
*000729B
awk -v key="RBE3" '
    index($0,key"*")==1 { f=1; print > "newfile"; next }
    f && /^\*/ { print > "newfile"; next }
    { f=0; print }
' file > tmp && mv tmp file
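The above uses index(), so it does a string comparison rather than a regular-expression comparison. Therefore, unlike the other sed solutions, it will not fail if your key contains RE metacharacters.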
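The difference between string and regular-expression matching can be sketched in Python as well. The list below is made up for illustration (only its first entry comes from the question's data, shortened); str.startswith plays the role of awk's index() test here.

```python
import re

key = "RBE3*"  # contains the regex metacharacter '*'
lines = ["RBE3* 99001089 99001001", "RBE2* 1 2 3", "CHEXA* 99001086"]

# Used as a regex, '3*' means "zero or more 3s", so "RBE2*" matches too.
as_regex = [l for l in lines if re.match(key, l)]

# A literal prefix comparison treats '*' as an ordinary character.
as_string = [l for l in lines if l.startswith(key)]

# Escaping the key makes the regex behave like the literal comparison.
escaped = [l for l in lines if re.match(re.escape(key), l)]

print(as_regex)
print(as_string)
print(escaped)
```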