Delete all lines between patterns

Question

I would like to extract from the text:

CHEXA*          99001088        99001001        99001143        99001179*00072A1
*00072A1        99001047        99001104        99001144        99001180*00072A2
*00072A2        99001048        99001105                                
RBE3*           99001089                        99001001             123*00072A5
*00072A50.11263443595303             123         6001515.041507658257159*00072A6
*00072A6         60016620.61808377914687             123         6001542
CHEXA*          99001086        99001001        99001128        99001095*0007299
*0007299        99001081        99001171                                *000729B
*000729B

this portion:

RBE3*           99001089                        99001001             123*00072A5
*00072A50.11263443595303             123         6001515.041507658257159*00072A6
*00072A6         60016620.61808377914687             123         6001542

put it in a file, and delete it from the initial file, which should look like this afterwards:

CHEXA*          99001088        99001001        99001143        99001179*00072A1
*00072A1        99001047        99001104        99001144        99001180*00072A2
*00072A2        99001048        99001105                                
CHEXA*          99001086        99001001        99001128        99001095*0007299
*0007299        99001081        99001171                                *000729B
*000729B

What I tried was:

sed '/RBE3\*/,/\*/d'

but unfortunately it stops at the first occurrence of *. The goal is to delete the RBE3 line and every following line that starts with *, whereas this command deletes only one of those lines. Thank you.
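To make the failure concrete, here is a rough Python sketch that mimics how a sed range `/start/,/end/d` behaves: the end pattern is tested on the lines after the start line, and the range closes on the first hit. The function name `sed_range_delete` and the shortened sample lines are illustrative assumptions, not part of the question.

```python
import re

def sed_range_delete(lines, start_pat, end_pat):
    """Mimic sed '/start/,/end/d' on a list of lines (illustrative only)."""
    kept, in_range = [], False
    for line in lines:
        if in_range:
            # The end pattern is only tested on lines after the start line.
            if re.search(end_pat, line):
                in_range = False
            continue  # every line inside the range is deleted
        if re.search(start_pat, line):
            in_range = True
            continue  # the start line itself is deleted
        kept.append(line)
    return kept

# Shortened stand-in for the data in the question.
sample = [
    "CHEXA*   99001088*00072A1",
    "*00072A1 99001047*00072A2",
    "*00072A2 99001048",
    "RBE3*    99001089*00072A5",
    "*00072A5 0.1126*00072A6",
    "*00072A6 6001662",
    "CHEXA*   99001086*0007299",
]
result = sed_range_delete(sample, r"RBE3\*", r"\*")
```

Because the first continuation line already contains a `*`, the range closes there and the second continuation line (`*00072A6 ...`) survives in `result`.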

Answer 1

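import os

# pathToInput, pathToOutput and pathToSave are assumed to be defined by the caller.
keep = True
# pathToSave must be opened for writing, since the removed lines go there.
with open(pathToInput) as infile, open(pathToOutput, 'w') as outfile, open(pathToSave, 'w') as savefile:
    for line in infile:
        if line.startswith("RBE3"):
            keep = False    # start of the block to remove
        elif not line.startswith("*"):
            keep = True     # any other card ends the block
        # continuation lines (starting with *) inherit the current state
        if keep:
            outfile.write(line)
        else:
            savefile.write(line)

# Replace the original file with the filtered copy.
os.replace(pathToOutput, pathToInput)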
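The same flag logic can be tried without touching any files. The following is a minimal sketch, assuming the lines are already in a list; the function name `split_cards` and its `card` parameter are hypothetical, introduced only for illustration.

```python
def split_cards(lines, card="RBE3"):
    """Return (kept, removed): `removed` holds each `card` line plus
    the continuation lines (starting with '*') that follow it."""
    kept, removed = [], []
    keep = True
    for line in lines:
        if line.startswith(card):
            keep = False          # block to remove starts here
        elif not line.startswith("*"):
            keep = True           # a different card ends the block
        (kept if keep else removed).append(line)
    return kept, removed

# Shortened stand-in for the data in the question.
sample = [
    "CHEXA*   99001088*00072A1",
    "*00072A1 99001047",
    "RBE3*    99001089*00072A5",
    "*00072A5 0.1126*00072A6",
    "*00072A6 6001662",
    "CHEXA*   99001086*0007299",
    "*0007299",
]
kept, removed = split_cards(sample)
```

Here `removed` ends up with the three RBE3 lines and `kept` with the four CHEXA lines, matching the split the question asks for.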

Answer 2

RBE3\*[^\n]*\n(?:\*[^\n]*\n)*

Try this: replace each match with an empty string. See the demo:

https://regex101.com/r/vN3sH3/3

import re
# text is assumed to hold the contents of the file
print(re.sub(r"RBE3\*[^\n]*\n(?:\*[^\n]*\n)*", "", text))
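Since the question also asks to keep the deleted lines, one possible extension is to collect the matches before removing them. This is a sketch under assumptions: the `^` anchor with `re.MULTILINE` is an addition to the pattern above (so that only lines beginning with `RBE3*` match), and the names `BLOCK`, `removed` and `remaining` and the shortened sample text are illustrative.

```python
import re

# Same pattern as above, anchored to the start of a line.
BLOCK = re.compile(r"^RBE3\*[^\n]*\n(?:\*[^\n]*\n)*", re.MULTILINE)

# Shortened stand-in for the data in the question.
text = (
    "CHEXA*   1*A1\n"
    "*A1      2\n"
    "RBE3*    3*A5\n"
    "*A5      4*A6\n"
    "*A6      5\n"
    "CHEXA*   6*99\n"
    "*99\n"
)

removed = "".join(BLOCK.findall(text))   # every RBE3 block, in order
remaining = BLOCK.sub("", text)          # the text without those blocks
```

Note that each continuation line must end with a newline for the pattern to consume it, so a block sitting at the very end of a file without a trailing newline would be only partly matched.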

Answer 3

Through Python's re module.

import re
with open('/path/to/the/infile') as infile, open('/path/to/the/outfile', 'w+') as out:
    foo = infile.read()
    out.write(re.sub(r'(?s)RBE3\*.*?\n(?!\*)', r'', foo))

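Update: to also write the deleted lines to a separate file (as written, this handles only the first RBE3 block):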

import re
with open('/path/to/the/infile') as infile, open('/path/to/the/outfile', 'w+') as out, open('/path/to/the/file/to/save/deleted/lines', 'w+') as save:
    foo = infile.read()
    out.write(re.sub(r'(?s)(.*?\n)(RBE3\*.*?\n(?!\*))(.*)', r'\1\3', foo))
    save.write(re.sub(r'(?s)(.*?\n)(RBE3\*.*?\n(?!\*))(.*)', r'\2', foo))

Answer 4

Here's a regexp that will work in Python or PCRE:

/(RBE3\*).+(?=CHEXA\*)/s (note that the s modifier is required for it to work.)

A simple Python implementation:

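import re
import os

inPut = "list"
outPut = "tmp"
saved = "rbe3_block"  # hypothetical name for the file that receives the removed lines

# Lazy .+? stops at the next CHEXA* card; re.S lets . match newlines.
regexp = re.compile(r"(RBE3\*).+?(?=CHEXA\*)", re.S)

with open(inPut, 'r') as f:
    fileStr = f.read()
# Collect every removed block, then strip the blocks from the text.
match = "".join(m.group(0) for m in regexp.finditer(fileStr))
ret = regexp.sub("", fileStr)
with open(saved, 'w') as saveFile:
    saveFile.write(match)
with open(outPut, 'w') as tmpFile:
    tmpFile.write(ret)
# Overwrite the input file with the cleaned text.
os.replace(outPut, inPut)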

Answer 5

With awk:

awk -v flag=0 '
    /^[^\*]/  { flag = 0 } # clear flag if the line does not start with a *
    /^RBE3\*/ { flag = 1 } # except if it is the starting line of an ignored block
    flag == 0 { print }    # print if ignore flag is not set.
  ' foo.txt

The nice thing about this is that it is easily extended for the inversion. If you write

awk -v flag=0 -v ignore=1 '
    /^[^\*]/ { flag = 0 }
    /^RBE3\*/ { flag = 1 }
    flag != ignore { print }
  ' foo.txt

then by replacing ignore=1 with ignore=0, you can extract the block instead of ignoring it: only lines whose flag differs from ignore are printed.

Answer 6

Using awk:

awk '{if(match($0,"RBE3")>0)flag=0}{if(match($0,"CHEXA")>0)flag=1}{if(flag==1) print $0}' File

Output:

CHEXA*          99001088        99001001        99001143        99001179*00072A1
*00072A1        99001047        99001104        99001144        99001180*00072A2
*00072A2        99001048        99001105                                
CHEXA*          99001086        99001001        99001128        99001095*0007299
*0007299        99001081        99001171                                *000729B
*000729B

Answer 7

awk -v key="RBE3" '
index($0,key"*")==1 { f=1; print > "newfile"; next }
f && /^\*/ { print > "newfile"; next }
{ f=0; print }
' file > tmp && mv tmp file

The above uses index(), so it does a string comparison rather than a regexp comparison. It therefore won't fail if your key contains RE metacharacters, unlike any sed solution.
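The difference between a literal and a regexp comparison can be seen in a small Python sketch. The variable names and the `RBE33` sample line are made up for illustration; `str.startswith` plays the role of awk's `index(...)==1` here.

```python
import re

key = "RBE3*"
line_bad = "RBE33   99001089"   # made-up line that is NOT an RBE3* card

# Literal comparison: the line must begin with the exact characters "RBE3*".
literal_hit = line_bad.startswith(key)

# Regexp comparison: "3*" means "zero or more 3s", so the key matches too much.
regex_hit = re.match(key, line_bad) is not None

# Escaping the key restores the literal meaning inside a regexp.
escaped_hit = re.match(re.escape(key), line_bad) is not None
```

With these inputs `literal_hit` and `escaped_hit` are False while `regex_hit` is True, which is the kind of false match the string comparison avoids.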

7 answers

Answer 1: score 1, accepted, 2014-12-18 13:29:28
Answer 2: score 1, 2014-12-18 13:38:10
Answer 3: score 1, 2014-12-18 13:38:29
Answer 4: score 0, 2014-12-18 13:33:11
Answer 5: score 0, 2014-12-18 13:35:04
Answer 6: score 0, 2014-12-18 13:48:19
Answer 7: score 0, 2014-12-18 14:36:47
