Advanced multiple search replace
Problem: I want to batch-replace patterns in a file in a way that standard search-and-replace tools cannot handle.
Let's assume there is File 1:
B
B
A
B
B
B
A
B
B
A
B
I want to replace B with something else, but only the first B that follows each A.
Here is File 2, which holds the "rules" that say what to search for and what to replace it with:
A;B;C1
A;B;C2
A;B;C3
The ";" is the separator; it could be any other character. The script should search for an A, then continue searching for a B and replace that B with C1. Afterwards it should continue to the next occurrence of A, search for the next B, and replace that B with C2. And so on. When the script has replaced a B with C3, it should stop, because there is no further rule.
The final file should look like this:
B
B
A
C1
B
B
A
C2
B
A
C3
I want to use Python for this, but that is not mandatory if there is an easier way.
You could implement something similar using regular expressions. re.finditer yields match objects that expose the start and end position of each match, and re.sub accepts a count parameter that limits how many substitutions are made. You can start from this:
import re
data = '''B
B
A
B
B
B
A
B
B
A
B'''
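# Each rule: (pattern whose group 1 is the text to replace, replacement)
rules = [
    (r'A.*?(B)', r'C1'),
    (r'A.*?(B)', r'C2'),
    (r'A.*?(B)', r'C3'),
]

startpos = 0  # text before this offset has already been processed
while rules:
    pattern, replacement = rules.pop(0)
    # Only the first match at or after startpos is used, hence the break.
    for g in re.finditer(pattern, data[startpos:], flags=re.DOTALL):
        pos = startpos + g.start(1)  # absolute offset of the captured B
        # Replace exactly one occurrence; re.escape keeps the captured
        # text from being interpreted as a regex.
        data = data[:pos] + re.sub(re.escape(g.group(1)), replacement, data[pos:], count=1)
        startpos = pos
        break

print(data)
Prints: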
B
B
A
C1
B
B
A
C2
B
A
C3
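The rules above are hard-coded, while the question keeps them in File 2 as "A;B;C1" lines. Here is a minimal sketch of how such lines could be turned into patterns and applied in order. The function name apply_rules and its parameters are my own naming, not part of the answer, and I am assuming that a rule which finds no match is simply skipped.

```python
import re


def apply_rules(text, rule_lines, sep=";"):
    """Apply 'start;target;replacement' rules one after another (hypothetical helper)."""
    pos = 0  # everything before pos is already final
    for rule_line in rule_lines:
        start, target, replacement = rule_line.strip().split(sep)
        # Lazily skip from the start marker to the first target after it.
        pattern = re.compile(
            re.escape(start) + r".*?(" + re.escape(target) + r")", re.DOTALL
        )
        match = pattern.search(text, pos)
        if match is None:
            continue  # assumption: a rule without a match is skipped
        # Splice the replacement in by position instead of calling re.sub.
        text = text[:match.start(1)] + replacement + text[match.end(1):]
        pos = match.start(1) + len(replacement)
    return text


# Same content as File 1, one letter per line.
text = "\n".join("BBABBBABBAB")
print(apply_rules(text, ["A;B;C1", "A;B;C2", "A;B;C3"]))
```

Because the search resumes after the inserted replacement, a replacement string that itself contains the start marker is not matched again by the next rule.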
I started writing a regex-based solution, but @Andrej got there first! So here is a more "naive" approach that does not use regex.
#!/usr/bin/env python3
import sys


def read_rules(fpath="/tmp/test.rules", sep=";"):
    rules = []
    with open(fpath) as f:
        for line in f:
            rules.append(line.strip().split(sep))
    return rules


def parse_data(rules, fpath="/tmp/test.data"):
    cur_rule = rules[0]
    rule_idx = 0
    data = []
    state = None
    with open(fpath) as f:
        for line in f:
            line = line.strip('\n')
            if not cur_rule:
                data.append(line)
                continue
            # We match start
            if cur_rule[0] in line and not state:
                # End matches in the same line and start < end
                # This case is not in your data
                if (
                    cur_rule[1] in line
                    and line.index(cur_rule[0]) < line.index(cur_rule[1])
                ):
                    new_line = line.replace(cur_rule[1], cur_rule[2], 1)
                    data.append(new_line)
                    rule_idx += 1
                    # We reached the end of rules
                    if len(rules) == rule_idx:
                        cur_rule = None
                    else:
                        cur_rule = rules[rule_idx]
                else:
                    # Set state to looking for end
                    state = 1
                    data.append(line)
                continue
            # Now, if here we are looking for end...
            if state == 1:
                # Nope... not found... move on
                if cur_rule[1] not in line:
                    data.append(line)
                    continue
                # replace
                data.append(
                    line.replace(cur_rule[1], cur_rule[2], 1)
                )
                # Reset state
                state = None
                rule_idx += 1
                # We reached the end of rules
                if len(rules) == rule_idx:
                    cur_rule = None
                else:
                    cur_rule = rules[rule_idx]
                continue
            # Here, no line matched
            data.append(line)
    return data


def main():
    rules = read_rules()
    print(rules)
    data = parse_data(rules)
    print("\n".join(data))


if __name__ == "__main__":
    sys.exit(main())
Explanation:
Pros:
Cons:
Output (note that I added one extra match to check that it stops when the rules run out):
B
B
A
C1
B
B
A
C2
B
A
C3
A
B
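The script reads from fixed paths under /tmp, which makes it awkward to try on other input. As a rough sketch of the same line-by-line idea, the logic could be written as a generator that takes the lines and rules directly. The name replace_sequentially and the tuple layout of the rules are assumptions of mine. Unlike the script, it looks for the target only after the start marker when both appear on the same line.

```python
def replace_sequentially(lines, rules):
    """Yield lines, applying each (start, target, replacement) rule once, in order."""
    pending = iter(rules)
    rule = next(pending, None)
    seen_start = False  # True once the current rule's start marker was found
    for line in lines:
        if rule is not None:
            start, target, replacement = rule
            if not seen_start and start in line:
                seen_start = True
                # Only text after the start marker may contain the target.
                head_end = line.index(start) + len(start)
                head, tail = line[:head_end], line[head_end:]
            else:
                head, tail = "", line
            if seen_start and target in tail:
                line = head + tail.replace(target, replacement, 1)
                rule = next(pending, None)  # None once the rules run out
                seen_start = False
        yield line


rules = [("A", "B", "C1"), ("A", "B", "C2"), ("A", "B", "C3")]
lines = list("BBABBBABBAB") + ["A", "B"]  # File 1 plus one extra A/B pair
print("\n".join(replace_sequentially(lines, rules)))
```

With files, the generator could be fed an open file object with the newlines stripped, and the rules could come from read_rules above.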