
Python substitution with replacement function

I have a file with some SQL like:

INSERT INTO table (ID, Name) VALUES (1, 'a');
INSERT INTO table (ID, Name) VALUES (2, 'b');
...
INSERT INTO table (ID, Name) VALUES (1000, 'all');

I want to increment every ID value in the file by 1000, to get:

INSERT INTO table (ID, Name) VALUES (1001, 'a');
INSERT INTO table (ID, Name) VALUES (1002, 'b');
...
INSERT INTO table (ID, Name) VALUES (2000, 'all');

I wrote the following Python code:

import os, re
root = r'path\to\dir'
path = os.path.join(root, 'original.sql')
new =  os.path.join(root, 'new.sql')

def increment(n, base=1000):
    return str(int(n.group(1)) + base)

with open(path) as f, open(new, 'w') as g:
    for line in f:
        line = re.sub(r'.*VALUES \((\d{1,4}),.*', increment, line)
        g.write(line)

but that just writes the incremented values to the new file instead of substituting them into each line. What am I doing wrong?

Change your regex and replacement function to:

def fix_line(n, base=1000):
    return n.group(1) + str(int(n.group(2)) + base) + n.group(3)

line = re.sub(r'(.*VALUES \()(\d{1,4})(,.*)', fix_line, line)

So if you start with line = "INSERT INTO table (ID, Name) VALUES (1001, 'a');", then after the regex substitution you will have:

line = "INSERT INTO table (ID, Name) VALUES (2001, 'a');"

Basically, re.sub replaces the entire match with whatever your function returns, so you need to capture the text before the number and the text after it and include both in the replacement for each line.
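To make the difference concrete, here is a minimal sketch that runs both patterns on one sample line. The sample line and the variable names broken and fixed are only for illustration; the two functions mirror the ones from the question and from this answer.

```python
import re

line = "INSERT INTO table (ID, Name) VALUES (1, 'a');\n"

def increment(m, base=1000):
    # Returns only the new number, so everything else the pattern
    # matched (here: the whole line up to the newline) is discarded.
    return str(int(m.group(1)) + base)

def fix_line(m, base=1000):
    # Re-emits the captured prefix and suffix around the new number.
    return m.group(1) + str(int(m.group(2)) + base) + m.group(3)

broken = re.sub(r'.*VALUES \((\d{1,4}),.*', increment, line)
fixed = re.sub(r'(.*VALUES \()(\d{1,4})(,.*)', fix_line, line)

print(repr(broken))  # '1001\n'
print(repr(fixed))   # "INSERT INTO table (ID, Name) VALUES (1001, 'a');\n"
```

The trailing newline survives in both cases because . does not match a newline by default, so it is never part of the match.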

I should add that you don't need .* at the start and the end of your regex. It will also work with line = re.sub(r'(VALUES \()(\d{1,4})(,)', fix_line, line), though this time you match only a small part of line, specifically VALUES (1001, and the substitution function is applied to just that part, leaving the rest of the string unchanged. (Your original regex matched the entire line and regenerated it.)
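A short sketch of that narrower match, assuming the same fix_line function as above. The names pattern, matched and result are hypothetical and only there to show which part of the string the regex actually touches.

```python
import re

def fix_line(m, base=1000):
    # group(1) is "VALUES (", group(2) the digits, group(3) the comma.
    return m.group(1) + str(int(m.group(2)) + base) + m.group(3)

line = "INSERT INTO table (ID, Name) VALUES (1001, 'a');"
pattern = re.compile(r'(VALUES \()(\d{1,4})(,)')

matched = pattern.search(line).group(0)  # the only text that gets replaced
result = pattern.sub(fix_line, line)     # text outside the match is untouched

print(matched)  # VALUES (1001,
print(result)   # INSERT INTO table (ID, Name) VALUES (2001, 'a');
```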

You could also do:

def iterate_number(n, base=1000):
    return "VALUES (%d," % (int(n.group(1)) + base)

line = re.sub(r'VALUES \((\d{1,4}),', iterate_number, line)

which has only one capturing group (the number) and simply adds back the VALUES ( before the number and the comma after it when building the replacement string.
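Putting it together, the loop from the question could look roughly like the sketch below. It uses io.StringIO objects as in-memory stand-ins for original.sql and new.sql, which is an assumption made so the example runs without touching the file system; with real files you would keep the with open(...) block from the question.

```python
import io
import re

def iterate_number(m, base=1000):
    # Rebuild the matched fragment around the incremented number.
    return "VALUES (%d," % (int(m.group(1)) + base)

# In-memory stand-ins for the input and output files (demo assumption).
source = io.StringIO(
    "INSERT INTO table (ID, Name) VALUES (1, 'a');\n"
    "INSERT INTO table (ID, Name) VALUES (2, 'b');\n"
    "INSERT INTO table (ID, Name) VALUES (1000, 'all');\n"
)
target = io.StringIO()

for line in source:
    # Only the "VALUES (<id>," fragment is rewritten; the rest is copied as is.
    target.write(re.sub(r'VALUES \((\d{1,4}),', iterate_number, line))

output = target.getvalue()
print(output, end="")
```

Note that \d{1,4} followed by the comma means an ID with five or more digits would not match and would be written out unchanged.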
