
Python substitution with replacement function

I have a file with some SQL like:

INSERT INTO table (ID, Name) VALUES (1, 'a');
INSERT INTO table (ID, Name) VALUES (2, 'b');
...
INSERT INTO table (ID, Name) VALUES (1000, 'all');

And I want to increment all the ID values in the file by 1000, to get:

INSERT INTO table (ID, Name) VALUES (1001, 'a');
INSERT INTO table (ID, Name) VALUES (1002, 'b');
...
INSERT INTO table (ID, Name) VALUES (2000, 'all');

I wrote the following Python code:

import os, re
root = r'path\to\dir'
path = os.path.join(root, 'original.sql')
new =  os.path.join(root, 'new.sql')

def increment(n, base=1000):
    return str(int(n.group(1)) + base)

with open(path) as f, open(new, 'w') as g:
    for line in f:
        line = re.sub(r'.*VALUES \((\d{1,4}),.*', increment, line)
        g.write(line)

but that just outputs the incremented values instead of doing the substitution. What am I doing wrong?

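Change your replacement function and regex to:

def fix_line(n, base=1000):
    # group(1): text before the number, group(2): the number, group(3): text after it
    return n.group(1) + str(int(n.group(2)) + base) + n.group(3)

# Raw string so that \( and \d reach the regex engine unchanged
line = re.sub(r'(.*VALUES \()(\d{1,4})(,.*)', fix_line, line)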

So if you have line = "INSERT INTO table (ID, Name) VALUES (1001, 'a');" to start, then after your regex substitution you will have:

line = "INSERT INTO table (ID, Name) VALUES (2001, 'a');"

re.sub replaces the entire matched text with whatever the replacement function returns. So you need to capture the stuff before the number and the stuff after the number, and include both in the string you return for each line.
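To make the difference concrete, here is a small side-by-side sketch of the question's replacement function and the fixed one, run on a single sample line. The sample line and the variable names broken and fixed are only for illustration.

```python
import re

def increment(n, base=1000):
    # Question's version: returns only the incremented number
    return str(int(n.group(1)) + base)

def fix_line(n, base=1000):
    # Fixed version: re-emits the text before and after the number
    return n.group(1) + str(int(n.group(2)) + base) + n.group(3)

line = "INSERT INTO table (ID, Name) VALUES (1, 'a');\n"

# Both patterns match everything up to the newline ('.' does not match '\n'),
# so everything up to the newline is replaced by the function's return value.
broken = re.sub(r'.*VALUES \((\d{1,4}),.*', increment, line)
fixed = re.sub(r'(.*VALUES \()(\d{1,4})(,.*)', fix_line, line)

print(repr(broken))  # '1001\n'
print(repr(fixed))   # "INSERT INTO table (ID, Name) VALUES (1001, 'a');\n"
```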

I should add that you don't need .* at the start and the end of your regex. It will also work with line = re.sub(r'(VALUES \()(\d{1,4})(,)', fix_line, line). This time you match only a small part of the line, specifically VALUES (1001, and the substitution function is applied to just that part, leaving the rest of the string unchanged. (Your original regex matched the entire line and regenerated it.)
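As a quick check of what the shorter pattern actually covers, the sketch below uses re.search to inspect the match on the example line. The names matched, before and after are just illustrative.

```python
import re

line = "INSERT INTO table (ID, Name) VALUES (1001, 'a');"

m = re.search(r'(VALUES \()(\d{1,4})(,)', line)

matched = m.group(0)        # the only part re.sub would replace
before = line[:m.start()]   # left untouched by re.sub
after = line[m.end():]      # left untouched by re.sub

print(repr(matched))  # 'VALUES (1001,'
print(repr(before))   # 'INSERT INTO table (ID, Name) '
print(repr(after))    # " 'a');"
```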

You could also do

def iterate_number(n, base=1000):
    # Rebuild the matched text with the incremented number
    return "VALUES (%d," % (int(n.group(1)) + base)

line = re.sub(r'VALUES \((\d{1,4}),', iterate_number, line)

which has only one captured group (the number) and simply adds back the VALUES ( before the number and the comma after it when building the replacement string.
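Putting it together, here is a minimal sketch of how this last variant could be applied to a whole sequence of lines. The helper shift_ids, the ID_PATTERN constant and the use of functools.partial to pass a different base are my own additions for illustration, not something from the question.

```python
import re
from functools import partial

# Precompiled once, since it is reused for every line
ID_PATTERN = re.compile(r'VALUES \((\d{1,4}),')

def iterate_number(n, base=1000):
    # Rebuild the matched text with the incremented number
    return "VALUES (%d," % (int(n.group(1)) + base)

def shift_ids(lines, base=1000):
    """Hypothetical helper: yield each line with its ID increased by base."""
    repl = partial(iterate_number, base=base)
    for line in lines:
        yield ID_PATTERN.sub(repl, line)

sample = [
    "INSERT INTO table (ID, Name) VALUES (1, 'a');\n",
    "INSERT INTO table (ID, Name) VALUES (1000, 'all');\n",
]
result = list(shift_ids(sample))
print("".join(result), end="")
```

Because a file object iterates over its lines, the loop in the question could then be replaced by g.writelines(shift_ids(f)) inside the same with block.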
