I have a data set with records like the following:
Tenochtitlan 1519
Tetzcoco 20
Tlacopan 21
I need a regex that will return only the numbers that consist of a pair of digits (i.e. 20 and 21 in the example above), ultimately so I can add a prefix to those numbers and end up with:
Tenochtitlan 1519
Tetzcoco 1520
Tlacopan 1521
I've tried the following, but I'm having trouble with the match (it matches '15' from the first record) and then with getting the match as a string output:
list = ["Tenochtitlan 1519","Tetzcoco 20","Tlacopan 21"]
for x in list:
    m = re.compile("(\d\D*?){2}")
    match_val = m.search(x)
    concat = "15" + str(match_val)
    re.sub(str(match_val), x, concat)
for x in list:
    print(x)
Result -
Tenochtitlan 1519
Tetzcoco 20
Tlacopan 21
First, str(match_val) is not doing what you think it's doing: it gives you the repr of the match object, not the matched text. From the debugger:
(Pdb) str(match_val)
"<re.Match object; span=(13, 15), match='15'>"
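To get the matched text itself rather than the repr of the match object, you can call .group() on it. A minimal sketch, reusing the pattern from the question (matched_text is just an illustrative name); note that search() returns None when nothing matches, so it is worth guarding against that:

```python
import re

pattern = re.compile(r"(\d\D*?){2}")  # the pattern from the question
match_val = pattern.search("Tetzcoco 20")

# search() returns None when there is no match, so guard before calling group()
matched_text = match_val.group() if match_val else None
print(matched_text)  # the matched substring as a plain str
```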
Secondly, the value of x is never being changed. sub() only returns the new string; it does not modify the original. Demonstrating in iPython:
In [1]: import re
In [2]: x = "string"
In [3]: re.sub("ing", "ingthing", x)
Out[3]: 'stringthing'
In [4]: x
Out[4]: 'string'
You'll also run into difficulty replacing the original value in a for... in loop, because assigning to x only rebinds the loop variable and leaves the list untouched.
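A small sketch of that loop behaviour, using a made-up records list and upper() as a stand-in for the real transformation. Assigning to the loop variable leaves the list alone; assigning by index (here via enumerate) actually updates it:

```python
records = ["Tetzcoco 20", "Tlacopan 21"]  # hypothetical sample data

# Rebinding x only changes the local name, not the list element
for x in records:
    x = x.upper()
after_rebinding = list(records)  # snapshot: still the original strings

# Assigning through the index replaces the element in the list
for i, x in enumerate(records):
    records[i] = x.upper()
after_index_assign = list(records)

print(after_rebinding)
print(after_index_assign)
```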
Third, you've got your arguments to sub() in the wrong order. It goes: regex, replacement string, original string.
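If the positional order is hard to remember, re.sub also accepts its first three parameters by keyword (pattern, repl, string). A quick sketch with a throwaway example string:

```python
import re

# Positional order: pattern, replacement, original string
positional = re.sub(r"ing$", "ingthing", "string")

# Equivalent call with keyword arguments, which makes each role explicit
keyword = re.sub(pattern=r"ing$", repl="ingthing", string="string")

print(positional, keyword)
```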
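Fourth, your original regex is probably not matching what you expect: (\d\D*?){2} matches any two digits, optionally separated by non-digits, anywhere in the string, which is why it finds '15' in 1519. \s\d\d$ or \s\d{2}$ (whitespace, then exactly two digits, then the end of the string) is probably closer to what you want.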
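To see the difference between the two patterns, here is a rough comparison on the sample records from the question (loose and anchored are just illustrative names):

```python
import re

records = ["Tenochtitlan 1519", "Tetzcoco 20", "Tlacopan 21"]

loose = re.compile(r"(\d\D*?){2}")   # pattern from the question
anchored = re.compile(r"\s\d{2}$")   # whitespace, exactly two digits, end of string

# True where the pattern finds a match somewhere in the record
loose_hits = [bool(loose.search(r)) for r in records]
anchored_hits = [bool(anchored.search(r)) for r in records]

for record, l, a in zip(records, loose_hits, anchored_hits):
    print(record, l, a)
```

The loose pattern hits every record, including the one that already has a four-digit year, while the anchored one skips it.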
One way to do this would be to use a capture group (parentheses) and a backreference (a backslash and a digit) to do the substitution all in one go:
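import re

# Named records rather than list, to avoid shadowing the built-in
records = ["Tenochtitlan 1519", "Tetzcoco 20", "Tlacopan 21"]
new_list = []
for x in records:
    # Capture the trailing two digits and put them back after the "15" prefix
    new_list.append(re.sub(r'\s(\d\d)$', r' 15\1', x))
for x in new_list:
    print(x)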
Output:
Tenochtitlan 1519
Tetzcoco 1520
Tlacopan 1521
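The same substitution can also be written more compactly with a precompiled pattern and a list comprehension. This is only a sketch of one alternative; two_digit_suffix and fixed are names made up for the example:

```python
import re

records = ["Tenochtitlan 1519", "Tetzcoco 20", "Tlacopan 21"]

# Compile once, since the same pattern is applied to every record
two_digit_suffix = re.compile(r"\s(\d\d)$")

# Build a new list instead of trying to modify the strings in place
fixed = [two_digit_suffix.sub(r" 15\1", record) for record in records]

for record in fixed:
    print(record)
```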