
regex: getting backreference to number, adding to it

Simple regex question:

I want to replace page numbers in a string with page number + some number (say, 10). I figured I could capture a matched page number with a backreference, do an operation on it, and use the result as the replacement argument in re.sub.

This works (just passing the value):

def add_pages(x):
    return x

re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)

Yielding, of course, 'here is Page 11 and here is Page 78\nthen there is Page 65'.
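A minimal sketch (Python 3, reusing the question's pattern and text) makes the order of evaluation visible: add_pages(r"\1") is an ordinary function call, so it runs once before re.sub starts, and re.sub only ever receives the string it returns.

```python
import re

def add_pages(x):
    return x

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
pattern = r"(?<=Page )(\d{2})"

# add_pages(r"\1") runs once, right here, before re.sub starts;
# it just hands back the two-character string backslash + "1".
replacement = add_pages(r"\1")
print(repr(replacement))  # '\\1'

# re.sub therefore only sees the template r"\1", which re-inserts
# group 1: each matched number is replaced by itself.
result = re.sub(pattern, replacement, text)
print(result == text)  # True
```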

Now, if I change the add_pages function to modify the passed backreference, I get an error:

def add_pages(x):
    return int(x) + 10

re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)

ValueError: invalid literal for int() with base 10: '\\1'

This is because what gets passed to the add_pages function seems to be the literal backreference, not the text it refers to.
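One way to confirm this suspicion is to hand a small recording function to the code both ways. The spy function and the received list below are hypothetical helpers for illustration only: called directly, the function gets the plain string r"\1"; passed to re.sub uncalled, it gets one match object per match.

```python
import re

received = []  # hypothetical: collects every argument spy() is given

def spy(x):
    """Hypothetical helper: record the argument, then return text for re.sub."""
    received.append(x)
    # A match object is turned back into its matched text so re.sub gets a str.
    return x if isinstance(x, str) else x.group()

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
pattern = r"(?<=Page )(\d{2})"

spy(r"\1")                  # called directly: receives the str '\\1'
re.sub(pattern, spy, text)  # passed uncalled: receives one match object per match

# One str, then three match objects
print([type(item).__name__ for item in received])
```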

Short of extracting all matched numbers to a list, processing them, and adding them back, how would I do this?

The actual problem is that you are supposed to pass a function as the second argument of re.sub; instead, you are calling the function and passing its return value.

Why does it work in the first case?

Whenever a match is found, re.sub looks at its second argument. If it is a string, it is used as the replacement template (backreferences such as \1 are expanded to the text of that group); if it is a function, the function is called with the match object and its return value is used as the replacement. In your case, add_pages(r"\1") simply returns r"\1" itself, so the re.sub call translates to this:

print(re.sub(r"(?<=Page )(\d{2})", r"\1", ...))

So it actually replaces each matched number with itself. That is why it appears to work.
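The two kinds of replacement argument can be sketched side by side. The angle-bracket markers are an arbitrary choice for illustration: a string template can re-insert group 1 but cannot compute with it, while a function receives each match and returns whatever string it likes.

```python
import re

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
pattern = r"(?<=Page )(\d{2})"

# String: a template; \1 is replaced by the text of group 1, verbatim.
as_template = re.sub(pattern, r"<\1>", text)

# Function: called with each match object; its return value is inserted.
as_function = re.sub(pattern, lambda m: "<" + m.group(1) + ">", text)

print(as_template)
print(as_function == as_template)  # True
```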

Why doesn't it work in the second case?

In the second case, however, when you do

add_pages(r"\1")

you are trying to convert r"\1" to an integer, which is not possible. Python evaluates this call once, before re.sub even runs, so the function never sees any matched text. That is why it fails.
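A short sketch reproducing the failure, assuming Python 3: the ValueError comes from int() inside add_pages, raised while Python is still building the arguments for re.sub.

```python
import re

def add_pages(x):
    return int(x) + 10

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
pattern = r"(?<=Page )(\d{2})"

error_message = None
try:
    # add_pages(r"\1") is evaluated first, so int("\\1") raises
    # before re.sub is ever called.
    re.sub(pattern, add_pages(r"\1"), text)
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # invalid literal for int() with base 10: '\\1'
```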

How do you fix this?

The correct way to write this is:

def add_pages(matchObject):
    return str(int(matchObject.group()) + 10)
print(re.sub(r"(?<=Page )(\d{2})", add_pages, ...))
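Note that the replacement function must return a string, which is why the sum is wrapped in str(). The sketch below (Python 3; the lambda is just an inline equivalent of add_pages, not part of the original answer) shows the working version and what happens if the bare int is returned instead.

```python
import re

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
pattern = r"(?<=Page )(\d{2})"

# Inline equivalent of add_pages: called once per match, returns a str.
result = re.sub(pattern, lambda m: str(int(m.group(1)) + 10), text)
print(result)

# Returning the bare int is rejected by re.sub with a TypeError.
raised_type_error = False
try:
    re.sub(pattern, lambda m: int(m.group(1)) + 10, text)
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True
```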

Read more about the group method of match objects in the re module documentation.
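A quick sketch of how group() relates to this pattern: because (?<=Page ) is a lookbehind, "Page " must precede the digits but is not part of the match, so group(), group(0) and group(1) all return just the number. Without the lookbehind, only group(1) would be the number.

```python
import re

text = "here is Page 11"

# With the lookbehind, "Page " is checked but not consumed, so the whole
# match (group 0) is just the digits, identical to capturing group 1.
m = re.search(r"(?<=Page )(\d{2})", text)
print(m.group(), m.group(0), m.group(1))  # 11 11 11

# Without it, group 0 includes "Page " and only group 1 is the number.
m2 = re.search(r"Page (\d{2})", text)
print(m2.group(0))  # Page 11
print(m2.group(1))  # 11
```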

import re

def add_pages(matchobj):
    return str(int(matchobj.group(0)) + 10)

print(re.sub(r"(?<=Page )(\d{2})", add_pages, 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE))
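One further pitfall with these calls: the signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is count, not flags. re.MULTILINE has the integer value 8, so passing it positionally silently caps the number of replacements at 8 (Python 3.13 also deprecates passing count positionally). This pattern has no ^ or $, so MULTILINE is not needed at all. The sketch below spells out the positional behaviour as count=int(re.MULTILINE) and compares it with passing the flag by keyword; the ten-page input string is made up for illustration.

```python
import re

# Hypothetical input: ten page numbers, Page 10 through Page 19.
text = " ".join("Page %d" % n for n in range(10, 20))
pattern = r"(?<=Page )(\d{2})"

def add_pages(matchobj):
    return str(int(matchobj.group(0)) + 10)

print(int(re.MULTILINE))  # 8

# What a 4th positional argument of re.MULTILINE amounts to: count=8,
# so only the first eight numbers are changed.
capped = re.sub(pattern, add_pages, text, count=int(re.MULTILINE))

# Passed by keyword it is a flag again, and every match is replaced.
full = re.sub(pattern, add_pages, text, flags=re.MULTILINE)

print(capped)
print(full)
```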
