
RegEx for capturing and replacing digits in a pattern

I would like to replace the third argument in the string with a new number (say, 100). The matched string always starts with function, the first argument is either true or false, and the second argument is a number.

                                               Expected
                    |                             |
                    v                             v
'function(true, 0, 15)'   --> 'function(true, 0, 100)'  
'function(false, 0, 23)'  --> 'function(false, 0, 100)'

I have been reading the related posts, but I believe I must have misunderstood some regex concept. The following code is what I tried, but it always replaces the whole string:

import re
string = 'function(true, 0, 15)'
regex = re.compile('function\([a-zA-Z]*, [0-9]*, ([0-9]*)\)')
res = re.sub(regex, '100', string)

print(res) # 100
           # Expected: function(true, 0, 100)

Question: Could you point out why the above code doesn't work? How would I write the code to achieve the expected result?

As the number you are trying to replace is immediately followed by a closing parenthesis ), you can use the regex \d+(?=\s*\)) and replace the match with 100 or whatever value you want. Try this Python code:

import re
string = 'function(true, 0, 15)'
regex = re.compile(r'\d+(?=\s*\))')
res = re.sub(regex, '100', string)

print(res)

Prints:

function(true, 0, 100)
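To see why this works, a small sketch can inspect what the lookahead pattern actually matches. The lookahead checks for the closing parenthesis without consuming it, so only the digits are part of the match. The helper name show_match is purely illustrative and is not part of the original answer.

```python
import re

# Same pattern as above: digits followed by optional whitespace and ")".
pattern = re.compile(r'\d+(?=\s*\))')

def show_match(text):
    """Return the matched text and its span, or None if nothing matches."""
    m = pattern.search(text)
    return (m.group(0), m.span()) if m else None

# Both sample inputs from the question.
for s in ['function(true, 0, 15)', 'function(false, 0, 23)']:
    print(show_match(s), '->', pattern.sub('100', s))
```

The second argument (0) is skipped because it is followed by a comma, not by a closing parenthesis.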

Your code replaces the whole string with 100 because your regex matches the entire input, and re.sub replaces everything the pattern matched (not just the captured group) with its second argument. What you want instead is to replace only the third argument, so the regex should match only the third argument's value, as shown in the regex demo below:

Regex Demo matching only what you want to replace

Your current regex matches the whole input, as the demo below shows:

Regex Demo with your regex matching whole input
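The difference between the full match and the capture group can also be checked directly in Python. This is a minimal sketch using the pattern from the question; writing it as a raw string is an editorial choice, not something the question did.

```python
import re

string = 'function(true, 0, 15)'
regex = re.compile(r'function\([a-zA-Z]*, [0-9]*, ([0-9]*)\)')

m = regex.search(string)
full_match = m.group(0)   # everything the pattern matched
third_arg = m.group(1)    # only the text inside the capture group

print(full_match)
print(third_arg)

# re.sub replaces the span of group 0, not the span of group 1.
print(regex.sub('100', string))
```

Capturing the third argument makes it available for inspection, but it does not narrow what re.sub replaces.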

Alternatively, if you prefer to match the whole input and then selectively replace only the third argument, you can capture the function name and the first two parameters in group 1, much like your original regex tried to do:

(function\([a-zA-Z]*, [0-9]*, )[0-9]*\)

and replace the match with \g<1>100), where \g<1> references the value captured in group 1 and is followed by the literal text 100).

Regex Demo with full match and selected replacement
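In Python, the full-match approach might look like the sketch below. The \g<1> form is used instead of \1 because the replacement continues with digits; the Python documentation notes that \g<2>0 is unambiguous whereas \20 would be read as a reference to group 20.

```python
import re

string = 'function(true, 0, 15)'

# Group 1 holds the function name and the first two arguments.
regex = re.compile(r'(function\([a-zA-Z]*, [0-9]*, )[0-9]*\)')

# \g<1> re-inserts group 1; "100)" is literal replacement text.
res = regex.sub(r'\g<1>100)', string)
print(res)
```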

This expression might also work:

(?:\d+)(\))

It has a non-capturing group with the desired digits, (?:\d+), followed by a captured right boundary, (\)). We can replace the match with the new number followed by a backreference to that boundary (\1 in Python, $1 in some other regex flavors).

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:\d+)(\))"

test_str = "function(true, 0, 15)"

subst = "100\\1"

# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print (result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Demo
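As a side note, the non-capturing group in this expression only groups the digits and does not change what is matched. A quick sketch comparing the pattern with and without it (the variable names are illustrative):

```python
import re

test_str = 'function(true, 0, 15)'

# With the non-capturing group, as in the answer above.
with_group = re.sub(r'(?:\d+)(\))', r'100\1', test_str)

# Without it: \d+ alone matches the same digits.
without_group = re.sub(r'\d+(\))', r'100\1', test_str)

print(with_group)
print(without_group)
```

In both cases group 1 is the closing parenthesis, so \1 puts it back after the new number.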

As an alternative, you can capture everything before the match and everything after it, then rebuild the string around the new value, like so:

import re
string = 'function(true, 0, 15)'
regex = re.compile(r'(function\([a-zA-Z]*, [0-9]*, )([0-9]*)(\))')
res = re.sub(regex, r'\g<1>100\3', string)  # function(true, 0, 100)

Basically, I placed parentheses around the text before the value to replace and around the text after it. The replacement is then \g<1> (first group), 100 (new text), \3 (third group). The first group is written as \g<1> rather than \1 because the digits of the new value follow it directly and would otherwise be read as part of the group number.

The reason I propose this particular expression is in case OP specifically needs to match only strings that also contain the preceding "function(" section (or some other pattern). Plus, this is just an extension of OP's solution, so it may be more intuitive to OP.
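The practical difference shows up when the input contains other calls. The sketch below compares the lookahead-only pattern with the anchored one; the 'other(1, 2, 3)' sample is a hypothetical input, and the replacement uses \g<1> so that the digits of the new value are not read as part of the group number.

```python
import re

# Matches any digits directly before a closing parenthesis.
loose = re.compile(r'\d+(?=\s*\))')

# Matches only calls that start with "function(" and have the expected shape.
strict = re.compile(r'(function\([a-zA-Z]*, [0-9]*, )([0-9]*)(\))')

samples = ['function(true, 0, 15)', 'other(1, 2, 3)']  # 'other' is hypothetical

for s in samples:
    print(loose.sub('100', s), '|', strict.sub(r'\g<1>100\3', s))
```

The loose pattern rewrites the last number in both strings, while the strict pattern leaves the unrelated call untouched.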
