Need a specific explanation of part of a regex code

I'm developing a calculator program in Python and need to remove leading zeros from numbers so that calculations work as expected. For example, if the user enters "02+03" into the calculator, the result should be 5. To remove these leading zeros in front of digits, I asked a question here and got the following answer:

self.answer = eval(re.sub(r"((?<=^)|(?<=[^\.\d]))0+(\d+)", r"\1\2", self.equation.get()))
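The stripping matters because, in Python 3, eval("02+03") raises SyntaxError: decimal integer literals may not have leading zeros. Below is a minimal sketch of just the substitution step, pulled out of the GUI class. The helper name strip_leading_zeros is hypothetical, and the plain string "02+03" stands in for self.equation.get().

```python
import re

# Same pattern as in the answer above.
LEADING_ZEROS = re.compile(r"((?<=^)|(?<=[^\.\d]))0+(\d+)")

def strip_leading_zeros(expression: str) -> str:
    """Hypothetical helper: drop leading zeros from each integer in an expression."""
    return LEADING_ZEROS.sub(r"\1\2", expression)

cleaned = strip_leading_zeros("02+03")
print(cleaned)        # 2+3
print(eval(cleaned))  # 5
```

Digits preceded by another digit or a period are left alone, so "100" and "1.05" come through unchanged.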

I fully understand how the positive lookbehinds for the beginning of the string and for a non-digit, non-period character work. What I'm confused about is where in this code the replacement for the matched patterns is specified.

I found this online when researching regular expressions:

result = re.sub(pattern, repl, string, count=0, flags=0)
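As an illustration, here is the call from the question with each argument of re.sub passed by keyword so that pattern, repl and string are easy to tell apart. The variable equation is an assumed stand-in for self.equation.get(). The last line shows the optional count argument.

```python
import re

equation = "02+03"  # stands in for self.equation.get()

result = re.sub(
    pattern=r"((?<=^)|(?<=[^\.\d]))0+(\d+)",  # what to search for
    repl=r"\1\2",                              # what each match is replaced with
    string=equation,                           # the text being searched
)
print(result)  # 2+3

# count=1 stops after the first replacement.
print(re.sub(r"((?<=^)|(?<=[^\.\d]))0+(\d+)", r"\1\2", equation, count=1))  # 2+03
```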

Where is the "repl" in the code above? If possible, could somebody also explain what the r"\1\2" is used for in this regex?

Thanks for your help! :)

The "repl" is this component, the second argument passed to re.sub:

r"\1\2"

In the "find" part of the regex (the pattern), groups are captured by parentheses around content. Not every pair of parentheses captures, though: special constructs such as non-capturing groups (?:...) and lookarounds like (?=...) and (?<=...) do not create numbered groups.
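A quick way to check which parentheses capture is the compiled pattern's groups attribute. This sketch applies it to the pattern from the question and to a small made-up pattern with a non-capturing group.

```python
import re

pattern = re.compile(r"((?<=^)|(?<=[^\.\d]))0+(\d+)")

# Only two pairs of parentheses create numbered groups:
#   group 1: ((?<=^)|(?<=[^\.\d]))  the outer pair around the lookbehinds
#   group 2: (\d+)                   the digits after the zeros
# The lookbehinds (?<=...) themselves are parenthesized but do not capture.
print(pattern.groups)  # 2

# A non-capturing group (?:...) likewise gets no number.
print(re.compile(r"(?:ab)(c)").groups)  # 1
```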

In Python regex, the syntax for a reference to a numbered capturing group (sometimes called a "backreference") is \n, where n is the number of the group in the "find" part of the regex. Groups are numbered by the position of their opening parenthesis, counting from the left.
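For illustration, the sketch below compares the \n spelling with Python's equivalent \g<n> form and shows why the replacement string is written as a raw string.

```python
import re

pattern = r"((?<=^)|(?<=[^\.\d]))0+(\d+)"
text = "02+03"

# \1 and \2 insert whatever groups 1 and 2 captured for the current match.
print(re.sub(pattern, r"\1\2", text))        # 2+3
# \g<1>\g<2> is an equivalent, more explicit spelling.
print(re.sub(pattern, r"\g<1>\g<2>", text))  # 2+3
# The raw string r"\1\2" and the escaped normal string "\\1\\2" are identical.
print(r"\1\2" == "\\1\\2")                   # True
# Without r or doubled backslashes, "\1" is the control character chr(1).
print("\1" == chr(1))                        # True
```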

So re.sub returns a string in which each match is replaced by the text that the numbered groups captured from the input.
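To see exactly what gets substituted, this sketch uses re.finditer to print each match of the question's pattern on "02+03" together with what groups 1 and 2 captured. The whole match (group 0) is what re.sub replaces.

```python
import re

pattern = re.compile(r"((?<=^)|(?<=[^\.\d]))0+(\d+)")

# Print the whole match, then groups 1 and 2, for every match.
for m in pattern.finditer("02+03"):
    print(repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))
# '02' '' '2'
# '03' '' '3'
```

Each "0d" match is replaced by '' + 'd', which is how the zeros disappear.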

Note: I don't believe the \1 part of the "repl" is actually required. Group 1 contains only lookbehinds, which match no characters, so it always captures an empty string. I think:

r"\2"

...would work just as well.
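This can be checked directly. The sketch runs both replacements on a few sample expressions, chosen here for illustration, and asserts that they agree.

```python
import re

pattern = r"((?<=^)|(?<=[^\.\d]))0+(\d+)"

# Group 1 is built only from zero-width lookbehinds, so it always captures "".
# Dropping \1 from the replacement therefore changes nothing.
for text in ["02+03", "007*010", "1.05-0.5", "100/04"]:
    assert re.sub(pattern, r"\1\2", text) == re.sub(pattern, r"\2", text)
    print(text, "->", re.sub(pattern, r"\2", text))
```

"0.5" and "1.05" are left as they are: in "0.5" the zero is followed by a period, and in "1.05" it is preceded by one.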

Further reading: https://www.regular-expressions.info/brackets.html

See this example:

>>> import re
>>> s='awd232frr2cr23'
>>> re.sub(r'\d',' ',s)
'awd   frr cr  '
>>> 

Explanation:

  • \d matches a single digit character (not a whole integer), so each digit is removed and replaced with repl (in this case ' ').
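The single-character behavior shows up clearly when \d is contrasted with \d+ on the same string from the example above.

```python
import re

s = "awd232frr2cr23"

# \d matches ONE digit, so every digit becomes its own space.
print(repr(re.sub(r"\d", " ", s)))   # 'awd   frr cr  '
# \d+ matches a whole run of digits, so each run becomes a single space.
print(repr(re.sub(r"\d+", " ", s)))  # 'awd frr cr '
```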

First, repl is what each match will be replaced with.

To understand \1\2, you need to know what capture grouping is.
Check this video out for the basics of group capturing.
Here, the parentheses () you placed in the regex split every match it finds into numbered groups 1, 2, and so on. In Python, \1, \2 (or \g<1>, \g<2>) can be used to refer to them in repl. $1, $2 is the syntax of other regex flavors, such as JavaScript's replace(); Python's re.sub copies it literally.
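A small check of that difference, using a made-up pattern and input: only the backslash form is treated as a group reference by re.sub.

```python
import re

# \1 refers to group 1; "$1" has no special meaning in Python and is kept as-is.
print(re.sub(r"(\d+)", r"<\1>", "a12b"))  # a<12>b
print(re.sub(r"(\d+)", "<$1>", "a12b"))   # a<$1>b
```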

In this case:
The regex replaces each match (the leading zeros plus the digits after them) with just the digits captured by group 2, so the zeros are dropped.
Note: \1 is not necessary; it works fine without it.
