Difference between two regex patterns to find words ending with 'ing'

I am trying to find words ending with 'ing' in the following sentence: "Playing outdoor games when it's raining outside is always fun!"

That by itself is not my question, since I already found a regex pattern that does it: r'\b([A-z]+ing)\b'.

The thing is, I don't understand why the above works but what I tried below does not:

re.findall('([A-z]+ing)$', "Playing outdoor games when it's raining outside is always fun!")

This returns an empty list, whereas the following does not:

re.findall('([A-z]+ing)$', 'amazing')

This returns ['amazing'].

So this pattern can match a single word ending with 'ing', but not words inside a sentence? Why?

What I found even stranger is this: re.findall('\b([A-z]+ing)\b', "Playing outdoor games when it's raining outside is always fun!") returns no matches (an empty list). The only difference is that it does not use raw string notation (r).

I thought the 'r' prefix was only necessary when we want to escape backslashes. So I expected Pattern 1, '\b([A-z]+ing)\b', to match 'Playing', 'raining', etc., just like Pattern 2, r'\b([A-z]+ing)\b'. What exactly have I misunderstood? I have searched through many Stack Overflow answers and the official Python regex documentation, and I am now more confused than when I started, particularly about the use of 'r'.

The $ anchor matches at the end of the whole string, or also at the end of each line if the re.MULTILINE flag is set (it is not set here, so only the end of the string counts). Placing it right after "ing" requires the "ing" to be the very last thing in the string. That holds for 'amazing', but not for the sentence, which ends with "fun!".
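A minimal sketch of the difference, using the sentence from the question and writing the character class as [A-z]: with $ the pattern can only match at the very end of the string, while \b boundaries let it match a whole word anywhere.

```python
import re

sentence = "Playing outdoor games when it's raining outside is always fun!"

# '$' anchors the match to the end of the string. The sentence ends with "fun!",
# so no "...ing" sits right before the end and findall finds nothing.
anchored = re.findall(r'([A-z]+ing)$', sentence)

# For the single word 'amazing', "ing" is at the end, so the same pattern succeeds.
single_word = re.findall(r'([A-z]+ing)$', 'amazing')

# Word boundaries (\b) allow a match anywhere, as long as it is a whole word.
bounded = re.findall(r'\b([A-z]+ing)\b', sentence)

print(anchored)     # []
print(single_word)  # ['amazing']
print(bounded)      # ['Playing', 'raining']
```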

Raw string notation passes escape sequences such as \b through unchanged to the underlying function (here: findall), which then processes them further (here: as the regex token for a word boundary).
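To illustrate, here is a small sketch showing that r'\b' is just a backslash followed by b, the same two characters you get by escaping the backslash by hand, so the regex engine receives the word-boundary token either way.

```python
import re

sentence = "Playing outdoor games when it's raining outside is always fun!"

# In a raw string, backslash-b stays as two characters: a backslash and 'b'.
raw = r'\b'
print(len(raw), list(raw))  # 2 ['\\', 'b']

# Escaping the backslash in a normal string produces the same two characters.
escaped = '\\b'
print(raw == escaped)  # True

# re receives \b unchanged and reads it as a word boundary,
# so both spellings of the pattern find the same words.
raw_matches = re.findall(r'\b([A-z]+ing)\b', sentence)
escaped_matches = re.findall('\\b([A-z]+ing)\\b', sentence)
print(raw_matches)      # ['Playing', 'raining']
print(escaped_matches)  # ['Playing', 'raining']
```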

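Without raw string notation, Python's string parser turns \b into the BACKSPACE control character (hex 0x08) before the regex engine ever sees the pattern. The regex engine then treats this character as a literal that matches only itself, and since the sentence contains no backspace characters, findall returns an empty list.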
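A short sketch of that behaviour: the non-raw pattern contains literal BACKSPACE characters, so it only matches text that actually has backspaces around the word. The test string with embedded backspaces is contrived, purely for illustration.

```python
import re

sentence = "Playing outdoor games when it's raining outside is always fun!"

# Without the r prefix, '\b' in a normal string literal is BACKSPACE (0x08).
plain = '\b'
print(plain == '\x08', len(plain))  # True 1

# The pattern now contains literal backspaces. The sentence has none,
# so nothing is found.
no_matches = re.findall('\b([A-z]+ing)\b', sentence)
print(no_matches)  # []

# A (contrived) string that really contains backspaces around the word does match.
with_backspaces = re.findall('\b([A-z]+ing)\b', '\x08Playing\x08')
print(with_backspaces)  # ['Playing']
```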

Using [A-z] to match all letters is also not right. It actually matches any character in the Unicode table between A and z, which includes the six characters between Z and a, e.g. [, ^ and \. If you only want the ASCII letters, use [A-Za-z] instead. If you want all Unicode word characters (letters and digits in any supported language, plus the underscore), use \w.
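The following sketch lists the extra characters that [A-z] lets through and compares it with [A-Za-z] and \w on a couple of made-up inputs.

```python
import re

# [A-z] spans code points 0x41 ('A') through 0x7A ('z'), which includes
# the punctuation characters that sit between 'Z' and 'a'.
between = [chr(c) for c in range(ord('Z') + 1, ord('a'))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

# So [A-z]+ happily matches runs of punctuation such as '^_^'.
wide = re.findall(r'[A-z]+', 'a^_^b')
ascii_only = re.findall(r'[A-Za-z]+', 'a^_^b')
print(wide)        # ['a^_^b']
print(ascii_only)  # ['a', 'b']

# [A-Za-z] stops at non-ASCII letters, while \w (Unicode-aware for str patterns)
# also accepts them, along with digits and the underscore.
ascii_word = re.findall(r'[A-Za-z]+', 'naïve_word 42')
unicode_word = re.findall(r'\w+', 'naïve_word 42')
print(ascii_word)    # ['na', 've', 'word']
print(unicode_word)  # ['naïve_word', '42']
```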

To experiment with regular expressions, you can use a site such as https://regex101.com/.
