Why does this Python regex capture only one digit?

Question

I'm trying to use Python's re module to capture specific digits in a string, such as the '03' in ' video [720P] [DHR] _sp03.mp4 '.

What confuses me is this:

When I use '.*\D+(\d+).*mp4', it captures both digits, 03, but when I use '.*\D*(\d+).*mp4', it captures only the last digit, 3.

I know Python's regex quantifiers are greedy by default, which means they try to match as much text as possible. Given that, I would expect * and + after the \D to behave the same way. So where am I wrong? What causes this difference? Can anyone explain it?

BTW: I used an online regex tester for Python: https://regex101.com/#python

Answer 1

What makes the difference is not the \D+ but the first .*

In regex, .* is greedy and tries to match as many characters as it can.
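A minimal sketch of that greediness, using a made-up input ('abc123' is chosen only for illustration and does not come from the question). The greedy group takes everything it can and leaves \d+ only the one digit it needs; the lazy form .*? is shown purely for contrast.

```python
import re

sample = "abc123"  # hypothetical input, chosen only for illustration

# Greedy: .* first takes the whole string, then gives back one character
# at a time until \d+ can match, so \d+ ends up with a single digit.
greedy = re.match(r"(.*)(\d+)", sample).groups()

# Lazy (for contrast): .*? takes as little as possible, so \d+ gets all digits.
lazy = re.match(r"(.*?)(\d+)", sample).groups()

print(greedy)  # ('abc12', '3')
print(lazy)    # ('abc', '123')
```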

So when you write

.*\D*(\d+).*mp4

the .* will match as much as it can. Broken down step by step, it looks like this:

video [720P] [DHR] _sp03.mp4
|
.*

video [720P] [DHR] _sp03.mp4
 |
 .*
.....

video [720P] [DHR] _sp03.mp4
                      |
                      .* That is, the 0 is also consumed by the .*

video [720P] [DHR] _sp03.mp4
                      |
                      \D* Since the quantifier is zero or more, it matches the empty string here and leaves the 3 untouched

video [720P] [DHR] _sp03.mp4
                       |
                      (\d+)

video [720P] [DHR] _sp03.mp4
                        |
                        .*

video [720P] [DHR] _sp03.mp4
                          |
                         mp4

Now when we use \D+, the matching changes a bit, because the regex engine is forced to match at least one non-digit (\D+) before the digits ((\d+)). That non-digit is the p, the last non-digit before the digits.

That is,

.* will match as much as it can up to, but not including, the p, so that \D+ can match at least one non-digit (the p) and \d+ can then match the 03 part.

video [720P] [DHR] _sp03.mp4
|
.*

video [720P] [DHR] _sp03.mp4
 |
 .*
.....

video [720P] [DHR] _sp03.mp4
                     |
                     \D+ The last non-digit before the digits (p). Forced to match at least once.

video [720P] [DHR] _sp03.mp4
                      |
                      (\d+) 

video [720P] [DHR] _sp03.mp4
                       |
                      (\d+)

video [720P] [DHR] _sp03.mp4
                        |
                        .*

video [720P] [DHR] _sp03.mp4
                          |
                         mp4

Answer 2

The problem is with \D*. The '+' means one or more, and '*' means zero or more.
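A quick check of that distinction, as a minimal sketch on the empty string (the variable names are illustrative): \D* is happy to match nothing at all, while \D+ needs at least one non-digit.

```python
import re

# Zero non-digits is acceptable for *, so the empty string matches.
star = re.fullmatch(r"\D*", "")

# + requires at least one non-digit, so the empty string does not match.
plus = re.fullmatch(r"\D+", "")

print(star is not None)  # True
print(plus is None)      # True
```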

Because you used '.*' at the start, it is greedy and consumes everything up to ' video [720P] [DHR] _sp0', whereas in the '\D+' case it stops at ' video [720P] [DHR] _s', leaving 'p' for \D+.

>>> import re
>>> a = " video [720P] [DHR] _sp03.mp4 "
>>> # raw strings keep \D and \d from being read as string escapes
>>> p1 = re.compile(r'.*\D+(\d+).*mp4')
>>> p2 = re.compile(r'.*\D*(\d+).*mp4')
>>> re.findall(p1, a)
['03']
>>> re.findall(p2, a)
['3']
>>> a
' video [720P] [DHR] _sp03.mp4 '
>>> # capture every part to see what each piece of the pattern matched
>>> p3 = re.compile(r'(.*)(\D*)(\d+)(.*)mp4')
>>> re.findall(p3, a)
[(' video [720P] [DHR] _sp0', '', '3', '.')]
>>> p4 = re.compile(r'(.*)(\D+)(\d+)(.*)mp4')
>>> re.findall(p4, a)
[(' video [720P] [DHR] _s', 'p', '03', '.')]
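If the goal is only the digits directly before the extension, one option is to drop the leading .* and anchor the digits to the literal ".mp4". This is a sketch under that assumption about the intent; neither answer proposes it.

```python
import re

a = " video [720P] [DHR] _sp03.mp4 "

# Assumption: the wanted digits are the run immediately before ".mp4".
# Without a greedy .* in front, \d+ is free to take the whole run.
m = re.search(r"(\d+)\.mp4", a)
digits = m.group(1) if m else None

print(digits)  # 03
```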

Answer 1: score 7, accepted, 2015-03-22 04:35:58

Answer 2: score 1, 2015-03-22 04:57:26