Python regex: find text with digits
I have text like:
sometext...one=1290...sometext...two=12985...sometext...three=1233...
How can I find one=1290 and two=12985, but not three or four or five? There can be 4 or 5 digits after the =. I tried this:
import re
pattern = r"(one|two)+=+(\d{4,5})+\D"
found = re.findall(pattern, sometext, flags=re.IGNORECASE)
print(found)
It gives me results like [('one', '1290')]. If I use
pattern = r"((one|two)+=+(\d{4,5})+\D)"
it gives me [('one=1290', 'one', '1290')]. How can I get just one=1290?
You were close. You need to use a single capture group (or none, for that matter):
((?:one|two)+=+\d{4,5})+
Full code:
import re
string = 'sometext...one=1290...sometext...two=12985...sometext...three=1233...'
pattern = r"((?:one|two)+=+\d{4,5})+"
found = re.findall(pattern, string, flags=re.IGNORECASE)
print(found)
# ['one=1290', 'two=12985']
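As a minimal sketch of why this works (the variable names below are illustrative, not from the question): re.findall returns the whole match when the pattern has no capture groups, the group's text when it has exactly one, and a tuple per match when it has more than one.

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# No capture groups: findall returns each whole match as a string.
no_groups = re.findall(r"(?:one|two)=\d{4,5}", text)

# Exactly one capture group: findall returns that group's text.
one_group = re.findall(r"((?:one|two)=\d{4,5})", text)

# Two capture groups: findall returns one tuple per match.
two_groups = re.findall(r"(one|two)=(\d{4,5})", text)

print(no_groups)   # ['one=1290', 'two=12985']
print(one_group)   # ['one=1290', 'two=12985']
print(two_groups)  # [('one', '1290'), ('two', '12985')]
```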
Make the inner groups non-capturing: ((?:one|two)+=+(?:\d{4,5})+\D)
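One caveat, offered as a hedged sketch: as written, the trailing \D sits inside the capture group, so the non-digit character after the number comes back as part of each result. Moving \D outside the group (a variation of mine, not something the answer states) keeps the check but drops the extra character.

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# \D inside the group: the following non-digit is captured too.
inside = re.findall(r"((?:one|two)+=+(?:\d{4,5})+\D)", text, flags=re.IGNORECASE)

# \D outside the group: still required, but not returned by findall.
outside = re.findall(r"((?:one|two)+=+(?:\d{4,5})+)\D", text, flags=re.IGNORECASE)

print(inside)   # ['one=1290.', 'two=12985.']
print(outside)  # ['one=1290', 'two=12985']
```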
The reason you are getting results like [('one', '1290')] rather than one=1290 is that you are using capture groups. Use:
r"(?:one|two)=(?:\d{4,5})(?=\D)"
I removed the + repeaters, as they were (I think?) unnecessary. You don't want to match things like oneonetwo===1234, right?
Using (?:...) rather than (...) defines a non-capture group. This prevents the result of the capture from being returned, and you instead get the whole match.
Similarly, (?=\D) defines a look-ahead, so it is excluded from the match result.
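A short sketch of how this pattern behaves, assuming the sample text from the question. The names strict, relaxed and loose are illustrative, and the (?!\d) variant is my suggestion for input where the number can be the last thing in the string, not part of the answer.

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

strict = re.compile(r"(?:one|two)=\d{4,5}(?=\D)", flags=re.IGNORECASE)
print(strict.findall(text))                 # ['one=1290', 'two=12985']

# The look-ahead rejects runs of more than five digits...
print(strict.findall("one=123456..."))      # []

# ...but it also requires a following character, so a match at the very
# end of the string is missed. A negative look-ahead is one alternative.
print(strict.findall("two=12985"))          # []
relaxed = re.compile(r"(?:one|two)=\d{4,5}(?!\d)", flags=re.IGNORECASE)
print(relaxed.findall("two=12985"))         # ['two=12985']

# With the + repeaters kept, doubled-up input is accepted:
loose = re.compile(r"(?:one|two)+=+\d{4,5}(?=\D)", flags=re.IGNORECASE)
print(loose.findall("oneonetwo===1234."))   # ['oneonetwo===1234']
print(strict.findall("oneonetwo===1234."))  # []
```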