
python regex find text with digits

I have text like:

sometext...one=1290...sometext...two=12985...sometext...three=1233...

How can I find one=1290 and two=12985 but not three or four or five? There can be 4 or 5 digits after the = . I tried this:

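import re

# Sample input from above, so the snippet runs on its own
sometext = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."
pattern = r"(one|two)+=+(\d{4,5})+\D"
found = re.findall(pattern, sometext, flags=re.IGNORECASE)
print(found)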

It gives me results like: [('one', '1290')] . If I use pattern = r"((one|two)+=+(\d{4,5})+\D)" it gives me [('one=1290', 'one', '1290')] . How can I get just one=1290 ?

You were close. You need to use a single capture group (or none, for that matter):

((?:one|two)+=+\d{4,5})+

Full code:

import re

string = 'sometext...one=1290...sometext...two=12985...sometext...three=1233...'

pattern = r"((?:one|two)+=+\d{4,5})+"
found = re.findall(pattern, string, flags=re.IGNORECASE)
print(found)
# ['one=1290', 'two=12985']

Make the inner groups non-capturing: ((?:one|two)+=+(?:\d{4,5})+\D)
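A quick check of this pattern against the sample string from the question may help. This is only a minimal sketch, and the variable names are illustrative. Note that the trailing \D is inside the one remaining capture group, so the character after the digits ends up in each result:

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# Only the outer group captures, so findall returns plain strings.
pattern = r"((?:one|two)+=+(?:\d{4,5})+\D)"
found = re.findall(pattern, text, flags=re.IGNORECASE)
print(found)  # ['one=1290.', 'two=12985.']
```

If the extra character is unwanted, it can be stripped afterwards or kept out of the match with a look-ahead.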

You are getting results like [('one', '1290')] rather than one=1290 because you are using capture groups. Use:

r"(?:one|two)=(?:\d{4,5})(?=\D)"
  • I have removed the additional + repeaters, as they were (I think?) unnecessary. You don't want to match things like oneonetwo===1234 , right?
  • Using (?:...) rather than (...) defines a non-capture group. With no capture groups in the pattern, re.findall returns the whole match instead of the group contents.
  • Similarly, (?=\D) defines a look-ahead: it checks that a non-digit follows but does not consume it, so that character is excluded from the match result.
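The points above can be checked with a short sketch. It uses the sample string from the question; the variable names and the extra "one=123456." input are made up for illustration.

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# Capturing groups: findall returns one tuple of group contents per match.
with_groups = re.findall(r"(one|two)=(\d{4,5})(?=\D)", text, flags=re.IGNORECASE)

# Non-capturing groups: findall returns the whole matched text.
without_groups = re.findall(r"(?:one|two)=(?:\d{4,5})(?=\D)", text, flags=re.IGNORECASE)

print(with_groups)     # [('one', '1290'), ('two', '12985')]
print(without_groups)  # ['one=1290', 'two=12985']

# The look-ahead also rejects a run of more than five digits:
# after 4 or 5 digits the next character is still a digit, so nothing matches.
too_long = re.findall(r"(?:one|two)=(?:\d{4,5})(?=\D)", "one=123456.")
print(too_long)        # []
```

One caveat: (?=\D) requires an actual non-digit character after the number, so a value at the very end of the string would not match. If that case matters, the negative look-ahead (?!\d) is a possible alternative.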
