Python re (regex) matching particular string containing letters, hyphen, numbers

I am trying to match the following strings in Python 2.7 using the regular expression package re, and I am having trouble coming up with the regex:

https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212

So the prefix is fixed, "https://www.this.com/", followed by a variable number of lowercase letters, then "-", then more lowercase letters, then "/e", and then a variable number of digits.

Here is what I have tried, to no avail:

href=re.compile("https://www.this.com/people-search/[a-z]+[\-](?P<firstNumBlock>\d+)/")

href=re.compile("https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+")

Thanks for your help!

href = re.compile(r"https://www\.this\.com/people-search/[a-z]+-[a-z]+/e[0-9]+")

Try it here.

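You are running into issues with escaping special characters. Since you're not using raw strings, the backslash has special meaning in your string literal itself (sequences such as \d and \- happen to pass through unchanged, but others such as \b do not, so a raw string is the safer habit). Additionally, a hyphen that sits alone in a character class, as in [\-], doesn't need escaping; a plain - without the brackets does the same job. Note also that your second attempt uses [a-z] without a +, which matches only a single letter before the hyphen. You can simplify your expression as follows: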

expression = r"https://www.this.com/people-search/[a-z]+-[a-z]+/e\d+"

With the following data:

strings = ['https://www.this.com/people-search/john-smith/e5609239',
           'https://www.this.com/people-search/jane-johnson/e426609216',
           'https://www.this.com/people-search/wendy-saad/e172645609215',
           'https://www.this.com/people-search/nick-madison/e7265609214',
           'https://www.this.com/people-search/tom-taylor/e17265709211',
           'https://www.this.com/people-search/james-bates/e9212']

Result:

>>> import re
>>> for s in strings:
...     print(re.match(expression, s))
...
<_sre.SRE_Match object; span=(0, 54), match='https://www.this.com/people-search/john-smith/e56>
<_sre.SRE_Match object; span=(0, 58), match='https://www.this.com/people-search/jane-johnson/e>
<_sre.SRE_Match object; span=(0, 59), match='https://www.this.com/people-search/wendy-saad/e17>
<_sre.SRE_Match object; span=(0, 59), match='https://www.this.com/people-search/nick-madison/e>
<_sre.SRE_Match object; span=(0, 58), match='https://www.this.com/people-search/tom-taylor/e17>
<_sre.SRE_Match object; span=(0, 52), match='https://www.this.com/people-search/james-bates/e9>
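
To make the difference concrete, here is a minimal sketch that runs the second attempt from the question next to the simplified expression. It assumes the URLs include the /people-search/ segment, as in the data above; the variable names (url, attempt, fixed) are only for illustration. The last two lines show why raw strings are worth using even when a particular pattern happens to work without them.

```python
import re

url = "https://www.this.com/people-search/john-smith/e5609239"

# Second attempt from the question: [a-z] has no "+", so it consumes exactly
# one letter ("j") and then requires a hyphen, but the next character is "o".
attempt = re.compile(r"https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+")

# Simplified expression: one or more letters on each side of the hyphen.
fixed = re.compile(r"https://www.this.com/people-search/[a-z]+-[a-z]+/e\d+")

print(attempt.match(url))        # None
print(fixed.match(url).group())  # the full URL

# "\d" is not a recognised string escape, so it survives in a normal literal,
# but "\b" is turned into a backspace character unless the string is raw.
print(len("\\d"), len(r"\d"))  # 2 2
print(len("\b"), len(r"\b"))   # 1 2
```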

re.compile(r'https://www.this.com/[a-z-]+/e\d+')

[a-z-]+ matches john-smith, and e\d+ matches e5609239.
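
One thing to keep in mind with [a-z-]+ is that it is looser than [a-z]+-[a-z]+: it accepts any mix of letters and hyphens, including several hyphens or a hyphen on its own. Whether that matters depends on your data. The sketch below compares the two; the second and third URLs are made up for illustration, and escaping the dots and anchoring with $ are my own additions, not part of the expression above.

```python
import re

# Loose form: any run of lowercase letters and hyphens
loose = re.compile(r'https://www\.this\.com/[a-z-]+/e\d+$')
# Strict form: letters, exactly one hyphen, letters
strict = re.compile(r'https://www\.this\.com/[a-z]+-[a-z]+/e\d+$')

urls = [
    'https://www.this.com/john-smith/e5609239',  # from the question
    'https://www.this.com/mary-ann-lee/e123',    # hypothetical: two hyphens
    'https://www.this.com/-/e1',                 # hypothetical: hyphen only
]

for u in urls:
    # Print whether each pattern accepts the URL
    print(u, bool(loose.match(u)), bool(strict.match(u)))
```

The loose pattern accepts all three URLs, while the strict one accepts only the first.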

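import re
from pprint import pprint

# Sample URLs from the question, one per line
text = '''https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212'''
# Dots are escaped so they match only a literal "."; the hyphen needs no escape
href = re.compile(r'https://www\.this\.com/[a-zA-Z]+-[a-zA-Z]+/e[0-9]+')
m = href.findall(text)  # list of every matching URL found in text
pprint(m)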

Outputs:

['https://www.this.com/john-smith/e5609239',
 'https://www.this.com/jane-johnson/e426609216',
 'https://www.this.com/wendy-saad/e172645609215',
 'https://www.this.com/nick-madison/e7265609214',
 'https://www.this.com/tom-taylor/e17265709211',
 'https://www.this.com/james-bates/e9212']
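
The first attempt in the question used a named group, (?P<firstNumBlock>\d+), which suggests the name and number may be wanted separately. As a rough sketch under that assumption, named groups can be added to the same pattern and read back with finditer. The group names name and num are my own choice, not something from the question.

```python
import re

text = '''https://www.this.com/john-smith/e5609239
https://www.this.com/james-bates/e9212'''

# Same pattern as above, with the name and the digits captured in named groups
href = re.compile(
    r'https://www\.this\.com/(?P<name>[a-z]+-[a-z]+)/e(?P<num>[0-9]+)'
)

# finditer yields one match object per URL, so each group can be read by name
pairs = [(m.group('name'), m.group('num')) for m in href.finditer(text)]
print(pairs)  # [('john-smith', '5609239'), ('james-bates', '9212')]
```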
