
Python re (regex) matching particular string containing letters, hyphen, numbers

I am trying to match the following strings in Python 2.7 using the regular expression package re, and I am having trouble with the regex code:

https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212

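So the prefix is fixed, "https://www.this.com/", followed by a variable number of lowercase letters, then "-", then more lowercase letters, then "/e", then a variable number of digits.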

Here is what I have tried, in vain:

href=re.compile("https://www.this.com/people-search/[a-z]+[\-](?P<firstNumBlock>\d+)/")

href=re.compile("https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+")

Thanks for your help!

href = re.compile(r"https://www\.mylife\.com/people-search/[a-z]+-[a-z]+/e[0-9]+")

Try it here.

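You are running into a problem with escaping special characters. Because you are not using a raw string, the backslash has a special meaning in the string literal itself. Also, the hyphen does not need to be wrapped in a character class ([]) or escaped here; outside a character class it is an ordinary literal. You can simplify the expression as follows: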

import re

expression = r"https://www\.mylife\.com/people-search/[a-z]+-[a-z]+/e\d+"

With the following data:

strings = ['https://www.mylife.com/people-search/john-smith/e5609239',
 'https://www.mylife.com/people-search/jane-johnson/e426609216',
 'https://www.mylife.com/people-search/wendy-saad/e172645609215',
 'https://www.mylife.com/people-search/nick-madison/e7265609214',
 'https://www.mylife.com/people-search/tom-taylor/e17265709211',
 'https://www.mylife.com/people-search/james-bates/e9212']

Results:

>>> for s in strings:
...     print(re.match(expression, s))
...
<_sre.SRE_Match object; span=(0, 56), match='https://www.mylife.com/people-search/john-smith/e>
<_sre.SRE_Match object; span=(0, 60), match='https://www.mylife.com/people-search/jane-johnson>
<_sre.SRE_Match object; span=(0, 61), match='https://www.mylife.com/people-search/wendy-saad/e>
<_sre.SRE_Match object; span=(0, 61), match='https://www.mylife.com/people-search/nick-madison>
<_sre.SRE_Match object; span=(0, 60), match='https://www.mylife.com/people-search/tom-taylor/e>
<_sre.SRE_Match object; span=(0, 54), match='https://www.mylife.com/people-search/james-bates/>
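
The point about raw strings and escaping can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names and the look-alike URL are made up for illustration. It shows that a raw literal hands the backslash to re unchanged, and that an unescaped dot in the pattern accepts any character in that position.

```python
import re

# In a normal string literal Python processes the backslash first;
# in a raw literal it reaches the regex engine unchanged.
plain = "\\d+"   # backslash has to be doubled in a normal literal
raw = r"\d+"     # raw literal: what you type is what re sees
assert plain == raw

# "\b" is the classic trap: a backspace character in a normal literal,
# a two-character word-boundary escape in a raw one.
assert "\b" == "\x08"
assert len(r"\b") == 2

# An unescaped "." matches any character, so escape it to mean a literal dot.
loose = re.compile(r"https://www.mylife.com/")
strict = re.compile(r"https://www\.mylife\.com/")
lookalike = "https://wwwXmylife.com/"  # hypothetical URL, for illustration only

print(bool(loose.match(lookalike)))   # True
print(bool(strict.match(lookalike)))  # False
```

Writing every pattern as a raw string avoids having to remember which escapes Python itself interprets.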

re.compile(r'https://www\.this\.com/[a-z-]+/e\d+')

[a-z-]+ matches john-smith, and e\d+ matches e5609239.
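
If the two parts are needed separately, the same pattern can carry named groups, as the first attempt in the question tried to do. This is a sketch under assumptions: the group names name and number and the trailing $ anchor are illustrative additions, not something the answer specifies.

```python
import re

# Same shape as the pattern above, with two named groups added.
# The group names "name" and "number" are hypothetical.
profile_url = re.compile(
    r"https://www\.this\.com/(?P<name>[a-z-]+)/e(?P<number>\d+)$"
)

m = profile_url.match("https://www.this.com/john-smith/e5609239")
if m:
    print(m.group("name"))    # john-smith
    print(m.group("number"))  # 5609239
```

Note that [a-z-]+ is looser than [a-z]+-[a-z]+: it also accepts names with no hyphen or with several hyphens, which may or may not be what is wanted.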

import re
from pprint import pprint

text = '''https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212'''
# Two runs of letters joined by a hyphen, then "/e" and the numeric ID.
href = re.compile(r'https://www\.this\.com/[a-zA-Z]+-[a-zA-Z]+/e[0-9]+')
# findall returns every non-overlapping match in the text as a list of strings.
m = href.findall(text)
pprint(m)

Output:

['https://www.this.com/john-smith/e5609239',
 'https://www.this.com/jane-johnson/e426609216',
 'https://www.this.com/wendy-saad/e172645609215',
 'https://www.this.com/nick-madison/e7265609214',
 'https://www.this.com/tom-taylor/e17265709211',
 'https://www.this.com/james-bates/e9212']
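
One caveat when validating a single URL rather than searching a block of text: re.match only anchors at the start and findall accepts a match anywhere, so trailing characters go unnoticed. A possible way to require the whole string to match, which also works on Python 2.7 where re.fullmatch is not available, is to append \Z. This is a sketch; the variable names and the URL with the extra suffix are assumptions made for illustration.

```python
import re

pattern = r"https://www\.this\.com/[a-z]+-[a-z]+/e[0-9]+"
prefix_only = re.compile(pattern)            # anchored at the start only
whole_string = re.compile(pattern + r"\Z")   # must also end where the string ends

url_with_suffix = "https://www.this.com/john-smith/e5609239/extra"  # hypothetical input

print(bool(prefix_only.match(url_with_suffix)))   # True
print(bool(whole_string.match(url_with_suffix)))  # False
print(bool(whole_string.match("https://www.this.com/james-bates/e9212")))  # True
```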

