Regular expression search in a Python if condition
I am trying to search for the whole word pid in the link, but somehow this code is also matching id:
    for a in self.soup.find_all(href=True):
        if 'pid' in a['href']:
            href = a['href']
            if not href or len(href) <= 1:
                continue
            elif 'javascript:' in href.lower():
                continue
            else:
                href = href.strip()
            if href[0] == '/':
                href = (domain_link + href).strip()
            elif href[:4] == 'http':
                href = href.strip()
            elif href[0] != '/' and href[:4] != 'http':
                href = (domain_link + '/' + href).strip()
            if '#' in href:
                indx = href.index('#')
                href = href[:indx].strip()
            if href in links:
                continue
            links.append(self.re_encode(href))
If you mean that you want it to match a string like /pid/0002 but not /rapid.html, then you need to exclude word characters on either side. Something like:
>>> re.search(r'\Wpid\W', '/pid/0002')
<_sre.SRE_Match object; span=(0, 5), match='/pid/'>
>>> re.search(r'\Wpid\W', '/rapid/123')
None
If 'pid' might be at the start or end of the string, you'll need to add extra conditions: check for either the start/end of the string or a non-word character:
>>> re.search(r'(^|\W)pid($|\W)', 'pid/123')
<_sre.SRE_Match object; span=(0, 4), match='pid/'>
See the docs for more information on the special characters.
You could use it like this:
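As a quick check of how the pattern behaves, here is a minimal sketch that runs it against a few sample strings. The helper name has_pid_word and the sample hrefs are made up for illustration; only the pattern itself comes from the answer above. Note that the underscore counts as a word character, so \W does not treat it as a separator.

```python
import re

# Same pattern as above: 'pid' bounded by the start/end of the string
# or by a non-word character on each side.
PID_WORD = re.compile(r'(^|\W)pid($|\W)')

def has_pid_word(href):
    """Hypothetical helper: True if href contains 'pid' as a standalone word."""
    return PID_WORD.search(href) is not None

# Illustrative hrefs, not taken from the question.
samples = ['/pid/0002', 'pid/123', '/item?pid=7', '/rapid.html', '/my_pid/1']
results = {s: has_pid_word(s) for s in samples}

for s, matched in results.items():
    print(s, matched)
```

The last sample, /my_pid/1, is not matched because '_' is a word character; if such links should count, the pattern would need adjusting.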
    pattern = re.compile(r'(^|\W)pid($|\W)')
    if pattern.search(a['href']) is not None:
        ...
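To show how that check might slot into a loop like the one in the question, here is a rough sketch that works on a plain list of href strings instead of BeautifulSoup tags. The function name filter_pid_links and the sample list are assumptions for illustration; the URL normalisation from the question is left out.

```python
import re

pattern = re.compile(r'(^|\W)pid($|\W)')

def filter_pid_links(hrefs):
    """Hypothetical helper: keep hrefs that contain 'pid' as a whole word,
    in their original order and without duplicates."""
    links = []
    for href in hrefs:
        # Skip anything where 'pid' only appears inside a longer word.
        if pattern.search(href) is None:
            continue
        if href not in links:
            links.append(href)
    return links

# Illustrative input, not taken from the question.
hrefs = ['/pid/0002', '/rapid.html', '/view?pid=5', '/pid/0002']
filtered = filter_pid_links(hrefs)
print(filtered)
```

With real pages, the list of hrefs would come from something like the find_all(href=True) loop in the question.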