Regex to find the link with all strings and a character anywhere in between
I have a text extract like this (the original text is very long, with hyperlinks scattered throughout):
... <div> <a href=\n=3D"https://www.wonder.com/alerts/remove" style=3D"text-decoration:none;color:#427fed">Unsu=\nbscribe</a> <span style=3D"padding:0px 4px 0px 4px;color:#252525">|<a href=3D"https:=\n//www.wonder.com/url?rct=3Dj&sa=3Dt&></span> =\n<a href=3D"https://www.wonder.com/alerts?source=3Dalertsmail&hl=3Den&=\namp;gl=3DIN&msgid=3DMTA2MzYwOTAxMTQ5NzI4MTc3MTE" style=3D"text-decorati=\non:none;color:#427fed"> View all your alerts </a> </div> </td> </tr> <tr> <=\ntd style=3D"padding:6px 10px 0px 0px;font-family:Arial"> <a href=3D"https:/=\n/www.wonder.com/alerts/feeds/065638/93686812" styl=\ne=3D"text-decoration:none;color:#427fed">...
I'm trying to extract this hyperlink (not necessarily the last one): https:/=\n/www.wonder.com/alerts/feeds/065638/93686812
I do not know where the '=' will appear in the link, so I tried to use a positive lookahead like this:
re.match(r'(?=\\=)\\"https(.*).*\\"', text)
This did not help. Any suggestions, please?
Also, is there a way to define a list of strings and then match a string that contains all of them? I saw a couple of posts on matching any item in a list, but not all of them. I tried to look for a pattern like (https)&(wonder)&(alerts)&(feeds) but without much luck.
This worked for me:
(\"https([^\"])*\d+\")
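A possible alternative, as a rough sketch only: the `=3D` sequences and the `=` at the end of each line look like quoted-printable encoding, so one option is to decode the text first and then search it, which removes the need to guess where the '=' falls. This assumes the `\n` shown in the extract is a real newline in the raw text. The helper name `find_links` and its `required` parameter are made up for illustration; the "contains all of these strings" check is done in plain Python with `all()` rather than inside the regex.

```python
import quopri
import re


def find_links(raw_text, required=()):
    """Return https links from href attributes that contain every string in `required`.

    Hypothetical helper: assumes `raw_text` is quoted-printable encoded,
    i.e. '=3D' stands for '=' and '=' followed by a newline is a soft line break.
    """
    # Decoding drops the soft line breaks and turns '=3D' back into '='.
    decoded = quopri.decodestring(raw_text.encode("utf-8")).decode("utf-8", errors="replace")
    # Capture everything between href=" and the next double quote.
    urls = re.findall(r'href="(https://[^"]+)"', decoded)
    # Keep only the links that contain all of the required substrings.
    return [url for url in urls if all(part in url for part in required)]


# Shortened sample modelled on the extract in the question.
sample = (
    '<a href=\n=3D"https://www.wonder.com/alerts/remove" style=3D"x">Unsu=\nbscribe</a> '
    '<a href=3D"https:/=\n/www.wonder.com/alerts/feeds/065638/93686812" styl=\ne=3D"y">'
)

print(find_links(sample, ["https", "wonder", "alerts", "feeds"]))
# ['https://www.wonder.com/alerts/feeds/065638/93686812']
```

If the check has to live inside a single regex, chained lookaheads such as `^(?=.*wonder)(?=.*alerts)(?=.*feeds)https` are the usual way to require all of the words rather than any of them, applied to each candidate link after decoding.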