Regex to find the link with all strings and a character anywhere in between

I have a text extract like this (the original text is very long, with hyperlinks scattered throughout):

... <div> <a href=\n=3D"https://www.wonder.com/alerts/remove" style=3D"text-decoration:none;color:#427fed">Unsu=\nbscribe</a> <span style=3D"padding:0px 4px 0px 4px;color:#252525">|<a href=3D"https:=\n//www.wonder.com/url?rct=3Dj&amp;sa=3Dt&amp;></span> =\n<a href=3D"https://www.wonder.com/alerts?source=3Dalertsmail&amp;hl=3Den&=\namp;gl=3DIN&amp;msgid=3DMTA2MzYwOTAxMTQ5NzI4MTc3MTE" style=3D"text-decorati=\non:none;color:#427fed"> View all your alerts </a> </div> </td> </tr> <tr> <=\ntd style=3D"padding:6px 10px 0px 0px;font-family:Arial"> <a href=3D"https:/=\n/www.wonder.com/alerts/feeds/065638/93686812" styl=\ne=3D"text-decoration:none;color:#427fed">...

I'm trying to extract this hyperlink (not necessarily the last one): https:/=\\n/www.wonder.com/alerts/feeds/065638/93686812

I do not know where the '=' will appear in the link, so I tried a positive lookahead like this:

re.match(r'(?=\\=)\\"https(.*).*\\"', text)

This did not help. Any suggestions?

Also, is there a way to define a list of strings and then match a string that contains all of them? I saw a couple of posts on matching any item in a list, but not all of them. I tried a pattern like (https)&(wonder)&(alerts)&(feeds), without much luck.

Hi, use this regex:

    https:(.*?)(>|")

You can test your regex online at this link. I tested your text with my regex and got 4 matches.
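
A minimal sketch of how this pattern might be used from Python, assuming the `=\n` sequences in the extract are quoted-printable soft line breaks (an `=` followed by a newline) that can simply be removed before matching. The shortened `text` and the helper name `find_links` are made up for illustration. Note that `re.match` only tries to match at the very start of the string, so `re.findall` is used here instead.

```python
import re

# Shortened, hypothetical stand-in for the extract in the question.
text = (
    '<a href=\n=3D"https://www.wonder.com/alerts/remove" style=3D"x">Unsu=\nbscribe</a> '
    '<a href=3D"https:/=\n/www.wonder.com/alerts/feeds/065638/93686812" styl=\ne=3D"y">'
)

def find_links(raw):
    # Drop soft line breaks ("=" followed by a newline) so split URLs are rejoined.
    joined = raw.replace("=\n", "")
    # Same idea as the pattern above, with "https:" kept inside the capture group
    # and the terminator (> or ") left out of the result.
    return re.findall(r'(https:.*?)(?:>|")', joined)

links = find_links(text)
print(links)
```

Removing the soft line breaks first means the position of the stray `=` inside a link no longer matters.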

This worked for me:

    (\"https([^\"])*\d+\")
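
As a rough sketch, this pattern could be combined with a plain `all(...)` check to cover the second part of the question (keeping only links that contain every string in a list). It assumes the `=\n` sequences are quoted-printable soft line breaks. The sample `text` and the names `required`, `candidates` and `matches` are hypothetical.

```python
import re

# Hypothetical sample; "=\n" stands for a quoted-printable soft line break.
text = (
    '<a href=\n=3D"https://www.wonder.com/alerts/remove">Unsu=\nbscribe</a> '
    '<a href=3D"https://www.wonder.com/url?rct=3D7">x</a> '
    '<a href=3D"https:/=\n/www.wonder.com/alerts/feeds/065638/93686812" styl=\ne=3D"y">'
)

required = ["https", "wonder", "alerts", "feeds"]  # every one must be present

# Variant of the pattern above that captures the URL without the quotes.
# [^"] also matches "=" and newlines, so a soft line break inside the URL is fine.
pattern = re.compile(r'"(https(?:[^"])*\d+)"')

# Quoted https URLs ending in digits, with soft line breaks removed afterwards.
candidates = [m.group(1).replace("=\n", "") for m in pattern.finditer(text)]

# Keep only the URLs that contain all of the required strings.
matches = [url for url in candidates if all(word in url for word in required)]
print(matches)
```

A regex has no direct "and" operator like `(https)&(wonder)`, so a substring check in ordinary Python is often the simpler choice; chained lookaheads such as `(?=.*wonder)(?=.*alerts)` are an alternative when everything has to stay in one pattern.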
