How to match a URL with a Python regular expression?

Question

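My problem is that I want to match URLs in HTML code that look like href='example.com' (or with " instead of '), but I only want to extract the actual URL. I tried matching the whole attribute and then using some array magic to pull out just the URL, but because regex matching is greedy, when there is more than one valid match I get a single longer match that starts at the ' of one URL and ends at the ' of another. What regular expression would fit my needs?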

Answer 1

I would recommend NOT using regex to parse HTML. Your life will be much easier if you use something like BeautifulSoup!

It's as easy as this:

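from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

HTML = """<a href="https://firstwebsite.com">firstone</a><a href="https://secondwebsite.com">Ihaveurls</a>"""

# Name the parser explicitly so the result does not depend on what is installed
s = BeautifulSoup(HTML, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute
for a in s.find_all('a', href=True):
    print("My URL: ", a['href'])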
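If installing a third-party package is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only an illustration: the names HrefCollector and extract_hrefs and the sample string are assumptions made for the example, not part of the answer above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values of all <a> tags, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical sample mixing double and single quotes
sample = """<a href="https://firstwebsite.com">firstone</a><a href='https://secondwebsite.com'>Ihaveurls</a>"""

for url in extract_hrefs(sample):
    print("My URL:", url)
```

Because the parser handles the quoting itself, single-quoted and double-quoted attributes need no special treatment here.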

Answer 2

If you want to solve it with a regular expression rather than with another Python library, here is a solution:

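import re

html = '<a href="https://www.abcde.com"></a>'
# Non-greedy (.*?) stops at the first closing quote, so each href is matched on its own
pattern = r'href="(.*?)"|href=\'(.*?)\''
matches = re.findall(pattern, html)
if not matches:
    print("No Link Found")
else:
    # Each match is a (double_quoted, single_quoted) tuple; one of the two is empty
    for double_quoted, single_quoted in matches:
        print(double_quoted or single_quoted)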
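To see why the quantifier matters for the problem in the question, the sketch below compares a greedy and a non-greedy pattern on a snippet with two links. The sample HTML and the helper extract_hrefs are hypothetical, and the backreference pattern is just one possible way to accept either quote style; it is not meant to cope with arbitrary real-world HTML.

```python
import re

# Hypothetical two-link snippet (not taken from the original post)
html = "<a href='https://a.example'>one</a><a href='https://b.example'>two</a>"

# Greedy: .* runs to the LAST quote, swallowing everything in between
greedy = re.findall(r"href='(.*)'", html)

# Non-greedy: .*? stops at the FIRST closing quote after each href
lazy = re.findall(r"href='(.*?)'", html)

print(greedy)  # one long match spanning both links
print(lazy)    # one entry per link


def extract_hrefs(text):
    """Return href values quoted with either ' or ".

    Group 1 captures the opening quote and \\1 requires the same
    character to close the value; group 2 is the URL itself.
    """
    pattern = re.compile(r"""href\s*=\s*(["'])(.*?)\1""")
    return [m.group(2) for m in pattern.finditer(text)]


mixed = """<a href="https://x.example">x</a><a href='https://y.example'>y</a>"""
print(extract_hrefs(mixed))
```

A negated character class such as `[^']*` in place of `.*?` would also avoid running past the closing quote.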

2 answers

Answer 1
3 Accepted 2018-10-02 17:33:37

Answer 2
0 2018-10-04 12:18:06
