How to extract links from HTML (using Python)

Question

So I've downloaded the HTML of a web page. I'm supposed to extract all of the links from the HTML and output them. Here is my code:

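# Read the downloaded page line by line
with open('html.py', 'r') as f:
    heb = f.readlines()

arry = []
for line in heb:
    # 'href' in heb only matches a line that is exactly 'href';
    # testing each line is a substring check
    if 'href' in line:
        # list.append() changes the list in place and returns None,
        # so its result must not be assigned back to arry.
        # This keeps whole lines that mention href, not just the URLs.
        arry.append(line.strip())

print(arry)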

I'm trying to make a list of the links and output it, but honestly I'm pretty lost. Can someone point me in the right direction? I was thinking regex is probably the way to go. Thanks!
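The regex idea raised in the question can be sketched with Python's re module. This is a minimal sketch assuming every href value is wrapped in quotes; HREF_RE and extract_links_regex are hypothetical names. Regular expressions do not actually parse HTML, so this pattern misses unquoted values and can match href text inside comments or scripts, which is why a real parser (as in the answer) is the more robust choice.

```python
import re
from typing import List

# Assumption: href values are always quoted with ' or ".
# Group 1 captures the opening quote; the backreference \1 requires the same closing quote.
HREF_RE = re.compile(r"""<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def extract_links_regex(html: str) -> List[str]:
    """Return the href value of every <a> tag that HREF_RE matches."""
    return [match.group(2) for match in HREF_RE.finditer(html)]


sample = '<p><a href="https://example.com">A</a> <a class="x" href=\'/about\'>B</a></p>'
print(extract_links_regex(sample))
```

With the downloaded page, you would pass the whole file contents (f.read()) rather than a list of lines, since a tag can span several lines.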

Answer 1

You can use Beautiful Soup (which you'll need to install, e.g. with pip install beautifulsoup4):

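import bs4

with open("my-file.html") as f:
    # Name the parser explicitly; without it bs4 guesses one and emits a warning
    soup = bs4.BeautifulSoup(f, "html.parser")

# soup('a') is shorthand for soup.find_all('a'); skip <a> tags without an href
links = [link['href'] for link in soup('a') if 'href' in link.attrs]

print(links)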
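If installing Beautiful Soup isn't an option, the standard library's html.parser module can collect the same href values. This is a swapped-in technique, not part of the original answer; LinkCollector and extract_links are illustrative names, and the sketch reads from a string, so for the downloaded page you would pass the file's contents (f.read()).

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="https://example.com">A</a> <a name="top">B</a> <A HREF="/about">C</A></p>'
print(extract_links(sample))
```

Like the Beautiful Soup version, this skips <a> tags that have no href and works on tags rather than lines, so a tag split across several lines is still found.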

Answer 1
2 2017-06-20 02:02:18

如何从HTML中提取链接（使用python）

问题描述

1 个解决方案

解决方案1 2 2017-06-20 02:02:18

解决方案1
2 2017-06-20 02:02:18