Extracting links from an HTML page using BeautifulSoup

Question

I need to extract some articles from the Biography website.

So on this page, http://www.biography.com/people, I need all the sub-links. For example:

 /people/ryan-seacrest-21095899
 /people/edgar-allan-poe-9443160

But I have two problems:

1- When I try to find all the <a> tags, I can't find the hrefs I need.

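from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup  # bs4 supersedes the old BeautifulSoup 3 package

url = "http://www.biography.com/people"
text = urlopen(url).read()
soup = BeautifulSoup(text, "html.parser")

# Print every <a> tag present in the static HTML.
links = soup.find_all('a')
for link in links:
    print(link)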

2- There is a "See more" button. So how can I get the links for all the people on the site, not only the ones that appear on the first page?

Answer 1

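The site you showed uses Angular, and part of its content is generated with JS. BeautifulSoup does not execute JS. You need to use http://selenium-python.readthedocs.io/ or another similar tool. Alternatively, you can try to find the GET (or possibly POST) request that the page's AJAX call makes and get the data through it.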
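
A minimal sketch of the Selenium route, under stated assumptions: the `/people/<slug>-<id>` pattern is inferred only from the two example links in the question, the helper names (`filter_people_links`, `extract_people_links`, `fetch_rendered_html`) are hypothetical, and Firefox with its driver is assumed to be installed. It lets a real browser run the page's JS, then hands the rendered HTML to BeautifulSoup.

```python
import re

# Assumption: person links look like "/people/<slug>-<numeric id>",
# inferred from the two examples in the question.
PERSON_HREF = re.compile(r"^/people/[a-z0-9-]+-\d+$")


def filter_people_links(hrefs):
    """Keep only person links, dropping duplicates but preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        if PERSON_HREF.match(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def extract_people_links(html):
    """Parse an HTML string and return the person links it contains."""
    # Imported lazily so the pure-Python helper above works without bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return filter_people_links(hrefs)


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so its JS runs, then return the HTML."""
    # Imported lazily so the parsing helpers work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until at least one JS-generated /people/ link is in the DOM.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "a[href^='/people/']")
            )
        )
        return driver.page_source
    finally:
        driver.quit()
```

With these helpers, `extract_people_links(fetch_rendered_html("http://www.biography.com/people"))` would return the links rendered on the first page. Getting the rest would also mean clicking the "See more" button from Selenium until it disappears; the button's selector is not given here, so it would have to be looked up in the browser's developer tools.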

Solution 1
2 Accepted 2017-05-03 10:49:22