BeautifulSoup lets me scrape some articles but not others (from the same newspaper)

Question

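I am trying to scrape news articles with Beautiful Soup. However, it only works for some articles on the site and not for others. I can't find any obvious difference in the page source, so I would greatly appreciate any ideas on how to solve this.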

For example, this works fine:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.dn.se/nyheter/sverige/ewa-stenberg-darfor-ligger-sverige-steget-efter/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

content = soup.find('div', class_='article__body')
body = content.text
print(body)

But changing the URL to:

result = requests.get("https://www.dn.se/nyheter/sverige/Regeringen-vill-att-skolor-ska-fa-satta-betyg-i-arskurs-4")

produces the following error:

AttributeError: 'NoneType' object has no attribute 'text'

Answer 1

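In this example there is nothing wrong with the scraping itself. The second URL responds with a 301 (Moved Permanently), which means the response points you to a new URL, and you need to make sure that redirect is followed. Note that requests follows redirects by default for every method except HEAD, so it is worth checking result.history and result.url to see which page you actually ended up parsing.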
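To see whether a redirect is involved, something along these lines may help. It is only a sketch: `redirect_chain` and `fetch_and_report` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. It relies on the `history`, `status_code` and `url` attributes of a requests response.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch_and_report(url):
    """Fetch url and print each redirect hop (hypothetical helper)."""
    # Imported here so that redirect_chain itself has no third-party dependency.
    import requests

    # GET requests follow redirects by default; the timeout value is an assumption.
    result = requests.get(url, timeout=10)
    for status, hop_url in redirect_chain(result):
        print(status, hop_url)
    return result
```

Calling `fetch_and_report(...)` with the failing URL from the question should show whether a 301 hop occurred and which address was finally served.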

See this answer for how to solve it: https://stackoverflow.com/a/50606372/10201813, or read http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history for more information.
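Separately from the redirect, the AttributeError itself arises because `soup.find` returns None when no element matches, and None has no `.text`. A defensive version of the extraction might look like the sketch below. `extract_article_body` is a hypothetical name, and it uses the built-in `html.parser` instead of the `lxml` parser from the question purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def extract_article_body(html):
    """Return the article text, or None when the expected container is absent."""
    # "html.parser" is swapped in for "lxml" here; either should work for this lookup.
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="article__body")
    if content is None:
        # No <div class="article__body"> on this page, so it is not the page we expected.
        return None
    return content.get_text(strip=True)
```

With this, you could pass `result.content` to `extract_article_body` and, when it returns None, print `result.url` and `result.status_code` to see what was actually fetched instead of crashing.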

Solution 1
1 2020-03-23 12:29:53
