How to extract href attributes that show up in Chrome's developer tools but not in BeautifulSoup's output

I'm trying to scrape a website to compile and summarize news, using Python's requests and bs4. The links (href) that I'm trying to access appear in Chrome's developer tools with this path:

"/html/body/div/div/div/main/article/div/div/section/div/section/div/div[3]/ul/li[1]/a"

I tried everything to extract them, but I realized that the HTML Python receives doesn't go down to that level. It stops at:

"/html/body/div/div/div/main/article/div/div/section/div/section"

I'm using the following code:

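import requests
from bs4 import BeautifulSoup

url = 'https://www.gp.com/news'
# requests downloads the raw HTML only; no JavaScript is executed
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# Loop variable renamed so it no longer overwrites `url`
for link in soup.find_all('a'):
    print(link.get('href'))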

I'd really appreciate any help you can give me because I'm completely out of ideas. Also, I'm completely new to programming, so I'd appreciate answers that are as simple as possible.

Thanks a lot in advance!

The requests module doesn't render JavaScript, so anything the page adds with scripts after it loads never appears in response.content. One way to get that content is requests-html ( https://github.com/psf/requests-html ), which can render the page before you parse it. You can see the difference if you open the page in a browser and view the source (typically Ctrl+U). The source will differ from what Developer Tools shows, because the latter includes content rendered by JavaScript.
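As a rough sketch of that approach (not tested against the site in the question), the code below renders a page with requests-html and then collects the href values from the rendered HTML. The names LinkCollector, extract_hrefs and fetch_rendered_html are made up for this illustration. The link extraction uses the standard library's html.parser instead of BeautifulSoup so that it can be tried without installing anything. It also assumes that the missing links really are added by JavaScript.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_hrefs(html):
    """Return every link target found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


def fetch_rendered_html(url):
    """Download a page and return its HTML after JavaScript has run."""
    # Third-party package (pip install requests-html); imported here so
    # extract_hrefs can be tried without it.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() runs the page's scripts in headless Chromium, which
    # requests-html downloads the first time this is called.
    response.html.render()
    return response.html.html
```

Calling extract_hrefs(fetch_rendered_html('https://www.gp.com/news')) would then play the role of the loop in the question. The string returned by fetch_rendered_html can also be passed to BeautifulSoup exactly as before if you prefer to keep using bs4.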
