HTML elements missing when scraping a website (Python)

Question

I am trying to extract an href from a website using bs4 and Selenium. However, when I parse the HTML with Beautiful Soup, the elements I'm looking for are missing. When I search for them afterwards, I just get NoneType objects. Here is what I'd like to extract:

I am using the following code to parse the page:

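# Imports assumed from the aliases used below
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = browser.current_url
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")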

But when I run:

squeeps = page_soup.findAll("div",{'id':'pcisBody'})
squeeps[0]

This is all I get:

<div id="pcisBody">
<img alt="loading" height="40" src="/OnlineServices/Images/loading.gif" width="40"/>
<span id="pcisLoading">Retrieving Data...</span>
</div>

Any help would be greatly appreciated! Here is the link: https://www.ladbsservices2.lacity.org/OnlineServices/PermitReport/PermitResults/444952

Answer 1

BeautifulSoup only parses the HTML it is handed; it does not capture data that a website loads after its initial response. As a workaround, you can visit the website with Selenium, wait either a fixed amount of time or until a certain load event has fired, and then get the page source. Then pass that source to BeautifulSoup.
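The workaround above could be sketched roughly as follows. This is only an illustration under some assumptions: that the `pcisLoading` placeholder from the question disappears once the data has arrived, that the wanted links are ordinary `<a href>` tags inside `pcisBody`, and that 30 seconds is a long enough timeout. The helper names `rendered_soup` and `hrefs_in_body` are made up for this example. An explicit wait is used here instead of a fixed sleep.

```python
from bs4 import BeautifulSoup


def rendered_soup(browser, url, timeout=30):
    """Load url in an existing Selenium browser and parse the rendered page."""
    # Selenium is imported here so the parsing helper below can be used
    # (and tried out) without a browser driver installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser.get(url)
    # Assumption: the "Retrieving Data..." span goes away once the data is in.
    WebDriverWait(browser, timeout).until(
        EC.invisibility_of_element_located((By.ID, "pcisLoading"))
    )
    # page_source reflects the DOM after JavaScript has run, unlike a
    # separate urlopen() request for the same URL.
    return BeautifulSoup(browser.page_source, "html.parser")


def hrefs_in_body(page_soup):
    """Return the href of every link inside the pcisBody div, if it exists."""
    body = page_soup.find("div", id="pcisBody")
    if body is None:
        return []
    return [a["href"] for a in body.find_all("a", href=True)]
```

With the `browser` object from the question, this might be called as `page_soup = rendered_soup(browser, my_url)` followed by `hrefs_in_body(page_soup)`. If the browser has already finished loading the page, passing `browser.page_source` straight to BeautifulSoup may be enough.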

Answer 1 · 0 · accepted · 2020-11-16 05:04:44
