
Beautiful Soup can't find tables

I'm trying to gather some data from a table on a web page with Python and Beautiful Soup. When I make a selection from the page, however, I get different results than I see in the browser. Specifically, the tables are missing completely. Here's a screenshot of the table in the inspector of Firefox dev tools:

[Screenshot of the web page and the inspector]

And here's the output that I get from Beautiful Soup:

[Screenshot of the IDE with the output]

I've tried using urllib instead of requests, and I've tried different HTML parsers (html.parser and lxml). All give the same results. Any advice on what might be happening here and how I might get around it to access the data from the table?

import requests
from bs4 import BeautifulSoup
import pandas
import tabula
import html5lib

knox = requests.get("https://covid.knoxcountytn.gov/case-count.html")
knox_soup = BeautifulSoup(knox.text, 'html5lib')
knox_confirmed = knox_soup.find('div', id='covid_cases').prettify()

print(knox_confirmed)

Try disabling JavaScript when you visit https://covid.knoxcountytn.gov/case-count.html and you will see no table. As @barny said, the table is generated with JavaScript, so you can't parse it with BeautifulSoup (at least not easily; see "How to call JavaScript function using BeautifulSoup and Python").
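To illustrate the point, here is a minimal sketch of what BeautifulSoup sees when a table is built by a script. The HTML string is a hypothetical stand-in, not the actual source of the page in question; only the `covid_cases` id is taken from the question. The parser reads the markup as sent by the server and never runs the script, so the container is found but stays empty.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends before any script runs.
# (Assumption: the real page differs; only the div id comes from the question.)
STATIC_HTML = """
<html><body>
  <div id="covid_cases"></div>
  <script>
    // A browser would run this and add a table to the div after load.
    document.getElementById("covid_cases")
            .appendChild(document.createElement("table"));
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")
container = soup.find("div", id="covid_cases")

# The container exists, but it is empty: the script was never executed.
print(container)
print(container.find("table"))  # None
print(soup.find("table"))       # None
```

This is why switching between requests and urllib, or between parsers, changes nothing: none of them execute JavaScript.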

The table is loaded via JavaScript, and requests does not execute JavaScript, so it cannot render the page for you. You can use selenium or requests_html instead.

For now, I've been able to track down where the data is fetched from by checking the XHR traffic the page makes.

So we can use pandas.read_csv() as follows:

import pandas as pd

df = pd.read_csv("https://covid.knoxcountytn.gov/includes/covid_cases.csv")

print(df)
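If you want a little more control over the download, a slightly more defensive variant might look like the sketch below. The helper names `parse_cases` and `fetch_cases`, the timeout value, and the sample text are all assumptions made for illustration; the sample's column names are placeholders and say nothing about the real file's layout.

```python
import io

import pandas as pd
import requests

CSV_URL = "https://covid.knoxcountytn.gov/includes/covid_cases.csv"


def parse_cases(csv_text):
    """Parse CSV text into a DataFrame (hypothetical helper)."""
    return pd.read_csv(io.StringIO(csv_text))


def fetch_cases(url=CSV_URL, timeout=10):
    """Download the CSV and parse it, failing loudly on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return parse_cases(response.text)


# Offline demonstration with made-up placeholder data (NOT the real file).
SAMPLE = "col_a,col_b\n1,2\n3,4\n"
sample_df = parse_cases(SAMPLE)
print(sample_df.shape)
```

Calling `fetch_cases()` would then return the live data as a DataFrame, provided the endpoint is still available.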

