
Beautiful Soup can't find tables

I'm trying to gather some data from a table on a web page with Python and Beautiful Soup. When I make a selection from the page, however, I'm getting different results than I get in the browser. Specifically, the tables are missing completely. Here's a screenshot of the table in the inspector of Firefox dev tools:

[Screenshot of the web page and inspector]

And here's the output that I get from Beautiful Soup:

[Screenshot of the IDE with output]

I've tried using urllib instead of requests, and I've tried different HTML parsers (html.parser and lxml). All give the same results. Any advice on what might be happening here, and how I might get around it to access the data from the table?

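import requests
from bs4 import BeautifulSoup

# Download the raw HTML exactly as the server sends it (no JavaScript is run)
knox = requests.get("https://covid.knoxcountytn.gov/case-count.html")
# Parse with html5lib (the html5lib package must be installed)
knox_soup = BeautifulSoup(knox.text, 'html5lib')
# Select the div that holds the table in the browser's inspector
knox_confirmed = knox_soup.find('div', id='covid_cases').prettify()

print(knox_confirmed)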

Try disabling JavaScript when you visit https://covid.knoxcountytn.gov/case-count.html and you will see no table. As @barny said, the table is generated with JavaScript, so you can't parse it with BeautifulSoup (at least not easily; see "How to call JavaScript function using BeautifulSoup and Python").
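A quick way to confirm this without a browser is to count the <table> tags in the HTML the server actually sends. The sketch below uses only the standard library; the two HTML strings are made-up stand-ins (not the real page source), and TableCounter and count_tables are hypothetical helper names. In practice you would pass knox.text to count_tables.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return the number of <table> elements opened in the given HTML text."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up stand-in for what the server sends before any script runs
static_html = '<div id="covid_cases"></div><script src="load.js"></script>'
# Made-up stand-in for the DOM after a script has filled in the div
rendered_html = '<div id="covid_cases"><table><tr><td>1</td></tr></table></div>'

print(count_tables(static_html))    # 0: nothing for a parser to find
print(count_tables(rendered_html))  # 1: the table exists only after rendering
```

If the count for the downloaded HTML is zero while the inspector shows a table, the table is being added by a script, and switching parsers will not help.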

The page content is loaded via JavaScript, and requests cannot render JavaScript for you. You can use selenium or requests_html instead.

For now, I've been able to track down where the data is fetched from by checking the XHR traffic the page makes.

So we can read it directly with pandas.read_csv(), as follows:

import pandas as pd

df = pd.read_csv("https://covid.knoxcountytn.gov/includes/covid_cases.csv")

print(df)
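If pandas is not available, the same CSV endpoint can probably be read with the standard library alone. This is a minimal sketch under assumptions: the helper names parse_rows and fetch_rows are hypothetical, the UTF-8 encoding of the file is a guess, and the sample columns below are invented for illustration (the real file's header may differ).

```python
import csv
import io
from urllib.request import urlopen

CSV_URL = "https://covid.knoxcountytn.gov/includes/covid_cases.csv"


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_rows(url=CSV_URL, timeout=10):
    """Download the CSV and parse it. Assumes the file is UTF-8 encoded."""
    with urlopen(url, timeout=timeout) as response:
        # utf-8-sig also strips a leading byte-order mark if one is present
        text = response.read().decode("utf-8-sig")
    return parse_rows(text)


# Invented sample data; the real column names are not known here
sample = "category,count\nconfirmed,10\nrecovered,7\n"
rows = parse_rows(sample)
print(rows[0]["count"])  # values come back as strings, here "10"
```

Calling fetch_rows() performs the actual download. Unlike pandas.read_csv(), every value is returned as a string, so numeric columns need converting by hand.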

