I'm trying to gather some data from a table on a web page with Python and Beautiful Soup. When I make a selection from the page, however, I'm getting different results than I get in the browser. Specifically, the tables are missing completely. Here's a screenshot of the table in the inspector of Firefox dev tools:
And here's the output that I get from Beautiful Soup:
I've tried using urllib instead of requests, and I've tried different HTML parsers (html.parser and lxml). All give the same results. Any advice on what might be happening here, and how I might get around it to access the data from the table?
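import requests
from bs4 import BeautifulSoup

# Download the raw HTML exactly as the server sends it (no JavaScript is run).
knox = requests.get("https://covid.knoxcountytn.gov/case-count.html")
# The 'html5lib' parser must be installed, but it does not need to be imported.
knox_soup = BeautifulSoup(knox.text, 'html5lib')
# Select the container div that holds the table in the browser.
knox_confirmed = knox_soup.find('div', id='covid_cases').prettify()
print(knox_confirmed)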
Try disabling JavaScript when you visit https://covid.knoxcountytn.gov/case-count.html and you will see no table. As @barny said, the table is generated with JavaScript, so you can't parse it with BeautifulSoup (at least not easily; see How to call JavaScript function using BeautifulSoup and Python).
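To illustrate why the selection comes back without the table, here is a minimal sketch. It assumes a simplified, hypothetical stand-in for the HTML the server sends (the markup below is not copied from the site), and the helper name has_static_table is made up for this example. The idea is that BeautifulSoup only sees the static markup, so a table built later by a script is simply not there.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML: the container div is present,
# but the table is only created later by a script running in the browser.
STATIC_HTML = """
<html><body>
<div id="covid_cases"></div>
<script>/* builds the table inside #covid_cases at runtime */</script>
</body></html>
"""


def has_static_table(html, container_id):
    """Return True if the container already holds a <table> in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    return container is not None and container.find("table") is not None


# The script is never executed, so no table is found in the container.
print(has_static_table(STATIC_HTML, "covid_cases"))
```

Running the same check against knox.text from the question should tell you whether the table is in the downloaded HTML at all, before you spend time trying other parsers.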
The page content is loaded via JavaScript, and requests does not execute JavaScript, so it cannot render the page for you. You can use selenium or requests_html, etc., instead.
For now, I've tracked down where the data is fetched from by checking the XHR traffic the page makes. So we can use pandas.read_csv() as follows:
import pandas as pd
df = pd.read_csv("https://covid.knoxcountytn.gov/includes/covid_cases.csv")
print(df)
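If you want more control over the download (a timeout, or an error on a bad HTTP status), one possible variation is to fetch the CSV with requests and hand the text to pandas. This is only a sketch: the function names parse_cases and fetch_cases, the timeout value, and the sample columns are assumptions made for illustration, not the real layout of the file.

```python
import io

import pandas as pd
import requests

CSV_URL = "https://covid.knoxcountytn.gov/includes/covid_cases.csv"


def parse_cases(csv_text):
    """Parse CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(csv_text))


def fetch_cases(url=CSV_URL, timeout=10):
    """Download the CSV and parse it, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return parse_cases(response.text)


if __name__ == "__main__":
    # Offline demonstration with made-up columns (hypothetical data).
    sample = "category,count\nconfirmed,10\nrecovered,4\n"
    print(parse_cases(sample))
```

Keeping the parsing step separate from the download makes it possible to test the parsing without network access; call fetch_cases() to load the live file.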