
A problem with web scraping using Python, BeautifulSoup and pandas 'read_html'

Thanks in advance to anyone who can help!

I am scraping a table of COVID-19 data and loading it into a pandas DataFrame. It was working until this morning.

Here is the code:

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = 'https://www.worldometers.info/coronavirus/'

req = requests.get(url)

page = BeautifulSoup(req.content, 'html.parser')

table = page.find_all('table',id="main_table_countries_today")[0]

print(table)

df = pd.read_html(str(table))[0]

This morning I started getting the following error:

ValueError: No tables found matching pattern '.+'

Can you please help me figure it out?

Try changing the last line to:

df = pd.read_html(str(table), displayed_only=False)[0]

The table's tag at the URL has changed its style attribute to style="width:100%;margin-top: 0px !important;display:none;". Previously it did not set the 'display' property. By default, read_html uses displayed_only=True and skips elements hidden with display:none, so it no longer finds the table.
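The effect of displayed_only can be reproduced offline. The following is a minimal sketch, not the site's real markup: the HTML string, its rows, and the helper name parse_tables are made up for illustration, and it assumes a parser backend for read_html (lxml, or bs4 with html5lib) is installed. The markup is wrapped in StringIO because recent pandas versions deprecate passing literal HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the scraped table: the <table> element itself
# is hidden with an inline "display:none" style, as described in the answer.
html = """
<table id="main_table_countries_today" style="width:100%;display:none;">
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>20</td></tr>
</table>
"""


def parse_tables(markup, displayed_only):
    """Return the list of DataFrames read_html finds, or [] if it finds none."""
    try:
        return pd.read_html(StringIO(markup), displayed_only=displayed_only)
    except ValueError as exc:
        # read_html raises ValueError when no table survives its filtering.
        print(f"displayed_only={displayed_only}: {exc}")
        return []


# Default behaviour: hidden tables are skipped, so nothing is found.
hidden_skipped = parse_tables(html, displayed_only=True)

# With displayed_only=False the hidden table is parsed like any other.
hidden_kept = parse_tables(html, displayed_only=False)
if hidden_kept:
    print(hidden_kept[0])
```

If the hidden table is not actually wanted, the default is usually the safer choice; displayed_only=False is appropriate here only because the page hides the one table that holds the data.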

