Problem with web scraping using Python, BeautifulSoup, and pandas 'read_html'

Question

Thanks in advance to anyone who can help!

I am scraping a table of COVID-19 data and loading it into a pandas DataFrame. It was working until this morning.

Here is the code:

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = 'https://www.worldometers.info/coronavirus/'

req = requests.get(url)

page = BeautifulSoup(req.content, 'html.parser')

table = page.find_all('table',id="main_table_countries_today")[0]

print(table)

df = pd.read_html(str(table))[0]

This morning I started getting the following error:

ValueError: No tables found matching pattern '.+'

Can you please help me figure it out?

Answer 1

Try changing the last line to:

df = pd.read_html(str(table), displayed_only=False)[0]

The table header at the URL has changed its style attribute to style="width:100%;margin-top: 0px !important;display:none;". Previously it did not have the 'display' property set. By default (displayed_only=True), read_html skips elements styled with display: none, which is why no table is found.
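The effect of displayed_only can be reproduced offline with a small sketch. The HTML string below is a made-up stand-in for the scraped table, not the real page, and it assumes the hiding style sits on the table element itself. It also assumes an HTML parser that read_html can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the scraped table: its style attribute hides it.
html = """
<table id="main_table_countries_today"
       style="width:100%;margin-top: 0px !important;display:none;">
  <thead><tr><th>Country</th><th>Cases</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>20</td></tr>
  </tbody>
</table>
"""

# Default call (displayed_only=True): the hidden table is skipped,
# so read_html finds nothing and raises ValueError.
try:
    pd.read_html(StringIO(html))
    hidden_table_found = True
except ValueError as exc:
    hidden_table_found = False
    print("default call failed:", exc)

# With displayed_only=False the hidden table is parsed as usual.
df = pd.read_html(StringIO(html), displayed_only=False)[0]
print(df)
```

Wrapping the markup in StringIO also avoids the deprecation warning that recent pandas versions emit when a literal HTML string is passed to read_html.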

Answer 1: score 3, accepted, 2020-05-29 14:47:13
