How can I get the following Python code to output data from worldometers.info? (It seems this question was answered, but the answer does not work for me)
I am trying to get values from worldometers.info (similar to the post Python: No tables found matching pattern '.+'). The code I am using is as follows:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}
r = requests.get(url, headers=header)
# fix HTML multiple tbody
soup = BeautifulSoup(r.text, "html.parser")
for body in soup("tbody"):
    body.unwrap()
print(soup)
df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
df = df.replace(regex=[r'\+', r'\,'], value='')
df = df.fillna('0')
df = df.to_json(orient='index')
print(df)
The output is the HTML of the page, and then when pandas processes it I get this error:
Traceback (most recent call last):
  File "./covid19_status.py", line 37, in <module>
    df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
  File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 1101, in read_html
    displayed_only=displayed_only,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 917, in _parse
    raise retained
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 898, in _parse
    tables = p.parse_tables()
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 217, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 563, in _parse_tables
    raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern '.+'
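Roughly speaking, pd.read_html raises this error when it finds no <table> element containing text that matches its match argument (default '.+'). A rough, stdlib-only sketch for checking the fetched HTML before handing it to pandas, assuming the page source is already in a string such as r.text or str(soup). The helper scan_tables and the class TableScanner are hypothetical, not part of pandas or BeautifulSoup, and they only approximate what pandas checks:

```python
import re
from html.parser import HTMLParser
from typing import List

# Same pattern as pandas' default match argument for read_html
DEFAULT_MATCH = re.compile(r".+")


class TableScanner(HTMLParser):
    """Record, for every <table>, whether any text inside it matches DEFAULT_MATCH."""

    def __init__(self):
        super().__init__()
        self.open_tables = []  # one flag per currently open <table> (nesting stack)
        self.results = []      # one flag per finished <table>, in closing order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TABLE> is reported as "table"
        if tag == "table":
            self.open_tables.append(False)

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            self.results.append(self.open_tables.pop())

    def handle_data(self, data):
        # Text inside a nested table also counts for every enclosing table
        if self.open_tables and DEFAULT_MATCH.search(data):
            self.open_tables = [True] * len(self.open_tables)


def scan_tables(html: str) -> List[bool]:
    """Hypothetical helper: one flag per <table>, True if it contains matching text."""
    scanner = TableScanner()
    scanner.feed(html)
    scanner.close()
    # Tables that were never closed are reported last
    return scanner.results + scanner.open_tables


# In the question this would be scan_tables(r.text) or scan_tables(str(soup))
sample = "<table><tr><td>1</td></tr></table><TABLE>\n</TABLE>"
print(scan_tables(sample))  # [True, False]
```

An empty list, or a list of only False values, suggests the HTML string itself is the problem rather than the read_html arguments.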
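Can someone tell me how to fix this? I have tried using the regex from a similar post, but I couldn't get it to work, so it is not included in this code (I am very new to Python).
Thanks in advance!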
You can follow the code provided in the answer to this question. Here is the complete code:
import pandas as pd
import requests
import re
url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}
r = requests.get(url, headers=header).text
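# upper-case the text of every tag matched by the regex, e.g. <td> -> <TD> (workaround from the linked answer)
r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r)
# parse all <table> elements into a list of DataFrames
dfs = pd.read_html(r)
# save the first table as CSV (Windows path; change it as needed)
dfs[0].to_csv('D:\\Worldometer.csv', index=False)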
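To see in isolation what the re.sub line in the answer does, here is a small standalone example on a made-up HTML string (the snippet is hypothetical, not taken from worldometers.info). It only illustrates the text transformation, not why pandas then finds the table:

```python
import re

# Hypothetical snippet standing in for the downloaded page
html = '<div class="x"><table id="t"><tr><td>China</td></tr></table></div>'

# Same substitution as in the answer: each "<...>" match is replaced by its
# upper-cased form. The non-greedy ".*?" stops at the first ">", so every tag
# is handled on its own and the text between tags ("China") is untouched.
upper = re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)
print(upper)  # <DIV CLASS="X"><TABLE ID="T"><TR><TD>China</TD></TR></TABLE></DIV>

# Without re.DOTALL, "." does not match a newline, so a tag that spans
# several lines is left as it was; only "</td>" is changed here.
multiline = re.sub(r'<.*?>', lambda g: g.group(0).upper(), '<td\nclass="a">x</td>')
print(repr(multiline))  # '<td\nclass="a">x</TD>'
```

Note that attribute values such as class names or href URLs are upper-cased as well, which may matter if you later select elements by those values.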