Web Scraping Table into Pandas Dataframe
I'm a beginner with Pandas, but I'd like to take the table of G-Sync gaming monitors from the Nvidia website here: https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/ and turn it into a Pandas data frame in Python.
The first thing I tried was:
import pandas as pd
df = pd.read_html('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')
But that doesn't seem to work. I get ValueError: No tables found.
Then I tried:
import requests
import lxml.html as lh
page = requests.get('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')
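But for some reason I got ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: wrong header check')).
If someone could explain why the first two approaches don't work, and how to actually get the table into a data frame, that would be very helpful. Thanks!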
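The data is loaded dynamically via a JSON request, so the HTML the server returns contains no table for pd.read_html to find.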
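To make that concrete, here is a minimal offline sketch. It assumes a made-up page snippet: `sample_html` and the path `/content/example/monitors.json` are hypothetical stand-ins, not copied from the Nvidia page. It shows that the static markup has no `<table>` element, while a regular expression can still recover the address of the JSON file from the embedded script.

```python
import re

# Hypothetical stand-in for the HTML the server returns: a script block that
# points at a JSON file, and no <table> element anywhere in the markup.
sample_html = """
<html><body>
<div id="specs"></div>
<script>
  var config = {'url': '/content/example/monitors.json', 'lang': 'en'};
</script>
</body></html>
"""

# A table parser has nothing to work with if the markup has no <table> tag.
has_table = re.search(r"<table\b", sample_html, flags=re.IGNORECASE) is not None

# Capture the first quoted value that follows 'url': in the page source.
match = re.search(r"'url': '(.*?)'", sample_html)
json_url = "https://www.nvidia.com" + match.group(1) if match else None

print(has_table)
print(json_url)
```

On a real page the string would come from an HTTP response instead of a literal, and the pattern would need to match however that page happens to embed its data URL.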
The following script loads the JSON data into a dataframe and prints it:
import re
import json
import requests
import pandas as pd

url = 'https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/'
# Send a browser-like User-Agent header with both requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
html_txt = requests.get(url, headers=headers).text

# The page source contains the relative URL of the JSON file with the table data
json_url = 'https://www.nvidia.com' + re.search(r"'url': '(.*?)'", html_txt).group(1)
data = requests.get(json_url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

def fn(x):
    # Unwrap cells stored as {'en': ...}; keep plain values unchanged
    out = []
    for v in x:
        if isinstance(v, dict):
            out.append(v['en'])
        else:
            out.append(v)
    return out

df = pd.json_normalize(data['data'], max_level=0).apply(fn)
print(df)
Prints:
     type               manufacturer  model      hdr  size    lcd type  resolution        variable refresh rate range  variable overdrive  variable refresh input  driver needed
0    G-SYNC ULTIMATE    Acer          CP7271K    Yes  27      IPS       3840x2160 (4K)    1-144Hz                      Yes                 Display Port            N/A
1    G-SYNC ULTIMATE    Acer          X27        Yes  27      IPS       3840x2160 (4K)    1-144Hz                      Yes                 Display Port            N/A
2    G-SYNC ULTIMATE    Acer          X32        Yes  32      IPS       3840x2160 (4K)    1-144Hz                      Yes                 Display Port            N/A
3    G-SYNC ULTIMATE    Acer          X35        Yes  35      VA        3440x1440 (WQHD)  1-200Hz                      Yes                 Display Port            N/A
4    G-SYNC ULTIMATE    Asus          PG65       Yes  65      VA        3840x2160 (4K)    1-144Hz                      Yes                 Display Port            N/A
..   ...                ...           ...        ...  ...     ...       ...               ...                          ...                 ...                     ...
159  G-SYNC Compatible  LG            2020 ZX    Yes  77, 88  OLED      7680x4320 (8K)    40-120Hz                     No                  HDMI                    445.51 or newer
160  G-SYNC Compatible  MSI           MAG251RX   Yes  24.5    IPS       1920x1080 (FHD)   48-240Hz                     No                  Display Port            441.66 or newer
161  G-SYNC Compatible  Razer         Raptor 27  Yes  27      IPS       2560x1440 (QHD)   48-144Hz                     No                  Display Port            431.60 or newer
162  G-SYNC Compatible  Samsung       CRG5       No   27      VA        1920x1080 (FHD)   48-240Hz                     No                  Display Port            430.86 or newer
163  G-SYNC Compatible  ViewSonic     XG270      No   27      IPS       1920x1080 (FHD)   48-240Hz                     No                  Display Port            441.41 or newer
[164 rows x 11 columns]
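The `fn` helper in the script is needed because some cells in the JSON arrive as dictionaries keyed by language code rather than as plain values. The sketch below illustrates that unwrapping step on its own, without any network access. The records in `sample_rows` are hypothetical (which fields are wrapped in the real payload is an assumption), and `unwrap` is an illustrative name, not part of the original script.

```python
# Hypothetical records shaped like the ones fn() handles: some values are
# plain, others are dicts keyed by a language code such as 'en'.
sample_rows = [
    {"type": {"en": "G-SYNC ULTIMATE"}, "manufacturer": "Acer", "size": {"en": "27"}},
    {"type": {"en": "G-SYNC Compatible"}, "manufacturer": "LG", "size": {"en": "77, 88"}},
]


def unwrap(value, lang="en"):
    """Return value[lang] for a localized dict, otherwise the value unchanged."""
    if isinstance(value, dict):
        return value[lang]
    return value


# Apply the unwrapping to every cell of every record.
flat_rows = [{key: unwrap(val) for key, val in row.items()} for row in sample_rows]

for row in flat_rows:
    print(row)
```

A list of flat dictionaries like `flat_rows` can be passed straight to `pd.DataFrame(...)`, which is an alternative to applying the function column by column after `json_normalize`.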