
Web Scraping Table into Pandas Dataframe

I'm a beginner with Pandas, but I'd like to take the table of G-Sync gaming monitors on Nvidia's website, https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/, and turn it into a Pandas DataFrame in Python.

The first thing I tried was

import pandas as pd
df = pd.read_html('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

But that doesn't seem to work. I get ValueError: No tables found
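For reference, pd.read_html only parses <table> elements that are present in the HTML it downloads; it does not run the page's JavaScript. A minimal sketch, using only the standard library's html.parser, for checking whether raw HTML contains any <table> tags at all (the two sample pages are made up for illustration):

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "table":
            self.tables += 1


def count_tables(html_text):
    parser = TableCounter()
    parser.feed(html_text)
    parser.close()
    return parser.tables


# Hypothetical pages: one whose rows would be filled in later by JavaScript,
# one with a real table in the static markup.
dynamic_page = '<div id="monitors"></div><script src="app.js"></script>'
static_page = "<table><tr><td>Acer</td><td>X27</td></tr></table>"

print(count_tables(dynamic_page))  # 0
print(count_tables(static_page))   # 1
```

Running count_tables on the text a plain HTTP request returns shows whether read_html has anything to parse; a result of 0 is consistent with the ValueError above.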

Then I tried

import requests
import lxml.html as lh
page = requests.get('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

But for some reason I got ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: wrong header check'))

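It would be very helpful if someone could explain why these first two approaches don't work, and how to actually get the table into a dataframe. Thanks!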
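On the second error: the message says the response was labelled Content-Encoding: gzip, but decompressing the body failed because its gzip header check did not pass. The sketch below reproduces that class of failure with the standard library's zlib, assuming the client decodes gzip with a decompressor opened in gzip mode (wbits = 16 + zlib.MAX_WBITS); the byte strings are made up, and the sketch does not show why this particular server sent such a response.

```python
import gzip
import zlib


def try_gunzip(body):
    """Decode body as gzip; return the text, or the zlib error message."""
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        return (decoder.decompress(body) + decoder.flush()).decode("utf-8")
    except zlib.error as exc:
        return f"zlib.error: {exc}"


real_gzip = gzip.compress(b"<html>ok</html>")
not_gzip = b"<html>plain text, but labelled as gzip</html>"

print(try_gunzip(real_gzip))  # <html>ok</html>
print(try_gunzip(not_gzip))   # zlib.error: Error -3 while decompressing data: ...
```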

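The table is not part of the page's static HTML: the data is loaded dynamically through a separate JSON request, which is why pd.read_html finds no tables.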
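The key step is finding the URL of that JSON request. A minimal sketch of it in isolation: search the page source for a 'url': '...' fragment (the same regular expression the script uses) and resolve the path against the site origin with urllib.parse.urljoin instead of string concatenation. The HTML snippet and the path in it are hypothetical; the real page markup and endpoint path may differ.

```python
import re
from urllib.parse import urljoin

BASE = "https://www.nvidia.com"


def find_json_url(html_text, base=BASE):
    """Return the absolute URL of the first "'url': '...'" value, or None."""
    match = re.search(r"'url': '(.*?)'", html_text)
    if match is None:
        # Markup changed or the request was blocked: nothing to follow.
        return None
    return urljoin(base, match.group(1))


# Hypothetical inline JavaScript; the real path is not known here.
sample = "var cfg = {'url': '/content/example/monitors.json', 'type': 'GET'};"
print(find_json_url(sample))           # https://www.nvidia.com/content/example/monitors.json
print(find_json_url("<html></html>"))  # None
```

Returning None instead of calling .group(1) on a failed match gives a clearer signal when the page layout changes than an AttributeError would.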

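This script finds the URL of that JSON request in the page source, loads the JSON data into a dataframe, and prints it. It also sends a browser-like User-Agent header with each request: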

import re
import json
import requests
import pandas as pd

url = 'https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/'
# Browser-like User-Agent sent with every request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

html_txt = requests.get(url, headers=headers).text

# Pull the JSON endpoint path out of the page source ('url': '...') and make it absolute
json_url = 'https://www.nvidia.com' + re.search(r"'url': '(.*?)'", html_txt).group(1)

data = requests.get(json_url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

def fn(x):
    # Some cells hold a dict of translations; keep only the English ('en') text
    out = []
    for v in x:
        if isinstance(v, dict):
            out.append(v['en'])
        else:
            out.append(v)
    return out

# max_level=0 keeps nested dicts as cell values; fn is then applied to each column
df = pd.json_normalize(data['data'], max_level=0).apply(fn)
print(df)

Prints:

                  type manufacturer      model  hdr     size lcd type        resolution variable refresh rate range variable overdrive variable refresh input    driver needed
0      G-SYNC ULTIMATE         Acer    CP7271K  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
1      G-SYNC ULTIMATE         Acer        X27  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
2      G-SYNC ULTIMATE         Acer        X32  Yes       32      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
3      G-SYNC ULTIMATE         Acer        X35  Yes       35       VA  3440x1440 (WQHD)                     1-200Hz                Yes           Display Port              N/A
4      G-SYNC ULTIMATE         Asus       PG65  Yes       65       VA    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
..                 ...          ...        ...  ...      ...      ...               ...                         ...                ...                    ...              ...
159  G-SYNC Compatible           LG    2020 ZX  Yes   77, 88     OLED    7680x4320 (8K)                    40-120Hz                 No                   HDMI  445.51 or newer
160  G-SYNC Compatible          MSI   MAG251RX  Yes     24.5      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.66 or newer
161  G-SYNC Compatible        Razer  Raptor 27  Yes       27      IPS   2560x1440 (QHD)                    48-144Hz                 No           Display Port  431.60 or newer
162  G-SYNC Compatible      Samsung       CRG5   No       27       VA   1920x1080 (FHD)                    48-240Hz                 No           Display Port  430.86 or newer
163  G-SYNC Compatible    ViewSonic      XG270   No       27      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.41 or newer

[164 rows x 11 columns]
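The fn helper exists because some fields in the JSON are dictionaries of translations rather than plain values. A self-contained sketch of the same flattening on a tiny made-up payload (the record layout is an assumption modelled on the printed output, not the real JSON), using Series.map on each column:

```python
import pandas as pd

# Made-up records: some fields are plain values, others are
# dictionaries keyed by language code (assumed layout).
data = {
    "data": [
        {"manufacturer": "Acer", "model": "X27", "hdr": {"en": "Yes", "de": "Ja"}},
        {"manufacturer": "Samsung", "model": "CRG5", "hdr": {"en": "No", "de": "Nein"}},
    ]
}


def english(value):
    """Return value['en'] for translation dicts, otherwise the value itself."""
    return value["en"] if isinstance(value, dict) else value


# max_level=0 keeps the nested dicts as cell values instead of
# expanding them into columns such as 'hdr.en' and 'hdr.de'.
df = pd.json_normalize(data["data"], max_level=0)
df = df.apply(lambda column: column.map(english))
print(df)
```

Without max_level=0, json_normalize would split each translation dict into per-language columns, and the 'en' values would have to be selected by column name instead.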
