
Web Scraping Table into Pandas Dataframe

I am a beginner when it comes to using Pandas. I would like to take the G-Sync gaming monitors table on the Nvidia website here: https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/ and turn it into a DataFrame in Pandas, for Python.

The first thing I tried was

import pandas as pd
df = pd.read_html('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

But that doesn't seem to work. I get a ValueError: No tables found

Then I tried

import requests
import lxml.html as lh
page = requests.get('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

But somehow I got ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: wrong header check'))

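If someone could explain why the first two methods didn't work and how to actually get the table into a dataframe, that would be very helpful. Thanks!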

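The data is loaded dynamically via a JSON request, so the table is not part of the HTML that the server initially returns. That is why pd.read_html finds no tables on the page.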
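To make the point above concrete, here is a minimal sketch that uses only the standard library. The SAMPLE_HTML string is a hypothetical stand-in for the static page (the real markup is not shown in this thread), and count_tables and find_json_path are illustrative helper names. It shows that such a page contains no table element to parse, while the path of the JSON resource can still be pulled out of the embedded script with a regular expression of the kind used in the script further down.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the static HTML: an empty placeholder element
# plus a script that tells the browser where to fetch the table data.
SAMPLE_HTML = """
<html><body>
<div id="specs-table"></div>
<script>
  var config = {'url': '/content/example/specs.json', 'mode': 'table'};
</script>
</body></html>
"""


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html_text):
    """Return how many <table> elements the static HTML contains."""
    counter = TableCounter()
    counter.feed(html_text)
    return counter.tables


def find_json_path(html_text):
    """Return the first path found in a 'url': '...' pair, or None."""
    match = re.search(r"'url': '(.*?)'", html_text)
    return match.group(1) if match else None


print(count_tables(SAMPLE_HTML))    # no <table> in the static markup
print(find_json_path(SAMPLE_HTML))  # path of the JSON resource
```

If the count is zero, a parser that only looks at the static HTML has nothing to turn into a dataframe, and the JSON resource has to be requested directly.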

This script loads the JSON data into a dataframe and prints it:

import re
import json
import requests
import pandas as pd

url = 'https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

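# fetch the static page; it only contains a reference to the JSON data
html_txt = requests.get(url, headers=headers).text

# extract the relative path of the JSON resource from the page source
json_url = 'https://www.nvidia.com' + re.search(r"'url': '(.*?)'", html_txt).group(1)

data = requests.get(json_url, headers=headers).json()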

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

def fn(x):
    # x is one column; dict values are replaced by their 'en' entry
    out = []
    for v in x:
        if isinstance(v, dict):
            out.append(v['en'])
        else:
            out.append(v)
    return out

# max_level=0 keeps nested dicts as cell values so fn can unpack them
df = pd.json_normalize(data['data'], max_level=0).apply(fn)
print(df)

Prints:

                  type manufacturer      model  hdr     size lcd type        resolution variable refresh rate range variable overdrive variable refresh input    driver needed
0      G-SYNC ULTIMATE         Acer    CP7271K  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
1      G-SYNC ULTIMATE         Acer        X27  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
2      G-SYNC ULTIMATE         Acer        X32  Yes       32      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
3      G-SYNC ULTIMATE         Acer        X35  Yes       35       VA  3440x1440 (WQHD)                     1-200Hz                Yes           Display Port              N/A
4      G-SYNC ULTIMATE         Asus       PG65  Yes       65       VA    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
..                 ...          ...        ...  ...      ...      ...               ...                         ...                ...                    ...              ...
159  G-SYNC Compatible           LG    2020 ZX  Yes   77, 88     OLED    7680x4320 (8K)                    40-120Hz                 No                   HDMI  445.51 or newer
160  G-SYNC Compatible          MSI   MAG251RX  Yes     24.5      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.66 or newer
161  G-SYNC Compatible        Razer  Raptor 27  Yes       27      IPS   2560x1440 (QHD)                    48-144Hz                 No           Display Port  431.60 or newer
162  G-SYNC Compatible      Samsung       CRG5   No       27       VA   1920x1080 (FHD)                    48-240Hz                 No           Display Port  430.86 or newer
163  G-SYNC Compatible    ViewSonic      XG270   No       27      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.41 or newer

[164 rows x 11 columns]
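The flattening step in the script above can be tried offline. The sketch below assumes a small hand-written payload; the sample records, including the "de" key, are hypothetical and are only shaped so that some values are dicts with an 'en' entry, as the script's fn expects. It is not the actual response from the Nvidia site.

```python
import pandas as pd

# Hypothetical payload: some values are plain, some are dicts keyed by language.
sample = {
    "data": [
        {"type": {"en": "G-SYNC ULTIMATE", "de": "G-SYNC ULTIMATE"},
         "manufacturer": "Acer", "size": 27},
        {"type": {"en": "G-SYNC ULTIMATE", "de": "G-SYNC ULTIMATE"},
         "manufacturer": "Asus", "size": 65},
        {"type": {"en": "G-SYNC Compatible", "de": "G-SYNC Kompatibel"},
         "manufacturer": "MSI", "size": 24.5},
    ]
}


def fn(column):
    """Replace every dict in a column with its 'en' entry; keep other values."""
    return [v["en"] if isinstance(v, dict) else v for v in column]


# max_level=0 stops json_normalize from expanding the nested dicts into
# extra columns such as "type.en", so each record stays one row with one
# cell per top-level key.
raw = pd.json_normalize(sample["data"], max_level=0)

# apply() calls fn once per column; the returned lists become the new columns.
df = raw.apply(fn)
print(df)
```

Without max_level=0, json_normalize would expand the nested dicts into one column per language, which is usually not what you want when only one language is needed.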
