How to extract information from a scraped JSON string (Python)?

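I'm trying to scrape some data from FotMob (a football website), but when I fetch the HTML with requests and BeautifulSoup, I get back one large string of text that looks like JSON. An excerpt is shown below: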

{"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},{"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}

The code I use to get this content is shown below:

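import requests
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")
# The page data is embedded as JSON in the <script id="__NEXT_DATA__"> tag
for p in soup.find_all("script", attrs={"id": "__NEXT_DATA__"}):
    print(p.text)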

Specifically, I want to access statValue, name, and substatValue and put them into a pandas DataFrame. Does anyone know how to do this?

Use json.loads to parse the data:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")

# The script tag's text is a JSON document; parse it into Python dicts and lists
data = json.loads(soup.find("script", attrs={"id": "__NEXT_DATA__"}).text)

# Drill down to the list of per-team stat records
d = data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]
df = pd.DataFrame(d)
# Expand the nested nameAndSubstatValue dicts into their own columns
df = pd.concat([df, df.pop("nameAndSubstatValue").apply(pd.Series)], axis=1)

print(df)

Prints:

       id  teamId statValue  rank   type statFormat substatFormat                 name  substatValue
0    8462    8462       8.9     1  teams   fraction        number           Portsmouth            12
1    8451    8451       7.3     2  teams   fraction        number    Charlton Athletic             9
2    9792    9792       7.3     3  teams   fraction        number        Burton Albion             4
3    8671    8671       6.3     4  teams   fraction        number   Accrington Stanley             8
4    9833    9833       6.2     5  teams   fraction        number          Exeter City             9
5   10170   10170       6.1     6  teams   fraction        number         Derby County             3
6    8677    8677       5.9     7  teams   fraction        number  Peterborough United            12
7    8401    8401       5.8     8  teams   fraction        number      Plymouth Argyle             8
8    8559    8559       5.7     9  teams   fraction        number     Bolton Wanderers             5
9    8676    8676       5.3    10  teams   fraction        number    Wycombe Wanderers             8
10  10163   10163       5.3    11  teams   fraction        number  Sheffield Wednesday             7
11   8680    8680       5.3    12  teams   fraction        number      Cheltenham Town             3
12   9902    9902       5.2    13  teams   fraction        number         Ipswich Town            10
13   8283    8283       5.2    14  teams   fraction        number             Barnsley             5
14   8653    8653       5.0    15  teams   fraction        number        Oxford United             3
15   9799    9799       4.3    16  teams   fraction        number            Port Vale             5
16  45723   45723       4.3    17  teams   fraction        number       Fleetwood Town             4
17   9828    9828       4.0    18  teams   fraction        number  Forest Green Rovers             4
18   9896    9896       3.7    19  teams   fraction        number      Shrewsbury Town             2
19   9834    9834       3.5    20  teams   fraction        number     Cambridge United             5
20  10104   10104       3.2    21  teams   fraction        number       Bristol Rovers             7
21   8430    8430       2.9    22  teams   fraction        number         Lincoln City             4
22   8489    8489       2.6    23  teams   fraction        number            Morecambe             2
23   8645    8645       2.2    24  teams   fraction        number   Milton Keynes Dons             3
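
If only the three fields named in the question are needed, the nested records can also be flattened with pd.json_normalize. The following is a minimal sketch rather than part of the original answer: it runs offline on the two records from the question's excerpt, wrapped in [] so that the string is a valid JSON array, and the variable names raw and records are only illustrative. With the live page, records would instead be the statsData list extracted above.

```python
import json

import pandas as pd

# Two records copied from the excerpt in the question. The surrounding
# brackets are added here so the string parses as a JSON array.
raw = (
    '[{"id":9902,"teamId":9902,'
    '"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},'
    '"statValue":"5.2","rank":13,"type":"teams",'
    '"statFormat":"fraction","substatFormat":"number"},'
    '{"id":8283,"teamId":8283,'
    '"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},'
    '"statValue":"5.2","rank":14,"type":"teams",'
    '"statFormat":"fraction","substatFormat":"number"}]'
)

records = json.loads(raw)

# json_normalize flattens nested dicts; nested keys are joined with "."
df = pd.json_normalize(records)

# Give the nested columns shorter names and keep only the requested fields.
df = df.rename(
    columns={
        "nameAndSubstatValue.name": "name",
        "nameAndSubstatValue.substatValue": "substatValue",
    }
)[["name", "statValue", "substatValue"]]

# statValue arrives as a string ("5.2"), so convert it for numeric work.
df["statValue"] = df["statValue"].astype(float)

print(df)
```

Converting statValue to float matters for sorting or arithmetic, because the site delivers it as a string.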

