How to extract information from a scraped JSON string (Python)?

I am trying to scrape some data from FotMob (a football website), but when I access the HTML with requests and Beautiful Soup, it returns a huge string of text that looks like JSON. An extract is shown below:

{"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},{"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}

The code I used to get this is shown here:

import requests
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")
for p in soup.find_all("script", attrs={"id": "__NEXT_DATA__"}):
    print(p.text)

Specifically, I want to access the statValue, name and substatValue fields and put them into a pandas DataFrame. Does anyone know how to do this?

The script tag with id __NEXT_DATA__ contains the page data as JSON text, so use json.loads to parse it:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")

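# The __NEXT_DATA__ script tag holds the page data as JSON text
data = json.loads(soup.find("script", attrs={"id": "__NEXT_DATA__"}).text)

# The list of per-team records sits under this path
d = data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]
df = pd.DataFrame(d)
# Expand the nested nameAndSubstatValue dicts into name and substatValue columns
df = pd.concat([df, df.pop("nameAndSubstatValue").apply(pd.Series)], axis=1)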

print(df)

Prints:

       id  teamId statValue  rank   type statFormat substatFormat                 name  substatValue
0    8462    8462       8.9     1  teams   fraction        number           Portsmouth            12
1    8451    8451       7.3     2  teams   fraction        number    Charlton Athletic             9
2    9792    9792       7.3     3  teams   fraction        number        Burton Albion             4
3    8671    8671       6.3     4  teams   fraction        number   Accrington Stanley             8
4    9833    9833       6.2     5  teams   fraction        number          Exeter City             9
5   10170   10170       6.1     6  teams   fraction        number         Derby County             3
6    8677    8677       5.9     7  teams   fraction        number  Peterborough United            12
7    8401    8401       5.8     8  teams   fraction        number      Plymouth Argyle             8
8    8559    8559       5.7     9  teams   fraction        number     Bolton Wanderers             5
9    8676    8676       5.3    10  teams   fraction        number    Wycombe Wanderers             8
10  10163   10163       5.3    11  teams   fraction        number  Sheffield Wednesday             7
11   8680    8680       5.3    12  teams   fraction        number      Cheltenham Town             3
12   9902    9902       5.2    13  teams   fraction        number         Ipswich Town            10
13   8283    8283       5.2    14  teams   fraction        number             Barnsley             5
14   8653    8653       5.0    15  teams   fraction        number        Oxford United             3
15   9799    9799       4.3    16  teams   fraction        number            Port Vale             5
16  45723   45723       4.3    17  teams   fraction        number       Fleetwood Town             4
17   9828    9828       4.0    18  teams   fraction        number  Forest Green Rovers             4
18   9896    9896       3.7    19  teams   fraction        number      Shrewsbury Town             2
19   9834    9834       3.5    20  teams   fraction        number     Cambridge United             5
20  10104   10104       3.2    21  teams   fraction        number       Bristol Rovers             7
21   8430    8430       2.9    22  teams   fraction        number         Lincoln City             4
22   8489    8489       2.6    23  teams   fraction        number            Morecambe             2
23   8645    8645       2.2    24  teams   fraction        number   Milton Keynes Dons             3
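
The question asks only for statValue, name and substatValue, so it may help to trim the frame to those columns. The following is a minimal sketch of one way to do that with pd.json_normalize, not part of the original answer. To keep it runnable offline, it uses the two records from the extract in the question, wrapped in square brackets to form a valid JSON array; the variable names sample, records, flat and stats are illustrative assumptions. With the live page, records would instead be the statsData list found above.

```python
import json

import pandas as pd

# Two records copied from the extract in the question, wrapped in [] so the
# fragment becomes a valid JSON array (assumption: stands in for statsData).
sample = """[
  {"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},
  {"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}
]"""

records = json.loads(sample)

# json_normalize flattens nested dicts into dotted column names,
# for example "nameAndSubstatValue.name".
flat = pd.json_normalize(records)

# Keep only the three requested fields and give them short names.
stats = flat[
    ["nameAndSubstatValue.name", "statValue", "nameAndSubstatValue.substatValue"]
].rename(
    columns={
        "nameAndSubstatValue.name": "name",
        "nameAndSubstatValue.substatValue": "substatValue",
    }
)

# statValue arrives as a string ("5.2"); convert it for numeric work.
stats["statValue"] = stats["statValue"].astype(float)

print(stats)
```

Converting statValue to float is optional, but sorting or averaging the column would otherwise operate on strings.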
