
How to extract information from a scraped JSON string (Python)?

I am trying to scrape some data from FotMob (a football website), but when I fetch the page with requests and parse it with Beautiful Soup, I get a huge string of text that looks like JSON. An extract is shown below:

{"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},{"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}

The code I used to get this is shown here:

import requests
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc)
# Print the contents of the <script id="__NEXT_DATA__"> tag
for p in soup.find_all('script', attrs={'id': '__NEXT_DATA__'}):
    print(p.text)

Specifically, I want to access statValue, name and substatValue and put them into a pandas DataFrame. Does anyone know how to do this?

The contents of the __NEXT_DATA__ script tag are JSON, so use json.loads to parse them and then index into the nested result:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")

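# The <script id="__NEXT_DATA__"> tag holds the page data as a JSON string
data = json.loads(soup.find("script", attrs={"id": "__NEXT_DATA__"}).text)

# Path to the list of per-team records inside the parsed JSON
d = data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]
df = pd.DataFrame(d)
# Expand the nested "nameAndSubstatValue" dicts into "name" and "substatValue" columns
df = pd.concat([df, df.pop("nameAndSubstatValue").apply(pd.Series)], axis=1)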

print(df)

Prints:

       id  teamId statValue  rank   type statFormat substatFormat                 name  substatValue
0    8462    8462       8.9     1  teams   fraction        number           Portsmouth            12
1    8451    8451       7.3     2  teams   fraction        number    Charlton Athletic             9
2    9792    9792       7.3     3  teams   fraction        number        Burton Albion             4
3    8671    8671       6.3     4  teams   fraction        number   Accrington Stanley             8
4    9833    9833       6.2     5  teams   fraction        number          Exeter City             9
5   10170   10170       6.1     6  teams   fraction        number         Derby County             3
6    8677    8677       5.9     7  teams   fraction        number  Peterborough United            12
7    8401    8401       5.8     8  teams   fraction        number      Plymouth Argyle             8
8    8559    8559       5.7     9  teams   fraction        number     Bolton Wanderers             5
9    8676    8676       5.3    10  teams   fraction        number    Wycombe Wanderers             8
10  10163   10163       5.3    11  teams   fraction        number  Sheffield Wednesday             7
11   8680    8680       5.3    12  teams   fraction        number      Cheltenham Town             3
12   9902    9902       5.2    13  teams   fraction        number         Ipswich Town            10
13   8283    8283       5.2    14  teams   fraction        number             Barnsley             5
14   8653    8653       5.0    15  teams   fraction        number        Oxford United             3
15   9799    9799       4.3    16  teams   fraction        number            Port Vale             5
16  45723   45723       4.3    17  teams   fraction        number       Fleetwood Town             4
17   9828    9828       4.0    18  teams   fraction        number  Forest Green Rovers             4
18   9896    9896       3.7    19  teams   fraction        number      Shrewsbury Town             2
19   9834    9834       3.5    20  teams   fraction        number     Cambridge United             5
20  10104   10104       3.2    21  teams   fraction        number       Bristol Rovers             7
21   8430    8430       2.9    22  teams   fraction        number         Lincoln City             4
22   8489    8489       2.6    23  teams   fraction        number            Morecambe             2
23   8645    8645       2.2    24  teams   fraction        number   Milton Keynes Dons             3
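
If only statValue, name and substatValue are needed, as the question asks, pd.json_normalize is another way to flatten the nested records. The sketch below is an alternative to the concat/pop approach above, not a replacement for it. To stay runnable offline it uses the two records from the extract in the question rather than the live page, and the variable names records, flat and summary are made up for illustration. It also converts statValue to a number, since the page delivers it as a string.

```python
import pandas as pd

# Two records copied from the extract in the question. On the live page,
# this list would come from the parsed __NEXT_DATA__ JSON instead.
records = [
    {"id": 9902, "teamId": 9902,
     "nameAndSubstatValue": {"name": "Ipswich Town", "substatValue": 10},
     "statValue": "5.2", "rank": 13, "type": "teams",
     "statFormat": "fraction", "substatFormat": "number"},
    {"id": 8283, "teamId": 8283,
     "nameAndSubstatValue": {"name": "Barnsley", "substatValue": 5},
     "statValue": "5.2", "rank": 14, "type": "teams",
     "statFormat": "fraction", "substatFormat": "number"},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. "nameAndSubstatValue.name".
flat = pd.json_normalize(records)
flat = flat.rename(columns={
    "nameAndSubstatValue.name": "name",
    "nameAndSubstatValue.substatValue": "substatValue",
})

# Keep only the three requested columns; statValue arrives as a string,
# so convert it to a numeric dtype.
summary = flat[["name", "statValue", "substatValue"]].copy()
summary["statValue"] = pd.to_numeric(summary["statValue"])

print(summary)
```

With the real data, records would be the list called d in the code above.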

