I am trying to get information from https://understat.com/league/EPL.
I have read up on this and looked at what other people have done, but I just can't put the last piece of the puzzle together. I've managed to decode the string, but I can't get it into JSON object form. Does anyone have an idea?
import requests
import json
import pandas as pd
import time
import lxml.html as lh
import codecs
from bs4 import BeautifulSoup
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = "https://understat.com/league/EPL"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
scripts = soup.find_all('script')

for script in scripts:
    if 'var' in script.text:
        encoded_string = script.text
        # split() returns a list; keep the text after the JSON.parse(' marker
        encoded_string = encoded_string.split("JSON.parse('", 1)[1]
        encoded_string = encoded_string.rsplit("'),", 1)[0]
        jsonStr = codecs.getdecoder('unicode-escape')(encoded_string)[0]
        jsonObj = json.loads(jsonStr)
        print(jsonObj)
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 4 (char 4)
Here is a sample of the JSON string data:
{"id":"9197","isResult":true,"h":{"id":"89","title":"Manchester United","short_title":"MUN"},"a":{"id":"75","title":"Leicester","short_title":"LEI"},"goals":{"h":"2","a":"1"},"xG":{"h":"1.5137","a":"1.73813"},"datetime":"2018-08-10 22:00:00","forecast":{"w":"0.2812","d":"0.3275","l":"0.3913"}},{"id":"9198","isResult":true,"h":{"id":"86","title":"Newcastle United","short_title":"NEW"},"a":{"id":"82","title":"Tottenham","short_title":"TOT"},"goals":{"h":"1","a":"2"},"xG":{"h":"0.974497","a":"2.58097"},"datetime":"2018-08-11 14:30:00","forecast":{"w":"0.08","d":"0.1479","l":"0.7721"}},{"id":"9199","isResult":true,"h":{"id":"90","title":"Watford","short_title":"WAT"},"a":{"id":"220","title":"Brighton","short_title":"BRI"},"goals":{"h":"2","a":"0"},"xG":{"h":"1.42372","a":"0.45504"},"datetime":"2018-08-11 17:00:00","forecast":{"w":"0.6438","d":"0.2574","l":"0.0988"}},
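Try the following, which uses a different regex and substring. The regex captures everything inside JSON.parse(...), including the surrounding single quotes, and json_str[1:-1] strips those quotes after decoding: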
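import requests
import re
import json
import codecs

r = requests.get('https://understat.com/league/EPL')
# Capture the argument of a JSON.parse(...) call, quotes included
p = re.compile(r'JSON\.parse\((.*)\);')
d = p.findall(r.text)[0]
# Decode the escape sequences into plain characters
json_str = codecs.getdecoder('unicode-escape')(d)[0]
# Drop the surrounding single quotes before parsing
data = json.loads(json_str[1:-1])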
Sample of print(data) output
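The decoding step can also be tried offline, without hitting the site. The sketch below is only an illustration: `SAMPLE_SCRIPT` is a made-up stand-in for one script body (its variable name and escaping style are assumptions), and `decode_payloads` and `PAYLOAD_RE` are hypothetical helper names, not part of any library. It assumes the payload itself never contains an unescaped `')`.

```python
import codecs
import json
import re

# Hypothetical stand-in for one <script> body; the content is invented
# for illustration and only mimics an escaped JSON payload.
SAMPLE_SCRIPT = (
    "var datesData = JSON.parse('\\x5B\\x7B\\x22id\\x22\\x3A\\x229197\\x22,"
    "\\x22isResult\\x22\\x3Atrue\\x7D\\x5D');"
)

# Capture only the text between the single quotes of JSON.parse('...').
# Assumption: the payload never contains an unescaped "')".
PAYLOAD_RE = re.compile(r"JSON\.parse\('(.*?)'\)")


def decode_payloads(script_text):
    """Return the parsed JSON value of every JSON.parse('...') call found."""
    results = []
    for raw in PAYLOAD_RE.findall(script_text):
        # Turn escape sequences such as \x7B back into plain characters.
        decoded = codecs.getdecoder("unicode-escape")(raw)[0]
        results.append(json.loads(decoded))
    return results


matches = decode_payloads(SAMPLE_SCRIPT)[0]
print(matches)
```

Because the quotes are excluded by the regex itself, no `[1:-1]` slice is needed here. With real page text, each element of the returned list would correspond to one `JSON.parse` call.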