
Python requests.get not returning all elements from website

I am trying to get the players' fixtures from this website, but when I use requests.get and search the result for the element, I get None.

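import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.fplstatistics.co.uk/")
soup = BeautifulSoup(r.text, 'lxml')
# find() returns None here: no "dtr-data" span exists in the downloaded HTML
allFixtures = soup.find("span", {"class": "dtr-data"})
print(allFixtures)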

The information you need is not contained in the HTML returned from your URL. The browser makes a separate request for it using JavaScript, which requests does not execute.
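
One quick way to confirm this is to count how many elements in the downloaded HTML carry the class you are looking for. The sketch below uses only the standard library; the names ClassCounter and count_class and the sample markup are illustrative assumptions, not part of the original code.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in html carry class_name (hypothetical helper)."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Made-up markup standing in for what requests receives: the table body is
# empty because the rows are filled in later by JavaScript.
static_html = '<table id="exampleTable"><tbody></tbody></table>'
print(count_class(static_html, "dtr-data"))  # 0
```

If the count is 0 for r.text while the element is visible in the browser, the content is being added after the page loads.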

Using your browser's developer tools, you can see the request that is made to fetch the data, which is returned as JSON.

Unfortunately, the URL used for this request needs a key and a value that are buried inside one of the script sections of the HTML. Both are written as hex escape sequences (if you search the HTML, you will find them).
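
To see what such hex escapes stand for, they can be decoded in a couple of lines. This is a small illustrative helper (decode_hex_escapes is an assumed name, not part of the original answer); the two escaped strings are the ones the answer's regular expressions search for.

```python
import re


def decode_hex_escapes(text):
    """Replace each literal \\xHH escape in text with the character it encodes."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Raw strings keep the backslashes literal, as they appear in the page source.
print(decode_hex_escapes(r"\x6E\x61\x6D\x65"))      # name
print(decode_hex_escapes(r"\x76\x61\x6C\x75\x65"))  # value
```

So the script is storing an ordinary "name"/"value" pair, only with the property names obfuscated.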

A regular expression can be used to extract the key and value needed to make the call. With this, a second requests call can be made to get the JSON (the same way a browser would). I suggest you print this out so you can see the structure of all the information that is returned.

The following should work:

import re

import requests

# Send a browser-like User-Agent with each request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"}
s = requests.Session()
req_main = s.get("http://www.fplstatistics.co.uk/", headers=headers)

# The key and value sit in the page's script under hex-escaped "name" and "value"
k = re.search(r'"\\x6E\\x61\\x6D\\x65":"(.*?)"', req_main.text).group(1)
v = re.search(r'"\\x76\\x61\\x6C\\x75\\x65":(.*?)}', req_main.text).group(1)

# Second request: the JSON endpoint that the browser calls
url_json = f"http://www.fplstatistics.co.uk/Home/AjaxPricesIHandler?{k}={v}&pyseltype=0"
req_json = s.get(url_json, headers=headers)
# The fixtures string is the last element of each row in "aaData"
fixtures = [fixture[-1] for fixture in req_json.json()["aaData"]]

for fixture in fixtures:
    print(fixture)
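
Note that re.search returns None when a pattern is not found, so calling .group(1) directly fails with an AttributeError if the page changes. A slightly more defensive variant of the extraction step might look like the sketch below. The function name extract_key_value and the sample fragment are assumptions for illustration; only the two regular expressions come from the code above.

```python
import re

NAME_RE = re.compile(r'"\\x6E\\x61\\x6D\\x65":"(.*?)"')
VALUE_RE = re.compile(r'"\\x76\\x61\\x6C\\x75\\x65":(.*?)}')


def extract_key_value(html):
    """Return (key, value) from the page source, or raise ValueError."""
    k = NAME_RE.search(html)
    v = VALUE_RE.search(html)
    if k is None or v is None:
        raise ValueError("key/value not found; the page layout may have changed")
    return k.group(1), v.group(1)


# Made-up script fragment in the shape the regular expressions expect.
sample = r'{"\x6E\x61\x6D\x65":"exampleKey","\x76\x61\x6C\x75\x65":12345}'
print(extract_key_value(sample))  # ('exampleKey', '12345')
```

With the real page, extract_key_value(req_main.text) would replace the two re.search lines.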

This gives output starting with:

Aston Villa(H) Leicester(A) Watford(H) Liverpool(A) 
Aston Villa(H) Leicester(A) Watford(H) Liverpool(A) 
Aston Villa(H) Leicester(A) Watford(H) Liverpool(A)
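
Before picking a column, it can help to print one row of the response together with its indexes. The sketch below uses a made-up payload: only the "aaData" key and the fact that the fixtures string is the last element of each row are taken from the answer, and describe_first_row is a hypothetical helper.

```python
import json

# Made-up payload; the real rows contain more columns than shown here.
payload = {
    "aaData": [
        ["Player A", "Team X", "Aston Villa(H) Leicester(A) "],
        ["Player B", "Team Y", "Watford(H) Liverpool(A) "],
    ]
}


def describe_first_row(data):
    """Return (index, value) pairs for the first row of data["aaData"]."""
    rows = data.get("aaData", [])
    return list(enumerate(rows[0])) if rows else []


# Pretty-print the start of the response, then list the first row by index.
print(json.dumps(payload, indent=2)[:200])
for index, value in describe_first_row(payload):
    print(index, repr(value))
```

With the real response, pass req_json.json() instead of payload.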

This page loads its content dynamically with JavaScript. To extract data from pages like this, you can follow this link: Python Scrape website loading JS
