
BeautifulSoup: clicking a pagination button and continuing to scrape the next page

I'm working on a university project and need to collect data online. I would like to get transfer data from this website: https://www.footballdatabase.eu/en/transfers/-/2020-10-03

For 3 October I managed to get the first 19 rows, but the results are spread over 6 pages and I'm struggling to activate the button that loads the next page.

This is the HTML for the button:

<a href="javascript:;" class="inactive" onclick="showtransfers('1','2020-10-03','2','full');">2</a>
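The page number is already in the HTML: it is the third argument of the showtransfers(...) call in the onclick attribute, so the pagination links can be read without clicking anything. A minimal sketch, assuming every pagination link uses the same four-argument call as the button above; page_args is a hypothetical helper, and the labels pid, date, page and filter are assumptions that simply mirror the form fields sent in the Ajax request.

```python
import re

from bs4 import BeautifulSoup

# Matches showtransfers('1','2020-10-03','2','full') and captures the four quoted arguments
SHOWTRANSFERS = re.compile(r"showtransfers\('([^']*)','([^']*)','([^']*)','([^']*)'\)")


def page_args(html):
    """Return the arguments of every showtransfers(...) link found in html.

    The keys pid/date/page/filter are hypothetical labels chosen to match
    the Ajax form fields; they are not read from the site itself.
    """
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    # Only <a> tags whose onclick attribute contains a showtransfers(...) call
    for link in soup.find_all('a', onclick=SHOWTRANSFERS):
        pid, date, page, filter_ = SHOWTRANSFERS.search(link['onclick']).groups()
        result.append({'pid': pid, 'date': date, 'page': int(page), 'filter': filter_})
    return result


button = """<a href="javascript:;" class="inactive" onclick="showtransfers('1','2020-10-03','2','full');">2</a>"""
print(page_args(button))
# [{'pid': '1', 'date': '2020-10-03', 'page': 2, 'filter': 'full'}]
```

Run on the whole first page, max(link['page'] for link in page_args(pageTree.text)) would give the number of pages, assuming the last page is rendered as a link like this one.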

My code so far:

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

page = "https://www.footballdatabase.eu/en/transfers/-/2020-10-03"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

# Player names, teams and transfer amounts are in <span> tags with these classes
Players = pageSoup.find_all("span", {"class": "name"})
Team = pageSoup.find_all("span", {"class": "firstteam"})
Values = pageSoup.find_all("span", {"class": "transferamount"})

PlayersList = []
TeamList = []
ValuesList = []

# zip() stops at the shortest list, so a page with fewer rows cannot raise IndexError
for player, team, value in zip(Players, Team, Values):
    PlayersList.append(player.text)
    TeamList.append(team.text)
    ValuesList.append(value.text)

df = pd.DataFrame({"Players": PlayersList, "Team": TeamList, "Values": ValuesList})

Thank you very much!

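The button does not link to a new URL (its href is "javascript:;"): its onclick handler calls the JavaScript function showtransfers(), which loads the next page through an Ajax request. BeautifulSoup cannot run JavaScript, but you can use the requests module to simulate the Ajax call yourself, sending the date, page number and filter as POST data. For example: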

import requests
from bs4 import BeautifulSoup


def team_name(line, css_class):
    # A team cell without a link is reported as 'Free'
    cell = line.select_one(css_class)
    return cell.a.text if cell.a else 'Free'


data = {
    'date': '2020-10-03',
    'pid': 1,
    'page': 1,
    'filter': 'full',
}

url = 'https://www.footballdatabase.eu/ajax_transfers_show.php'

for page in range(1, 7):  # <--- adjust number of pages here.
    data['page'] = page
    soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')

    # Each transfer is rendered as an element with the class "line"
    for line in soup.select('.line'):
        name = line.a.text
        first_team = team_name(line, '.firstteam')
        second_team = team_name(line, '.secondteam')
        amount = line.select_one('.transferamount').text

        print('{:<30} {:<20} {:<20} {}'.format(name, first_team, second_team, amount))

Prints:

Bruno Amione                   Belgrano             Hellas Vérone        1.7 M€
Ismael Gutierrez               Betis Deportivo      Atlético B           1 M€
Vitaly Janelt                  Bochum               Brentford            500 k€
Sven Ulreich                   Bayern Munich        Hambourg SV          500 k€
Salim Ali Al Hammadi           Baniyas              Khor Fakkan          Prêt
Giovanni Alessandretti         Ascoli U-20          Recanatese           Prêt
Gabriele Bellodi               AC Milan U-20        Alessandria          Prêt
Louis Britton                  Bristol City B       Torquay United       Prêt
Juan Brunetta                  Godoy Cruz           Parme                Prêt
Bobby Burns                    Barrow               Glentoran            Prêt
Bohdan Butko                   Shakhtar Donetsk     Lech Poznan          Prêt
Nicolò Casale                  Hellas Vérone        Empoli               Prêt
Alessio Da Cruz                Parme                FC Groningue         Prêt
Dalbert Henrique               Inter Milan          Rennes               Prêt

...and so on.
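To get every page into one pandas DataFrame like the one in the question, the answer's loop can be wrapped so that it keeps requesting pages until one comes back empty, instead of hard-coding range(1, 7). This is a sketch, not a tested scraper: it reuses the URL and form fields from the answer, the helper names (fetch_page, team_or_free, parse_transfers, scrape_transfers) and the column names are hypothetical, and it assumes, without having checked the site, that a page number past the last one returns no .line rows. max_pages caps the loop in case that assumption is wrong.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = 'https://www.footballdatabase.eu/ajax_transfers_show.php'


def fetch_page(date, page):
    """POST the same form fields as the answer and return the HTML fragment."""
    data = {'date': date, 'pid': 1, 'page': page, 'filter': 'full'}
    response = requests.post(URL, data=data, timeout=30)
    response.raise_for_status()
    return response.text


def team_or_free(line, selector):
    # Same rule as the answer: a team cell without a link is reported as 'Free'
    cell = line.select_one(selector)
    return cell.a.text if cell.a else 'Free'


def parse_transfers(html):
    """Turn one page fragment into a list of row dicts (empty if there are no '.line' rows)."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        {
            'Player': line.a.text,
            'From': team_or_free(line, '.firstteam'),
            'To': team_or_free(line, '.secondteam'),
            'Amount': line.select_one('.transferamount').text,
        }
        for line in soup.select('.line')
    ]


def scrape_transfers(date, fetch=fetch_page, max_pages=50):
    """Request pages 1, 2, ... until one has no rows, and return a DataFrame.

    Assumption (not verified against the site): a page past the last one
    comes back without '.line' rows. max_pages stops the loop otherwise.
    """
    rows = []
    for page in range(1, max_pages + 1):
        page_rows = parse_transfers(fetch(date, page))
        if not page_rows:
            break
        rows.extend(page_rows)
    return pd.DataFrame(rows, columns=['Player', 'From', 'To', 'Amount'])
```

Calling df = scrape_transfers('2020-10-03') would fetch the live pages; passing a different fetch function (for example one that returns saved HTML) lets the parsing be tried offline.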
