
python beautifulsoup next page

This is the code I currently use to scrape specific player data from a site:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
import lxml
import xlsxwriter

page = requests.get('https://www.futbin.com/players?page=1')
soup = BeautifulSoup(page.content, 'lxml')
pool = soup.find(id='repTb')

pnames = pool.find_all(class_='player_name_players_table')
pprice = pool.find_all(class_='ps4_color font-weight-bold')
prating = pool.select('span[class*="form rating ut20"]')


all_player_names = [name.getText() for name in pnames]
all_prices = [price.getText() for price in pprice]
all_pratings = [rating.getText() for rating in prating]

fut_data = pd.DataFrame(
    {
        'Player': all_player_names,
        'Rating': all_pratings,
        'Price': all_prices,
     })

# The context manager saves and closes the file; writer.save() was removed in pandas 2.0
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    fut_data.to_excel(writer, sheet_name='Futbin')

print(fut_data)

This works fine for the first page. But I need to go through 609 pages in total and collect the data from all of them.

Could you help me rewrite this code so that it does that? I'm still a beginner and am learning with this project.

You can loop over all 609 pages, parse each one, and save the collected data to file.xlsx at the end:

import requests
from bs4 import BeautifulSoup
import pandas as pd

all_player_names = []
all_pratings = []
all_prices = []

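for i in range(1, 610):  # pages 1 through 609
    page = requests.get('https://www.futbin.com/players?page={}'.format(i))
    soup = BeautifulSoup(page.content, 'lxml')
    pool = soup.find(id='repTb')
    if pool is None:
        # No player table on this page (e.g. a blocked or failed request): stop early
        break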

    pnames = pool.find_all(class_='player_name_players_table')
    pprice = pool.find_all(class_='ps4_color font-weight-bold')
    prating = pool.select('span[class*="form rating ut20"]')

    all_player_names.extend([name.getText() for name in pnames])
    all_prices.extend([price.getText() for price in pprice])
    all_pratings.extend([rating.getText() for rating in prating])

fut_data = pd.DataFrame({'Player': all_player_names,
                         'Rating': all_pratings,
                         'Price': all_prices})

# The context manager saves and closes the file; writer.save() was removed in pandas 2.0
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    fut_data.to_excel(writer, sheet_name='Futbin')
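
One thing to watch with this approach: the three lists are filled independently, so if any page yields a different number of names, ratings, and prices, the columns drift out of alignment, and `pd.DataFrame` raises a `ValueError` when the final lengths differ. The sketch below is one possible way to guard against that and to pause between requests. The names `collect_pages`, `parse_page`, and `delay` are hypothetical and not part of the original code, and treating an empty page as the end of the listing is an assumption about the site.

```python
import time


def collect_pages(parse_page, first=1, last=609, delay=1.0):
    """Merge the columns returned by parse_page(n) for pages first..last.

    parse_page is a hypothetical callable: given a page number, it returns
    three lists (names, ratings, prices) scraped from that page.
    """
    columns = {'Player': [], 'Rating': [], 'Price': []}
    for number in range(first, last + 1):
        names, ratings, prices = parse_page(number)
        # Misaligned columns would pair players with the wrong rating or price.
        if not (len(names) == len(ratings) == len(prices)):
            raise ValueError(
                'page {}: got {} names, {} ratings, {} prices'.format(
                    number, len(names), len(ratings), len(prices)))
        if not names:
            # Assumption: an empty page means we ran past the last real page.
            break
        columns['Player'].extend(names)
        columns['Rating'].extend(ratings)
        columns['Price'].extend(prices)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return columns
```

With a `parse_page` that wraps the `requests.get` and `BeautifulSoup` lines from the loop body and returns the three lists, `pd.DataFrame(collect_pages(parse_page))` should build the same table as before, while failing loudly on the first page whose columns do not line up.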

