
Looping Scraped Data Through Different Pages of a Website Using Beautiful Soup

Below is a web scraper that successfully pulls roster information from a team's website and exports it into a CSV file. As you can see, each team website has a similar URL pattern:

http://m.redsox.mlb.com/roster/
http://m.yankees.mlb.com/roster/

I am trying to create a loop that will loop through each team's website, scrape each player's roster information, and write it to a CSV file. At the beginning of my code, I created a dictionary of team names and formatted it into the URL to request a page. This strategy worked; however, the code only keeps the data from the last page listed in my dictionary. Does anyone know how to alter this code so that it loops through all the pages in the team_list dictionary? Thanks in advance!

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')

    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()

    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta',property='og:site_name')['content']] * len(names)

    with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))

Replacing your `{...}` literal with a list should help (written with braces and no key-value pairs, it is actually a set, not a dictionary), but the main issue is that you open the CSV file in `'w'` mode inside the loop, so each team overwrites the previous one. Collect the rows from every iteration and write the file once, after the loop:

import requests
import csv
import pandas as pd

from bs4 import BeautifulSoup

team_list=['yankees','redsox']
output = []

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')

    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()

    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta',property='og:site_name')['content']] * len(names)

    output.extend(zip(names, ids, number, handedness, height, weight, DOB, team))

pd.DataFrame(data=output, columns=['Name','ID','Number','Hand','Height','Weight','DOB','Team']).to_csv('csvfilename.csv', index=False)
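If you would rather keep the standard-library `csv` module instead of pulling in pandas, the same fix applies: accumulate the rows across iterations and open the output file once, after the loop. The sketch below uses hypothetical placeholder data in place of the network requests and BeautifulSoup parsing, just to illustrate the write-once pattern:

```python
import csv

# Hypothetical stand-in for the scraped results: in the real scraper,
# each team's (name, id) pairs come from BeautifulSoup.
scraped = {
    'yankees': [('Player A', '111111'), ('Player B', '222222')],
    'redsox': [('Player C', '333333')],
}

output = []
for team, players in scraped.items():
    for name, player_id in players:
        output.append([name, player_id, team])

# Open the file once, AFTER the loop, so earlier teams are not overwritten.
with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Name', 'ID', 'Team'])
    writer.writerows(output)
```

The key difference from the original question's code is that the `with open(...)` block sits outside the loop: opening in `'w'` mode truncates the file, so doing it per-iteration keeps only the last team's roster.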
