
Web scraping daily tables into CSV with BeautifulSoup in Python

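I'm new to Python and looking for some help with BeautifulSoup. I'm trying to scrape baseball data from http://contests.covers.com/Handicapping/consensusPick/daily-consensus-picks.aspx?sport=5&date=7/4/2014 and store it in a CSV file. I'd like to iterate over every calendar date in the URL so that I get the game data for each day.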

I'm sure there are probably a few errors, but this is what I have so far:

import csv
import urllib2
from bs4 import BeautifulSoup

with open('covers.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in range(31):
        #I'd like to loop through actual dates instead of my 'i' here
        url = "http://contests.covers.com/Handicapping/consensusPick/daily-consensus-picks.aspx?sport=5&date=5/{}/2014".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            u.close()
        soup=BeautifulSoup(html)
        for mytable in soup.find_all(class_="thepicks"):
            for trs in mytable.find_all('tr'):
                tds = trs.find_all('td')
                row = [elem.text.encode('utf-8') for elem in tds]
                writer.writerow(row)

import requests
from bs4 import BeautifulSoup

url = "http://contests.covers.com/Handicapping/consensusPick/daily-consensus-picks.aspx?sport=5&date=7/4/2014"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

for mytable in soup.find_all('table', 'thepicks'):
    for trs in mytable.find_all('tr'):
        tds = trs.find_all('td')
        row = [elem.text.strip().encode('utf-8') for elem in tds]
        print row

Result

['Time', 'Away', 'Line', 'Picks', 'Pct', 'Home', 'Line', 'Picks', 'Pct', 'Detail', 'Odds']
['7:15 PM', 'Miami', '+133', '388', '29.02%', 'St. Louis', '-144', '949', '70.98%', 'View', 'View']
['7:08 PM', 'Tampa Bay', '+106', '444', '31.76%', 'Detroit', '-115', '954', '68.24%', 'View', 'View']
['7:35 PM', 'Arizona', '+145', '439', '33.13%', 'Atlanta', '-157', '886', '66.87%', 'View', 'View']
['5:05 PM', 'Philadelphia', '+180', '432', '34.70%', 'Pittsburgh', '-196', '813', '65.30%', 'View', 'View']
['9:05 PM', 'Houston', '+165', '507', '37.56%', 'LA Angels', '-179', '843', '62.44%', 'View', 'View']
['11:05 AM', 'Chi. Cubs', '+141', '388', '40.42%', 'Washington', '-153', '572', '59.58%', 'View', 'View']
['4:05 PM', 'Toronto', '+114', '541', '40.89%', 'Oakland', '-123', '782', '59.11%', 'View', 'View']
['7:10 PM', 'Seattle', '+161', '599', '45.14%', 'Chi. White Sox', '-175', '728', '54.86%', 'View', 'View']
['7:10 PM', 'Milwaukee', '+102', '614', '46.80%', 'Cincinnati', '-110', '698', '53.20%', 'View', 'View']
['3:10 PM', 'NY Yankees', '+100', '630', '50.28%', 'Minnesota', '-108', '623', '49.72%', 'View', 'View']
['7:05 PM', 'Kansas City', '+103', '706', '55.50%', 'Cleveland', '-111', '566', '44.50%', 'View', 'View']
['6:40 PM', 'San Francisco', '-108', '827', '60.63%', 'San Diego', '+100', '537', '39.37%', 'View', 'View']
['7:10 PM', 'Texas', '-153', '916', '67.60%', 'NY Mets', '+141', '439', '32.40%', 'View', 'View']
['8:10 PM', 'LA Dodgers', '-215', '946', '69.41%', 'Colorado', '+197', '417', '30.59%', 'View', 'View']
['Time', 'Away', 'Total', 'Home', 'Over', 'Pct', 'Under', 'Pct', 'Detail', 'Odds']
['7:10 PM', 'Seattle', '7.5', 'Chi. White Sox', '299', '38.93%', '469', '61.07%', 'View', 'View']
['8:10 PM', 'LA Dodgers', '9.5', 'Colorado', '230', '43.15%', '303', '56.85%', 'View', 'View']
['7:10 PM', 'Milwaukee', '7.5', 'Cincinnati', '373', '47.40%', '414', '52.60%', 'View', 'View']
['7:15 PM', 'Miami', '7.5', 'St. Louis', '360', '48.19%', '387', '51.81%', 'View', 'View']
['11:05 AM', 'Chi. Cubs', '7', 'Washington', '257', '48.22%', '276', '51.78%', 'View', 'View']
['7:35 PM', 'Arizona', '7.0', 'Atlanta', '379', '50.40%', '373', '49.60%', 'View', 'View']
['4:05 PM', 'Toronto', '8', 'Oakland', '392', '52.34%', '357', '47.66%', 'View', 'View']
['7:08 PM', 'Tampa Bay', '8', 'Detroit', '421', '54.89%', '346', '45.11%', 'View', 'View']
['3:10 PM', 'NY Yankees', '8', 'Minnesota', '402', '55.76%', '319', '44.24%', 'View', 'View']
['5:05 PM', 'Philadelphia', '7.5', 'Pittsburgh', '426', '57.26%', '318', '42.74%', 'View', 'View']
['7:10 PM', 'Texas', '6.5', 'NY Mets', '278', '57.68%', '204', '42.32%', 'View', 'View']
['6:40 PM', 'San Francisco', '7', 'San Diego', '482', '58.14%', '347', '41.86%', 'View', 'View']
['9:05 PM', 'Houston', '8', 'LA Angels', '478', '58.51%', '339', '41.49%', 'View', 'View']
['7:05 PM', 'Kansas City', '7.5', 'Cleveland', '461', '60.58%', '300', '39.42%', 'View', 'View']
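
The output above holds two tables back to back: the money-line picks (11 columns) and the over/under totals (10 columns), each with its own header row. Written into one CSV, their columns would not line up. A minimal sketch for separating them, assuming every table starts with a row whose first cell is 'Time' (true for the sample output, not checked for other dates); `split_tables` is a hypothetical helper name, and the sketch is written for Python 3:

```python
def split_tables(rows, header_marker="Time"):
    """Group a flat list of rows into tables, starting a new table at each header row.

    Assumes every header row begins with `header_marker`, as in the sample output.
    """
    tables = []
    for row in rows:
        if not row:
            continue  # skip rows that had no <td> cells
        if row[0] == header_marker or not tables:
            tables.append([])
        tables[-1].append(row)
    return tables


rows = [
    ["Time", "Away", "Line", "Picks", "Pct", "Home", "Line", "Picks", "Pct", "Detail", "Odds"],
    ["7:15 PM", "Miami", "+133", "388", "29.02%", "St. Louis", "-144", "949", "70.98%", "View", "View"],
    ["Time", "Away", "Total", "Home", "Over", "Pct", "Under", "Pct", "Detail", "Odds"],
    ["7:10 PM", "Seattle", "7.5", "Chi. White Sox", "299", "38.93%", "469", "61.07%", "View", "View"],
]
picks, totals = split_tables(rows)
print(len(picks[0]), len(totals[0]))  # 11 10
```

Each resulting table could then go to its own CSV file, or get an extra column naming the table it came from.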

If you need to add a column:

row = [elem.text.strip().encode('utf-8') for elem in tds]
row.append("7/4/2014")
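
Instead of hard-coding the date string, it can be read back from the URL's query string, which keeps the extra column in sync with the page that was actually requested. A small sketch using Python 3's standard `urllib.parse` (the snippets in this thread are Python 2); `date_from_url` is a hypothetical helper name:

```python
from urllib.parse import urlparse, parse_qs


def date_from_url(url):
    """Return the value of the `date` query parameter, e.g. '7/4/2014'."""
    query = parse_qs(urlparse(url).query)
    return query["date"][0]


url = ("http://contests.covers.com/Handicapping/consensusPick/"
       "daily-consensus-picks.aspx?sport=5&date=7/4/2014")
row = ["7:15 PM", "Miami", "+133"]  # shortened example row
row.append(date_from_url(url))
print(row)  # ['7:15 PM', 'Miami', '+133', '7/4/2014']
```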

If you need to modify existing columns:
(for example, drop the columns that contain the text View)

row = []

for elem in tds:
    text = elem.text.strip().encode('utf-8')
    if text != 'View':
        row.append( text )

row.append("7/4/2014")
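
The snippets above handle a single date, while the question asks for a loop over real calendar dates with the results written to CSV. One possible way to do that, sketched in Python 3 with only the standard library: `daily_urls` and `write_rows` are hypothetical helper names, the unpadded `M/D/YYYY` date format is inferred from the example URL, and the fetching and parsing step is assumed to be the loop shown earlier.

```python
import csv
import io
from datetime import date, timedelta

BASE_URL = ("http://contests.covers.com/Handicapping/consensusPick/"
            "daily-consensus-picks.aspx?sport=5&date={}")


def daily_urls(start, end):
    """Yield (date_string, url) for every calendar day from start to end inclusive."""
    day = start
    while day <= end:
        # Build M/D/YYYY by hand; strftime's unpadded codes are platform-specific.
        stamp = "{}/{}/{}".format(day.month, day.day, day.year)
        yield stamp, BASE_URL.format(stamp)
        day += timedelta(days=1)


def write_rows(out, rows, stamp):
    """Write rows to an open text file, dropping 'View' cells and appending the date."""
    writer = csv.writer(out)
    for row in rows:
        cells = [cell for cell in row if cell != "View"]
        if cells:  # skip rows that had no <td> cells
            writer.writerow(cells + [stamp])


# Demonstration with one hard-coded row instead of a live request.
sample = [["7:15 PM", "Miami", "+133", "388", "29.02%",
           "St. Louis", "-144", "949", "70.98%", "View", "View"]]
buffer = io.StringIO()
for stamp, url in daily_urls(date(2014, 5, 30), date(2014, 6, 1)):
    # In a real run, `sample` would be the rows scraped from `url`.
    write_rows(buffer, sample, stamp)
print(buffer.getvalue())
```

Because the loop works on `date` objects, month boundaries are handled automatically (5/31/2014 is followed by 6/1/2014). When writing to a real file in Python 3, open it with `open('covers.csv', 'w', newline='')` rather than `'wb'`.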

