BeautifulSoup data scraping and database

Question

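I am using BeautifulSoup to parse a website.

Now my problem is the following: I want to write all of this into a database (such as sqlite), together with the minute in which each goal was scored (I can get this information from the links I collect), but only if the score is not ? - ?, because in that case no goals have been scored.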

from pprint import pprint
import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.livescore.com/soccer/champions-league/'))

data = []
for match in soup.select('table.league-table tr'):
    try:
        team1, team2 = match.find_all('td', class_=['fh', 'fa'])
    except ValueError:  # helps to skip irrelevant rows
        continue

    score = match.find('a', class_='scorelink').text.strip()
    data.append({
        'team1': team1.text.strip(),
        'team2': team2.text.strip(),
        'score': score
    })

pprint(data)

href_tags = soup.find_all('a', {'class':"scorelink"})

links = []

for x in xrange(1, len(href_tags)):
    insert = href_tags[x].get("href")
    links.append(insert)

print links

Answer 1

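First of all, what is the point of a score without the teams that played the match?

The idea is to iterate over every row of every table that has the league-table class. For each row, get the team names and the score. Collect the results into a list of dictionaries: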

from pprint import pprint
import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.livescore.com/soccer/champions-league/'))

data = []
for match in soup.select('table.league-table tr'):
    try:
        team1, team2 = match.find_all('td', class_=['fh', 'fa'])
    except ValueError:  # helps to skip irrelevant rows
        continue

    score = match.find('a', class_='scorelink').text.strip()
    data.append({
        'team1': team1.text.strip(),
        'team2': team2.text.strip(),
        'score': score
    })

pprint(data)

Prints:

[
    {'score': u'? - ?', 'team1': u'Atletico Madrid', 'team2': u'Malmo FF'},
    {'score': u'? - ?', 'team1': u'Olympiakos', 'team2': u'Juventus'},
    {'score': u'? - ?', 'team1': u'Liverpool', 'team2': u'Real Madrid'},
    {'score': u'? - ?', 'team1': u'PFC Ludogorets Razgrad', 'team2': u'Basel'},
    ...
]

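Note that this currently appends every match, even those that have not been played yet. If you only need the matches that have a score, simply check that score is not equal to ? - ?: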

if score != '? - ?':
    data.append({
        'team1': team1.text.strip(),
        'team2': team2.text.strip(),
        'score': score
    })

In this case, the output would be:

[{'score': u'2 - 2', 'team1': u'CSKA Moscow', 'team2': u'Manchester City'},
 {'score': u'3 - 0', 'team1': u'Zenit St. Petersburg', 'team2': u'Standard Liege'},
 {'score': u'4 - 0', 'team1': u'APOEL Nicosia', 'team2': u'AaB'},
 {'score': u'3 - 0', 'team1': u'BATE Borisov', 'team2': u'Slovan Bratislava'},
 {'score': u'0 - 1', 'team1': u'Celtic', 'team2': u'Maribor'},
 {'score': u'2 - 0', 'team1': u'FC Porto', 'team2': u'Lille'},
 {'score': u'1 - 0', 'team1': u'Arsenal', 'team2': u'Besiktas'},
 {'score': u'3 - 1', 'team1': u'Athletic Bilbao', 'team2': u'SSC Napoli'},
 {'score': u'4 - 0', 'team1': u'Bayer Leverkusen', 'team2': u'FC Koebenhavn'},
 {'score': u'3 - 0', 'team1': u'Malmo FF', 'team2': u'Salzburg'},
 {'score': u'1 - 0', 'team1': u'PFC Ludogorets Razgrad *', 'team2': u'Steaua Bucuresti'}]
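
The score check above can also be tried in isolation. The following is a minimal sketch, not the original code: it applies the same comparison to a small hard-coded sample copied from the outputs shown in this answer, so it needs no network access. The names NOT_PLAYED and played are illustrative assumptions. It should run on both Python 2 and Python 3.

```python
# Sample rows copied from the outputs shown in this answer; in the real
# script this list would be built by the scraping loop.
data = [
    {'team1': 'Atletico Madrid', 'team2': 'Malmo FF', 'score': '? - ?'},
    {'team1': 'CSKA Moscow', 'team2': 'Manchester City', 'score': '2 - 2'},
    {'team1': 'Celtic', 'team2': 'Maribor', 'score': '0 - 1'},
]

# Hypothetical constant: the placeholder shown for matches without a score.
NOT_PLAYED = '? - ?'

# Keep only the matches that already have a score.
played = [match for match in data if match['score'] != NOT_PLAYED]

for match in played:
    print('%(team1)s %(score)s %(team2)s' % match)
```

Filtering after collection, as here, keeps the scraping loop unchanged; checking inside the loop, as in the fragment above, avoids building the longer list in the first place.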

As for the "writing to the database" part, you can use the sqlite3 module with named parameters and executemany():

import sqlite3

conn = sqlite3.connect('data.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS matches (
        id    integer primary key autoincrement not null,
        team1  text,
        team2 text,
        score text
    )""")

cursor = conn.cursor()
cursor.executemany("""
    INSERT INTO 
        matches (team1, team2, score) 
    VALUES 
        (:team1, :team2, :score)""", data)
conn.commit()
conn.close()
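
To check what was stored, the rows can be read back with a SELECT. The sketch below is only an illustration under stated assumptions: it uses a throwaway in-memory database (':memory:') instead of data.db, and two sample rows from the output above stand in for the scraped data, so it runs on its own.

```python
import sqlite3

# Two sample rows from the output above, standing in for the scraped data.
data = [
    {'team1': 'CSKA Moscow', 'team2': 'Manchester City', 'score': '2 - 2'},
    {'team1': 'Celtic', 'team2': 'Maribor', 'score': '0 - 1'},
]

# Assumption: an in-memory database, so no file is left behind.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE IF NOT EXISTS matches (
        id    integer primary key autoincrement not null,
        team1 text,
        team2 text,
        score text
    )""")

# Named parameters (:team1, ...) are filled from the keys of each dict.
conn.executemany("""
    INSERT INTO matches (team1, team2, score)
    VALUES (:team1, :team2, :score)""", data)
conn.commit()

# Read the rows back in insertion order.
rows = conn.execute(
    "SELECT team1, team2, score FROM matches ORDER BY id").fetchall()
for row in rows:
    print(row)

conn.close()
```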

Of course, there are other things to improve or discuss, but I think this is a good start for you.


Solution 1
1 Accepted 2014-10-21 18:39:44
