Web scraping with bs4 python: How to display football matchups

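I'm a beginner in Python and I'm trying to create a program that scrapes the football/soccer fixtures from skysports.com and sends them to my phone by SMS. I've left out the SMS code because I already have that figured out, so here is the web scraping code I'm stuck on so far: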

import requests
from bs4 import BeautifulSoup

URL = "https://www.skysports.com/football-fixtures"
page = requests.get(URL)

results = BeautifulSoup(page.content, "html.parser")

d = defaultdict(list)

comp = results.find('h5', {"class": "fixres__header3"})
team1 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side1"})
date = results.find('span', {"class": "matches__date"})
team2 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side2"})

for ind in range(len(d)):
    d['comp'].append(comp[ind].text)
    d['team1'].append(team1[ind].text)
    d['date'].append(date[ind].text)
    d['team2'].append(team2[ind].text) 

The following should do the trick for you:

    from bs4 import BeautifulSoup
    import requests

    a = requests.get('https://www.skysports.com/football-fixtures')
    soup = BeautifulSoup(a.text, features="html.parser")

    # One pass over the page is enough: wrapping this in a loop over the
    # date headers would append every team name once per date.
    teams = []
    for i in soup.find_all(class_="swap-text--bp30")[1:]:  # skips the first one because that's a heading
        teams.append(i.text)

    date = soup.find(class_="fixres__header2").text  # first date header on the page
    print(date)
    teams = [i.strip('\n') for i in teams]
    # Names alternate home, away; stop early so an unpaired last name cannot raise IndexError
    for x in range(0, len(teams) - 1, 2):
        print(teams[x] + " vs " + teams[x+1])

Let me explain further what I did: all of the football teams have this class name, swap-text--bp30.

So we can use find_all to extract every element that has that class name.
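As a small illustration of what find_all returns compared with find, here is a sketch run against an inline HTML snippet rather than the live page. The markup is made up for the example; only the class name is taken from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class name comes from the answer above.
SAMPLE_HTML = """
<span class="swap-text--bp30">Football Fixtures</span>
<span class="swap-text--bp30">Arsenal</span>
<span class="swap-text--bp30">Chelsea</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag (or None if nothing matches) ...
first = soup.find(class_="swap-text--bp30")

# ... while find_all() returns a list-like ResultSet of every match,
# which is what you need if you want to index or loop over the results.
matches = soup.find_all(class_="swap-text--bp30")
names = [tag.text for tag in matches[1:]]  # drop the heading, as in the answer
print(names)
```

This is also why indexing the result of find, as the code in the question does, cannot walk through the fixtures: find gives back a single tag, not a list.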

Once we have the results, we can collect them in a list, "teams = []", by appending inside the for loop with "teams.append(i.text)". ".text" strips the HTML and leaves only the text.
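To make the effect of ".text" concrete, here is a minimal sketch on a made-up tag. It also shows get_text(strip=True), a Beautiful Soup call that removes the surrounding whitespace in the same step; whether the real page pads its team names with newlines exactly like this is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the kind of surrounding newlines seen in scraped text.
html = '<span class="swap-text--bp30">\nArsenal\n</span>'
tag = BeautifulSoup(html, "html.parser").find("span")

raw = tag.text                     # markup removed, whitespace kept
stripped = tag.text.strip("\n")    # the approach used in this answer
clean = tag.get_text(strip=True)   # built-in alternative that strips whitespace

print(repr(raw), repr(stripped), repr(clean))
```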

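Then we can get rid of the "\n" in the list by stripping it, and print the strings in the list two at a time. This should be your final output (shown as a screenshot in the original answer): the date, followed by one "home vs away" line per fixture.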
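The pairing step can also be written with zip instead of index arithmetic. The sketch below uses a hypothetical helper, pair_up, and made-up team names; it strips all surrounding whitespace rather than only "\n", and it silently drops an unpaired name at the end of the list.

```python
def pair_up(names):
    """Turn [home1, away1, home2, away2, ...] into "home vs away" strings."""
    cleaned = [n.strip() for n in names]
    # Even positions are home teams, odd positions are away teams;
    # zip stops at the shorter slice, so a trailing unpaired name is ignored.
    return [f"{home} vs {away}" for home, away in zip(cleaned[0::2], cleaned[1::2])]


fixtures = pair_up(["\nArsenal\n", "\nChelsea\n", "\nLeeds United\n", "\nEverton\n", "\nFulham\n"])
for line in fixtures:
    print(line)
```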

Edit: To get the league titles, we do pretty much the same thing:

league = []
# A single pass is enough; looping over the date headers as well would
# append every league once per date.
for i in soup.find_all(class_="fixres__header3"):  # every league header on the page
    league.append(i.text)

Strip the list and create another one:

league = [i.strip('\n') for i in league]
final = []

Then add the last bit of code, which basically just prints the league and then the two teams, over and over:

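# Names alternate home, away, so step through the list two at a time
for x in range(0, len(teams) - 1, 2):
    final.append(teams[x] + " vs " + teams[x+1])

# The two lists are not linked, so every fixture is printed under every league
for i in league:
    print(i)
    for match in final:
        print(match)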
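The loop above prints every fixture under every league, because the league list and the fixture list are built independently. One possible way to keep each fixture with its own league is to walk the tags in document order and remember the most recent league header. The sketch below is untested against the live site: the sample markup is made up, the helper name group_by_league is hypothetical, the class names are taken from the question, and it assumes each league header appears before its fixtures in the page source.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Made-up markup; assumes each league header precedes its own fixtures.
SAMPLE_HTML = """
<h5 class="fixres__header3">Premier League</h5>
<span class="matches__participant matches__participant--side1">Arsenal</span>
<span class="matches__participant matches__participant--side2">Chelsea</span>
<h5 class="fixres__header3">Championship</h5>
<span class="matches__participant matches__participant--side1">Leeds United</span>
<span class="matches__participant matches__participant--side2">Norwich City</span>
"""


def group_by_league(html):
    """Return {league name: ["home vs away", ...]} in page order."""
    soup = BeautifulSoup(html, "html.parser")
    fixtures = defaultdict(list)
    league, home = None, None
    for tag in soup.find_all(True):  # every tag, in document order
        classes = tag.get("class", [])
        text = tag.get_text(strip=True)
        if "fixres__header3" in classes:
            league = text          # remember the current league
        elif "matches__participant--side1" in classes:
            home = text            # remember the home team
        elif "matches__participant--side2" in classes and league and home:
            fixtures[league].append(f"{home} vs {text}")
            home = None            # wait for the next home team
    return dict(fixtures)


grouped = group_by_league(SAMPLE_HTML)
for name, matches in grouped.items():
    print(name)
    for match in matches:
        print("  " + match)
```

To try it on the real page, pass a.text from the request above instead of SAMPLE_HTML, and check the class names against the current page source first.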

