Beautiful Soup: variable span class

Question

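I was wondering if you could help me with some web scraping.

Below is the span class I want to get data from. The problem is that the class name contains a different random number for each data point.

I know the "price-val" part is the same for every iteration, but I can't figure out how to search on just that part when fetching the data.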

    <span class="price-val_196775436 odd-val ib right">
    2.47
    </span>

My code so far:

    url = "http://www.sportsbet.com.au/betting/american-football"
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    g_data = soup.find_all("div", {"class": "accordion-body"})

    for item in g_data:
        A = item.find('span', {'class': 'team-name ib'}).text
        B = item.find('span', {'class': 'price-val_196775436 odd-val ib right'}).text

The error I get:

Traceback (most recent call last):
  File "C:\Users\James\Desktop\NFLsportsbet.py", line 23, in <module>
    B = item.find('span', {'class': 'price-val'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Answer 1

Use a parser library such as lxml; you will probably also need a regular expression or a lambda to match the class:

import re

import requests
from bs4 import BeautifulSoup

url = "http://www.sportsbet.com.au/betting/american-football"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')  # name the parser explicitly
g_data = soup.find_all("div", {"class": "accordion-body"})

for item in g_data:
    print(item.find('span', {'class': 'team-name ib'}).text)
    # The callable is tested against each class token; L is None for a tag with no class
    print(item.find('span', {'class': lambda L: L and L.startswith('price-val_')}).text)
    # Or with a regex anchored at the start of the class token:
    # print(item.find('span', {'class': re.compile(r'^price-val_')}).text)

It prints:

Detroit Lions

2.47

Tampa Bay Buccaneers

3.85

Arizona Cardinals

1.39

San Diego Chargers

2.65

San Francisco 49ers

3.95

New York Giants

2.40

Cincinnati Bengals

1.97

Tennessee Titans

2.61

Minnesota Vikings

1.90

New York Jets

1.66

Seattle Seahawks

1.46

Green Bay Packers

1.68

Indianapolis Colts

3.22
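
To try the matching logic without hitting the live site, here is a minimal offline sketch. The markup in SAMPLE_HTML is hypothetical: it is modelled on the span from the question, and the second numeric suffix and the row without a price are invented for illustration. The helper names (has_price_class, extract_odds, extract_prices_regex) are my own, and I use the built-in html.parser here instead of lxml only so that nothing extra needs installing. It also skips rows where a span is missing, since find() returns None in that case, which is what produced the AttributeError in the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's snippet; the second
# numeric suffix and the row without a price are invented for illustration.
SAMPLE_HTML = """
<div class="accordion-body">
  <span class="team-name ib">Detroit Lions</span>
  <span class="price-val_196775436 odd-val ib right">
    2.47
  </span>
</div>
<div class="accordion-body">
  <span class="team-name ib">Tampa Bay Buccaneers</span>
  <span class="price-val_196775437 odd-val ib right">
    3.85
  </span>
</div>
<div class="accordion-body">
  <span class="team-name ib">No Price Listed</span>
</div>
"""


def has_price_class(css_class):
    # Beautiful Soup tests the callable against each class token of a tag;
    # the None check covers tags that have no class attribute.
    return css_class is not None and css_class.startswith("price-val_")


def extract_odds(html):
    """Return (team, price) pairs, skipping rows where either span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="accordion-body"):
        team = item.find("span", class_="team-name")
        price = item.find("span", class_=has_price_class)
        if team is None or price is None:
            # find() returns None when nothing matches; calling .text on it
            # would raise AttributeError.
            continue
        rows.append((team.get_text(strip=True), price.get_text(strip=True)))
    return rows


def extract_prices_regex(html):
    """Same match expressed as a regex anchored at the start of a class token."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"^price-val_")
    return [s.get_text(strip=True) for s in soup.find_all("span", class_=pattern)]


print(extract_odds(SAMPLE_HTML))
print(extract_prices_regex(SAMPLE_HTML))
```

Searching for the class 'price-val' on its own finds nothing in this markup, because Beautiful Soup compares the string against whole class tokens (or the full attribute value), not against prefixes. That is why a callable or a regex is needed for the variable part.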

Solution 1
2, accepted, 2015-11-01 10:54:29
