Python Beautiful Soup: targeting a specific element

Question

I'm trying to scrape a specific part of a website (https://flightmath.com/from-CDG-to-BLR), but I can't target the element I need.

Here is part of the HTML:

<h2 style="background-color:#7DC2F8;padding:10px"><i class="fa fa-plane"></i>  flight distance = <strong>4,866</strong> miles</h2>

This is my code:

dist = soup.find('h2', attrs={'class': 'fa fa-plane'})

I only want to target the "4,866" part.

I'd be very grateful if someone could point me in the right direction. Thanks in advance.

Answer 1

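attrs={'class': '...'} needs the exact value of the class attribute (not a combination of classes), and in this HTML the fa fa-plane classes sit on the <i> element, not on the <h2>.
Instead, use soup.select_one, which selects elements with extended CSS selector rules: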

from bs4 import BeautifulSoup
import requests

url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).content
soup = BeautifulSoup(html_data, 'html.parser')

# the <strong> that is the next element sibling of <i class="fa fa-plane"> inside <h2>
dist = soup.select_one('h2 i.fa-plane + strong')
print(dist.text)   # 4,866
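To try the selector without a network request, the sketch below parses only the HTML fragment quoted in the question (not the full page). It also shows why the question's find() call returns None: the <h2> has only a style attribute, while the fa fa-plane classes belong to the <i> element.

```python
from bs4 import BeautifulSoup

# Only the fragment quoted in the question, not the full page.
html = ('<h2 style="background-color:#7DC2F8;padding:10px">'
        '<i class="fa fa-plane"></i>  flight distance = '
        '<strong>4,866</strong> miles</h2>')
soup = BeautifulSoup(html, 'html.parser')

# The <h2> has no class attribute, so the original lookup finds nothing.
h2_match = soup.find('h2', attrs={'class': 'fa fa-plane'})
print(h2_match)  # None

# The classes are on <i>; the exact class string matches there.
i_match = soup.find('i', attrs={'class': 'fa fa-plane'})
print(i_match is not None)  # True

# CSS '+' picks the <strong> that is the next element sibling of the <i>,
# skipping the text node "  flight distance = " in between.
dist = soup.select_one('h2 i.fa-plane + strong')
print(dist.text)  # 4,866
```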

Answer 2

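If you're interested: the value is hard-coded into the HTML (where it is used to compute the flight speed), so you can also extract a more precise value with the regular expression below. Use round() to get the value shown on the page.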

import requests, re

urls = ['https://flightmath.com/from-CDG-to-BOM', 'https://flightmath.com/from-CDG-to-BLR', 'https://flightmath.com/from-CDG-to-IXC']
# capture the number that follows 'flightspeed.min.value/60 + ' in the page source
p = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')
with requests.Session() as s:
    for url in urls:
        print(p.findall(s.get(url).text)[0])
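A minimal offline sketch of the same regex idea, assuming the page source contains text of the form `flightspeed.min.value/60 + <number>`. The sample string and its value 4866.42 are made up for illustration; the real page's script may look different. Using search() and checking for None avoids the IndexError that findall(...)[0] raises when nothing matches, and the `:,` format spec adds the thousands separator the page displays. `extract_distance` is a hypothetical helper name.

```python
import re

# Same pattern as above; group 1 captures the unrounded number.
p = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')

def extract_distance(page_text):
    """Hypothetical helper: return the captured number as a float, or None."""
    m = p.search(page_text)
    return float(m.group(1)) if m else None

# Made-up script fragment for illustration only.
sample = 'var t = flightspeed.min.value/60 + 4866.42;'
d = extract_distance(sample)
print(d)                # 4866.42
print(f'{round(d):,}')  # 4,866 (rounded and comma-separated, as on the page)
print(extract_distance('no match here'))  # None
```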

Answer 3

Find the tag by its class name, then use find_next() to get the <strong> tag.

from bs4 import BeautifulSoup
import requests

url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
dist = soup.find('i',class_='fa-plane').find_next('strong')
print(dist.text)
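The same find_next() approach can be checked offline against the question's fragment. class_='fa-plane' matches because Beautiful Soup treats class as a multi-valued attribute, so a single class name matches any tag whose class list contains it. The sketch also strips the thousands separator to get an integer; `to_int` is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Only the fragment quoted in the question, not the full page.
html = ('<h2 style="background-color:#7DC2F8;padding:10px">'
        '<i class="fa fa-plane"></i>  flight distance = '
        '<strong>4,866</strong> miles</h2>')
soup = BeautifulSoup(html, 'html.parser')

def to_int(text):
    """Hypothetical helper: '4,866' -> 4866 by removing thousands separators."""
    return int(text.replace(',', ''))

icon = soup.find('i', class_='fa-plane')  # matches class="fa fa-plane"
# find_next() returns the first <strong> after the <i> in document order.
strong = icon.find_next('strong') if icon else None
miles = to_int(strong.get_text(strip=True)) if strong else None
print(miles)  # 4866
```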

3 answers

Answer 1: score 1, accepted, 2019-08-09 14:37:37

Answer 2: score 0, 2019-08-09 15:19:15

Answer 3: score 0, 2019-08-09 15:27:07
