Python Beautiful Soup: targeting a specific element

Question

I am trying to scrape a particular part of a website (https://flightmath.com/from-CDG-to-BLR), but I am unable to target the element I need.

Below is the relevant part of the HTML:

<h2 style="background-color:#7DC2F8;padding:10px"><i class="fa fa-plane"></i>  flight distance = <strong>4,866</strong> miles</h2>

This is my code:

dist = soup.find('h2', attrs={'class': 'fa fa-plane'})

I just want to target the "4,866" part.

I would be really grateful if someone could guide me on this. Thanks in advance.

Answer 1

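In the snippet, the classes fa and fa-plane belong to the <i> element, not the <h2>, so soup.find('h2', attrs={'class': 'fa fa-plane'}) returns None. Note also that a multi-class string passed through attrs={'class': '...'} is compared against the exact class attribute value, not against any combination of those classes.
Instead, use the soup.select_one method to select with a CSS selector: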

from bs4 import BeautifulSoup
import requests

url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).content
soup = BeautifulSoup(html_data, 'html.parser')

dist = soup.select_one('h2 i.fa-plane + strong')
print(dist.text)   # 4,866
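
To see the difference without hitting the network, the two lookups can be compared on the snippet quoted in the question. This is only a sketch: it assumes the live page still matches that snippet, and the variable names missed and hit are illustrative.

```python
from bs4 import BeautifulSoup

# The snippet quoted in the question, used here instead of a live request.
html = (
    '<h2 style="background-color:#7DC2F8;padding:10px">'
    '<i class="fa fa-plane"></i>  flight distance = '
    '<strong>4,866</strong> miles</h2>'
)
soup = BeautifulSoup(html, 'html.parser')

# The original attempt: the <h2> has no class attribute, so nothing matches.
missed = soup.find('h2', attrs={'class': 'fa fa-plane'})

# The selector from the answer: the <strong> that is the next element
# sibling of the plane icon inside an <h2>.
hit = soup.select_one('h2 i.fa-plane + strong')

print(missed)     # None
print(hit.text)   # 4,866
```

The + combinator only considers element siblings, so the text "flight distance =" between the icon and the <strong> tag does not get in the way.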

Answer 2

In case it is of interest: the value is hard-coded into the HTML (for a flight speed calculation), so you could also use a regex to pull out a more precise value, as shown below. You can use round() to get the value shown on the page.

import requests, re

urls = ['https://flightmath.com/from-CDG-to-BOM', 'https://flightmath.com/from-CDG-to-BLR', 'https://flightmath.com/from-CDG-to-IXC']
p = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')
with requests.Session() as s:
    for url in urls:
        print(p.findall(s.get(url).text)[0])
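
The loop above raises IndexError if the pattern is missing from a page. A slightly more defensive variant might look like the sketch below. The helper name extract_distance and the sample string are hypothetical, and the number in the sample is made up for illustration; the real script text and value may differ.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')

def extract_distance(page_text):
    """Return the embedded distance as a float, or None if it is absent."""
    match = PATTERN.search(page_text)
    return float(match.group(1)) if match else None

# Hypothetical page fragment with an invented value, not taken from the site.
sample = 'var t = flightspeed.min.value/60 + 4865.72;'

precise = extract_distance(sample)
print(precise)          # 4865.72
print(round(precise))   # 4866
print(extract_distance('<html></html>'))  # None
```

Returning None instead of indexing into findall() lets the caller decide how to handle a page whose markup has changed.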

Answer 3

Find the <i> tag by its class name, then use find_next() to locate the <strong> tag that follows it.

from bs4 import BeautifulSoup
import requests

url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
dist = soup.find('i', class_='fa-plane').find_next('strong')
print(dist.text)
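
If the number is needed for arithmetic rather than display, the same lookup can be wrapped in a small helper that guards against missing tags and strips the thousands separator. This is a sketch under the assumption that the markup matches the snippet in the question; distance_miles is an illustrative name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

def distance_miles(html):
    """Return the distance as an int, or None if the markup is not found."""
    soup = BeautifulSoup(html, 'html.parser')
    icon = soup.find('i', class_='fa-plane')
    if icon is None:
        return None
    strong = icon.find_next('strong')
    if strong is None:
        return None
    # "4,866" -> 4866
    return int(strong.get_text(strip=True).replace(',', ''))

snippet = ('<h2><i class="fa fa-plane"></i>  flight distance = '
           '<strong>4,866</strong> miles</h2>')
print(distance_miles(snippet))        # 4866
print(distance_miles('<p>none</p>'))  # None
```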

3 answers

Answer 1
1, accepted, 2019-08-09 14:37:37

Answer 2
0, 2019-08-09 15:19:15

Answer 3
0, 2019-08-09 15:27:07
