I am trying to scrape a particular part of a website (https://flightmath.com/from-CDG-to-BLR), but I am unable to target the element that I need.
<h2 style="background-color:#7DC2F8;padding:10px"><i class="fa fa-plane"></i> flight distance = <strong>4,866</strong> miles</h2>
dist = soup.find('h2', attrs={'class': 'fa fa-plane'})
I just want to target the "4,866" part.
I would be really grateful if someone can guide me on this. Thanks in advance.
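attrs={'class': '...'} is matched against the class attribute of the tag being searched, and a multi-class string such as 'fa fa-plane' is compared as the exact attribute value (not as a combination of classes). In the markup above the classes sit on the nested i tag, not on the h2, so the find call returns None. Instead, use the soup.select_one method to select by a CSS selector: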
from bs4 import BeautifulSoup
import requests
url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).content
soup = BeautifulSoup(html_data, 'html.parser')
dist = soup.select_one('h2 i.fa-plane + strong')
print(dist.text) # 4,866
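As a minimal offline sketch of the same selector, the snippet below parses the h2 markup quoted in the question instead of fetching the page, then converts the text to an integer. The variable names html, node and miles are only illustrative, and the guard against None is an addition on the assumption that the page layout could change.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, so the selector can be tried without a request.
html = (
    '<h2 style="background-color:#7DC2F8;padding:10px">'
    '<i class="fa fa-plane"></i> flight distance = '
    '<strong>4,866</strong> miles</h2>'
)

soup = BeautifulSoup(html, 'html.parser')

# "+" is the adjacent-sibling combinator: the strong element that directly
# follows the i.fa-plane element (text nodes in between are ignored).
node = soup.select_one('h2 i.fa-plane + strong')

# select_one returns None when nothing matches, so guard before using .text.
miles = int(node.text.replace(',', '')) if node is not None else None
print(miles)
```

Stripping the thousands separator before calling int() is needed because int('4,866') raises ValueError.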
In case of interest: the value is hard-coded into the HTML (for a flight speed calculation), so you could also use a regex to pull out a more precise value, as follows. You can use round() to get the value shown on the page.
import requests, re
urls = ['https://flightmath.com/from-CDG-to-BOM', 'https://flightmath.com/from-CDG-to-BLR', 'https://flightmath.com/from-CDG-to-IXC']
p = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')
with requests.Session() as s:
    for url in urls:
        print(p.findall(s.get(url).text)[0])
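The round() step mentioned above could look roughly like this. The sample string is a made-up stand-in for the page's inline script (the real source and the real number may differ); only the pattern is taken from the answer.

```python
import re

# Hypothetical fragment standing in for the page's inline script; the actual
# page source may look different. Only the regex below comes from the answer.
sample = 'var t = flightspeed.min.value/60 + 4865.71;'

p = re.compile(r'flightspeed\.min\.value\/60 \+ ([0-9.]+)')
match = p.search(sample)

# search() returns None if the script changes, so avoid indexing blindly.
precise = float(match.group(1)) if match else None
shown = round(precise) if precise is not None else None
print(precise, shown)
```

Using search() with a None check instead of findall(...)[0] avoids an IndexError when the pattern is not present on a page.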
Find the i tag by its class name, then use find_next() to reach the strong tag that follows it.
from bs4 import BeautifulSoup
import requests
url = 'https://flightmath.com/from-CDG-to-BLR'
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
dist = soup.find('i',class_='fa-plane').find_next('strong')
print(dist.text)
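A slightly more defensive sketch of the find_next() approach, run against the markup from the question rather than the live page. The names icon, strong and distance_text are illustrative, and the None checks are an assumption about how one might want to handle a missing element.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, used here in place of a live request.
html = (
    '<h2 style="background-color:#7DC2F8;padding:10px">'
    '<i class="fa fa-plane"></i> flight distance = '
    '<strong>4,866</strong> miles</h2>'
)

soup = BeautifulSoup(html, 'html.parser')

# class_ with a single name matches any tag that has that class among its classes.
icon = soup.find('i', class_='fa-plane')

# find() returns None when there is no match; chaining .find_next() on None
# would raise AttributeError, so check each step.
strong = icon.find_next('strong') if icon is not None else None
distance_text = strong.text if strong is not None else None
print(distance_text)
```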