HTML:
<div class="col-7">
<dl class="row box">
<h2>GENERAL</h2>
<dt class="col-6">transmission:</dt>
<dd class="col-6">sequential automatic</dd>
<dt class="col-6 grey">number of seats:</dt>
<dd class="col-6">5</dd>
<dt class="col-6">first year of production:</dt>
<dd class="col-6">2017</dd>
<dt class="col-6 grey">last year of production:</dt>
<dd class="col-6">available</dd>
</dl>
<dl class="row box">
<h2>DRIVE</h2>
<dt class="col-6">fuel:</dt>
<dd class="col-6">petrol</dd>
<dt class="col-6 grey">total maximum power:</dt>
<dd class="col-6">147 kW (200 hp)</dd>
<dt class="col-6">total maximum torque:</dt>
<dd class="col-6">330 Nm</dd>
</dl>
<dl class="row box">
<h2>TRANSMISSION</h2>
<dt class="col-6">1st gear:</dt>
<dd class="col-6">5,00:1</dd>
<dt class="col-6 grey">2nd gear:</dt>
<dd class="col-6">3,20:1</dd>
</dl>
</div>
My code:
for item2 in soup2.find_all(attrs={'class': 'col-7'}):
    jj = item2.text
jj holds all of the text from the page I scraped, but I only need a few values from it. For example, I only need the values of "number of seats" and "last year of production" from GENERAL, and the value of "1st gear" from TRANSMISSION.
The result should be:
5, available, 5,00:1
The values you need are simply the elements that immediately follow the titles "number of seats", "last year of production", and "1st gear", so you can walk through each item together with its successor by using zip:
# Every <dt> (title) and <dd> (value) carries the class "col-6"
all_items = soup.find_all(attrs={'class': 'col-6'})
titles = [
    "number of seats",
    "last year of production",
    "1st gear"
]
d = {title: [] for title in titles}
# Pair each element with the one that follows it
for item, next_item in zip(all_items, all_items[1:]):
    for title in titles:
        if title in item.text:
            d[title].append(next_item.text)
            break
After the loop, d maps each title to a list of the values found for it.
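To check the zip approach end to end, here is a minimal self-contained sketch. It assumes an abbreviated copy of the markup from the question; the names SAMPLE_HTML and extract_by_zip are made up for this illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (assumption: the real page
# has the same dt/dd layout with class "col-6" on every cell).
SAMPLE_HTML = """
<div class="col-7">
<dl class="row box">
<h2>GENERAL</h2>
<dt class="col-6">transmission:</dt>
<dd class="col-6">sequential automatic</dd>
<dt class="col-6 grey">number of seats:</dt>
<dd class="col-6">5</dd>
<dt class="col-6 grey">last year of production:</dt>
<dd class="col-6">available</dd>
</dl>
<dl class="row box">
<h2>TRANSMISSION</h2>
<dt class="col-6">1st gear:</dt>
<dd class="col-6">5,00:1</dd>
<dt class="col-6 grey">2nd gear:</dt>
<dd class="col-6">3,20:1</dd>
</dl>
</div>
"""


def extract_by_zip(html, titles):
    """Hypothetical helper: map each title to the texts that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    all_items = soup.find_all(attrs={"class": "col-6"})
    found = {title: [] for title in titles}
    # Pair every element with its successor; a matching <dt> is
    # followed directly by the <dd> that holds its value.
    for item, next_item in zip(all_items, all_items[1:]):
        for title in titles:
            if title in item.text:
                found[title].append(next_item.text)
                break
    return found


titles = ["number of seats", "last year of production", "1st gear"]
d = extract_by_zip(SAMPLE_HTML, titles)
# Take the first match per title, in the order the titles were given
result = ", ".join(d[title][0] for title in titles)
print(result)
```

Each value is a list because the same title could appear more than once on a page; taking the first element of each list gives the single line the question asks for.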
Change the find_values tuple to choose which values to extract from the HTML text:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
find_values = ('number of seats', 'last year of production', '1st gear')
for i in soup.find_all(attrs={'class': 'row box'}):
    for j in i.find_all('dt'):
        text = j.get_text().lower().strip()
        if text.startswith(find_values):
            print(text, j.find_next_sibling('dd').get_text())
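Since the question asks for values from specific sections (GENERAL and TRANSMISSION), the sibling lookup can also be restricted by the h2 heading of each box. The following is only a sketch under that assumption; SAMPLE_HTML, WANTED, and extract_by_section are hypothetical names introduced here, and the markup is an abbreviated copy of the question's.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (assumption: each
# <dl class="row box"> starts with an <h2> naming its section).
SAMPLE_HTML = """
<div class="col-7">
<dl class="row box">
<h2>GENERAL</h2>
<dt class="col-6 grey">number of seats:</dt>
<dd class="col-6">5</dd>
<dt class="col-6">first year of production:</dt>
<dd class="col-6">2017</dd>
<dt class="col-6 grey">last year of production:</dt>
<dd class="col-6">available</dd>
</dl>
<dl class="row box">
<h2>TRANSMISSION</h2>
<dt class="col-6">1st gear:</dt>
<dd class="col-6">5,00:1</dd>
<dt class="col-6 grey">2nd gear:</dt>
<dd class="col-6">3,20:1</dd>
</dl>
</div>
"""

# Section heading -> title prefixes wanted from that section
WANTED = {
    "GENERAL": ("number of seats", "last year of production"),
    "TRANSMISSION": ("1st gear",),
}


def extract_by_section(html, wanted):
    """Hypothetical helper: collect <dd> texts for wanted titles per section."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for box in soup.find_all(attrs={"class": "row box"}):
        heading = box.find("h2")
        section = heading.get_text(strip=True) if heading else ""
        prefixes = wanted.get(section)
        if not prefixes:
            continue  # skip sections nobody asked for
        for dt in box.find_all("dt"):
            label = dt.get_text(strip=True).lower()
            if label.startswith(tuple(prefixes)):
                dd = dt.find_next_sibling("dd")
                if dd is not None:  # guard against a title without a value
                    values.append(dd.get_text(strip=True))
    return values


values = extract_by_section(SAMPLE_HTML, WANTED)
print(", ".join(values))
```

Keying on the heading avoids picking up a title that happens to repeat in another section, and the None check keeps the loop from raising if a dt has no following dd.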