Python: How can I extract data with the same class name in BeautifulSoup?
I'm trying to extract data using the BeautifulSoup library in Python. I'm using zip and soup to do the extraction.
My HTML data looks like this:
<li>
<ul class="features">
<li>Year: <strong>2016</strong></li>
<li>Kilometers: <strong>81,000</strong></li>
</ul>
<ul class="features">
<li>Doors: <strong>2 door</strong></li>
<li>Color: <strong>White</strong></li>
</ul>
<ul class="features">
</ul>
</li>
Here I want to get Year, Kilometers, Doors and Color in separate variables, but when I run my code they all come out together.
My code:
for title, price, date, features in zip(soup.select('.listing-item .title'),
                                        soup.select('.listing-item .price'),
                                        soup.select('.listing-item .date'),
                                        soup.select('.listing-item .features')):
    title = title.get_text().strip()
    price = price.get_text().strip()
    date = date.get_text().strip()
    features = features.get_text().strip()
    print(features)
Output:
Year: 2016
Kilometers: 81,000
Doors: 2 door
Color: White
How can I store Year, Kilometers, Doors and Color in separate variables?
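The values come out together because `get_text()` called on a whole `<ul class="features">` returns the text of every `<li>` inside it. A minimal sketch, reusing the fragment from the question, iterates over the `<li>` elements of each `<ul>` instead, so every feature stays a separate string (the name `all_items` is illustrative):

```python
from bs4 import BeautifulSoup

html = """<li>
<ul class="features">
<li>Year: <strong>2016</strong></li>
<li>Kilometers: <strong>81,000</strong></li>
</ul>
<ul class="features">
<li>Doors: <strong>2 door</strong></li>
<li>Color: <strong>White</strong></li>
</ul>
<ul class="features">
</ul>
</li>"""

soup = BeautifulSoup(html, "html.parser")

all_items = []
for ul in soup.select("ul.features"):
    # ul.get_text() would merge every <li> of this <ul> into one string,
    # which is what the loop in the question prints; collecting each <li>
    # on its own keeps the features apart.
    all_items.extend(li.get_text(" ", strip=True) for li in ul.find_all("li"))

print(all_items)
```

Each string can then be split at its first colon to separate the label from the value.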
You can try:
from bs4 import BeautifulSoup as bs
from io import StringIO
data = """<li>
<ul class="features">
<li>Year: <strong>2016</strong></li>
<li>Kilometers: <strong>81,000</strong></li>
</ul>
<ul class="features">
<li>Doors: <strong>2 door</strong></li>
<li>Color: <strong>White</strong></li>
</ul>
<ul class="features">
</ul>
</li>"""
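soup = bs(StringIO(data), 'html.parser')  # name the parser explicitly to avoid a parser warning
# each <li> reads "Label: value"; keep the text after the first colon
Year, Km, Doors, Color = [li.text.split(':', 1)[1].strip() for li in soup.select('.features > li')]
print(Year, Km, Doors, Color)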
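The four-name unpacking works only when exactly four `<li>` elements match; a listing with a missing or extra feature raises `ValueError`. A sketch of a more tolerant variant, assuming each `<li>` holds a label followed by a `<strong>` value as in the question, reads both parts with BeautifulSoup's `stripped_strings` and stores them in a dict (the dict name `features` is illustrative):

```python
from bs4 import BeautifulSoup

html = """<ul class="features">
<li>Year: <strong>2016</strong></li>
<li>Kilometers: <strong>81,000</strong></li>
</ul>
<ul class="features">
<li>Doors: <strong>2 door</strong></li>
<li>Color: <strong>White</strong></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

features = {}
for li in soup.select(".features > li"):
    # stripped_strings yields each text node trimmed of whitespace,
    # e.g. "Year:" then "2016", so no manual splitting is needed.
    parts = list(li.stripped_strings)
    if len(parts) >= 2:
        features[parts[0].rstrip(":")] = parts[1]

# .get() returns None for a label missing from the markup instead of raising.
year = features.get("Year")
km = features.get("Kilometers")
doors = features.get("Doors")
color = features.get("Color")
print(year, km, doors, color)
```

Looking values up by label also means the order of the `<li>` elements no longer matters.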
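Alternatively, find each li element that contains the label text, then take the next strong tag. Declare empty lists and append to them.
Code: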
from bs4 import BeautifulSoup
html='''<li>
<ul class="features">
<li>Year: <strong>2016</strong></li>
<li>Kilometers: <strong>81,000</strong></li>
</ul>
<ul class="features">
<li>Doors: <strong>2 door</strong></li>
<li>Color: <strong>White</strong></li>
</ul>
<ul class="features">
</ul>
</li>
'''
soup=BeautifulSoup(html,'html.parser')
Year=[]
KiloMeter=[]
Doors=[]
Color=[]
# :-soup-contains() replaces soupsieve's deprecated :contains() pseudo-class
for year, km, dor, colr in zip(soup.select('ul.features li:-soup-contains("Year:")'),
                               soup.select('ul.features li:-soup-contains("Kilometers:")'),
                               soup.select('ul.features li:-soup-contains("Doors:")'),
                               soup.select('ul.features li:-soup-contains("Color:")')):
    Year.append(year.find_next('strong').text)
    KiloMeter.append(km.find_next('strong').text)
    Doors.append(dor.find_next('strong').text)
    Color.append(colr.find_next('strong').text)
print(Year,KiloMeter,Doors,Color)
Output (lists):
['2016'] ['81,000'] ['2 door'] ['White']
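The lists approach collects matches from the whole document and pairs them with `zip()`, which stops at its shortest input. With several listings, one listing missing a field can drop rows or shift values onto the wrong listing. A sketch that scopes each lookup to a single listing, assuming (from the selectors in the question's loop) that every listing is wrapped in an element with class `listing-item` containing `.title` and `.features`; the markup below and the helper `feature_value` are hypothetical, and `:-soup-contains()` needs soupsieve 2.1 or newer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two listings, the second without a "Color" entry.
html = """
<div class="listing-item">
  <h2 class="title">Car A</h2>
  <ul class="features">
    <li>Year: <strong>2016</strong></li>
    <li>Color: <strong>White</strong></li>
  </ul>
</div>
<div class="listing-item">
  <h2 class="title">Car B</h2>
  <ul class="features">
    <li>Year: <strong>2019</strong></li>
  </ul>
</div>
"""


def feature_value(listing, label):
    """Return the <strong> text after `label:` inside one listing, or None."""
    li = listing.select_one(f'.features li:-soup-contains("{label}:")')
    strong = li.find("strong") if li else None
    return strong.get_text(strip=True) if strong else None


soup = BeautifulSoup(html, "html.parser")

rows = []
for listing in soup.select(".listing-item"):
    # Searching inside one listing at a time keeps each value attached to
    # the right car even when some listings lack a feature.
    rows.append({
        "title": listing.select_one(".title").get_text(strip=True),
        "year": feature_value(listing, "Year"),
        "color": feature_value(listing, "Color"),
    })

print(rows)
```

A missing feature shows up as `None` for that listing only, rather than shortening every list.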