
BeautifulSoup: get some tag from the page

I have this HTML:

<div class="b-media-cont b-media-cont_relative" data-triggers-container="true"><span class="label">Двигатель:</span> бензин, 1.6 л<br/>
<div class="b-triggers b-triggers_theme_dashed-buttons b-triggers_size_s b-triggers_text-notif"><div class="b-triggers__text">110 л.с.</div><div class="b-triggers__item b-triggers__item_notif" data-target="cost" data-target-container="[data-triggers-container]" data-toggle="tax_dropdown"><div class="b-link b-link_dashed">110 л.с.</div></div><div class="b-triggers-hidden-area b-triggers-hidden-area_width_240 b-triggers-hidden-area_close" data-target-bind="cost" style="left: 0px; top: 39px; width: 241px;">Налог на&nbsp;2016&nbsp;год <b>2&nbsp;750&nbsp;руб.</b><br/><br/><span class="gray">Расчет произведен на легковой автомобиль по <a href="http://law.drom.ru/calc/region77/skoda/rapid/2016/110/">калькулятору транспортного налога</a> для Москвы (<a href="http://www.drom.ru/my_region/">изменить регион</a>).</span></div></div><br/>
<span class="label">Тип кузова:</span> хэтчбек<br/>
<span class="label">Цвет:</span> золотистый<br/>
<span class="label">Пробег:</span> <b>Новый автомобиль от официального дилера</b><br/>
<span class="label">Руль:</span> левый<br/>
<span class="label">VIN:</span> XW8AC1NH7HK****32<br/>
</div><p><span class="label">Данные по модели из каталога:</span> 
<b>толян</b>
<b>4 515 руб.</b>
<b>Продажа Тойота Авенсис.</b>

And I need to get:

<b>Новый автомобиль от официального дилера</b>

I tried:

mileages = soup.find_all('span', class_='label').next_subling

But it raises AttributeError: 'ResultSet' object has no attribute 'next_subling'. How can I fix that?

AttributeError: 'ResultSet' object has no attribute 'next_subling'

This is because find_all() returns multiple results: a list of matching tags, not a single tag. (Note also that the attribute is spelled next_sibling, not next_subling.) This problem is actually covered by the BeautifulSoup documentation:

AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings–a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all().

Instead, you should use find() to locate the specific label by its text and then get the next sibling element:

mileages = soup.find('span', text=u'Пробег:').find_next_sibling("b").get_text(strip=True)

This code works for me as is:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

data = u"""
<div class="b-media-cont b-media-cont_relative" data-triggers-container="true"><span class="label">Двигатель:</span> бензин, 1.6 л<br/>
<div class="b-triggers b-triggers_theme_dashed-buttons b-triggers_size_s b-triggers_text-notif"><div class="b-triggers__text">110 л.с.</div><div class="b-triggers__item b-triggers__item_notif" data-target="cost" data-target-container="[data-triggers-container]" data-toggle="tax_dropdown"><div class="b-link b-link_dashed">110 л.с.</div></div><div class="b-triggers-hidden-area b-triggers-hidden-area_width_240 b-triggers-hidden-area_close" data-target-bind="cost" style="left: 0px; top: 39px; width: 241px;">Налог на&nbsp;2016&nbsp;год <b>2&nbsp;750&nbsp;руб.</b><br/><br/><span class="gray">Расчет произведен на легковой автомобиль по <a href="http://law.drom.ru/calc/region77/skoda/rapid/2016/110/">калькулятору транспортного налога</a> для Москвы (<a href="http://www.drom.ru/my_region/">изменить регион</a>).</span></div></div><br/>
<span class="label">Тип кузова:</span> хэтчбек<br/>
<span class="label">Цвет:</span> золотистый<br/>
<span class="label">Пробег:</span> <b>Новый автомобиль от официального дилера</b><br/>
<span class="label">Руль:</span> левый<br/>
<span class="label">VIN:</span> XW8AC1NH7HK****32<br/>
</div><p><span class="label">Данные по модели из каталога:</span>
<b>толян</b>
<b>4 515 руб.</b>
<b>Продажа Тойота Авенсис.</b>
</div>
"""
soup = BeautifulSoup(data, "html.parser")

mileages = soup.find('span', text=u'Пробег:').find_next_sibling("b").get_text(strip=True)
print(mileages)

Prints:

Новый автомобиль от официального дилера
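The documentation's other suggestion, iterating over the ResultSet and inspecting each element, might be sketched as follows. This is only an illustration under assumptions: the markup is a trimmed-down version of the HTML from the question, and label_values is a hypothetical helper name, not part of BeautifulSoup. It collects the text that follows each label up to the next <br/>.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed-down markup modeled on the question's HTML (assumed for illustration).
html = u"""
<div>
<span class="label">Цвет:</span> золотистый<br/>
<span class="label">Пробег:</span> <b>Новый автомобиль от официального дилера</b><br/>
<span class="label">Руль:</span> левый<br/>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def label_values(soup):
    """Hypothetical helper: map each label's text to the text that follows it."""
    values = {}
    # find_all() returns a ResultSet, so handle each matching tag in turn.
    for span in soup.find_all("span", class_="label"):
        parts = []
        for node in span.next_siblings:
            if isinstance(node, Tag):
                if node.name == "br":
                    break  # the value ends at the line break
                text = node.get_text(strip=True)
            else:
                text = str(node).strip()  # plain text between tags
            if text:
                parts.append(text)
        values[span.get_text(strip=True)] = " ".join(parts)
    return values


values = label_values(soup)
print(values[u"Пробег:"])
```

This handles both bold values (as for Пробег:) and plain-text values (as for Цвет:), which find_next_sibling("b") alone would not distinguish.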

Try this code:

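b = None
spans = soup.find_all("span", {"class": "label"})
for span in spans:
    # The <b> is a sibling of the label span, not a child of it,
    # so span.find("b") would return None for every label here.
    if span.get_text(strip=True) == u"Пробег:":
        b = span.find_next_sibling("b")
        break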

Then you can access the text of b using:

b.text

The following should work for you:

spanTag = soup.find_all("span", string=u"Пробег:")
# print() as a function works on both Python 2 and Python 3 for a single argument
print(spanTag[0].find_next_sibling("b"))
print(spanTag[0].find_next_sibling("b").string)

Output:

<b>Новый автомобиль от официального дилера</b>
Новый автомобиль от официального дилера

Cheers,

Dhiraj
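One caveat that applies to the find()-based and find_all()[0]-based snippets above: if the label is missing from the page, find() returns None and find_all() returns an empty list, so the chained call fails with an AttributeError or IndexError. A minimal defensive sketch, assuming a hypothetical helper named value_after_label and a hypothetical missing label "Год:" (neither comes from the original answers), and a bs4 version that accepts the string argument:

```python
from bs4 import BeautifulSoup


def value_after_label(soup, label, tag="b"):
    """Hypothetical helper: text of the first `tag` sibling after the label span, or None."""
    span = soup.find("span", string=label)
    if span is None:
        return None  # label not present on the page
    sibling = span.find_next_sibling(tag)
    if sibling is None:
        return None  # label found, but no following sibling of that tag
    return sibling.get_text(strip=True)


# One line of the question's markup, assumed here for illustration.
html = (u'<div><span class="label">Пробег:</span> '
        u'<b>Новый автомобиль от официального дилера</b><br/></div>')
soup = BeautifulSoup(html, "html.parser")

found = value_after_label(soup, u"Пробег:")
missing = value_after_label(soup, u"Год:")
print(found)
print(missing)
```

Keep in mind that find_next_sibling("b") returns the first following <b> sibling, even if it belongs to a later label, so this only suits labels whose own value is wrapped in <b>.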
