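How do I extract a nested span class value with Beautiful Soup?
I'm struggling to work out which element I need to point Beautiful Soup at to grab the value of the "amount" tag, which is "1,56" in this code sample.
Here is an excerpt of the markup of the page I want to scrape: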
<td class="line-content">
<span class="html-tag">
<div
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
the-price
</span>
'
<span class="html-attribute-name">
style
</span>
='
<span class="html-attribute-value">
margin-top:20px;
</span>
'>
</span>
</td>
</tr>
<tr>
<td class="line-number" value="447">
</td>
<td class="line-content">
<span class="html-tag">
<span
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
currency
</span>
'>
</span>
€
<span class="html-tag">
</span>
</span>
<span class="html-tag">
<span
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
amount
</span>
'>
</span>
1,56
<span class="html-tag">
</span>
</span>
</td>
</tr>
Can you enlighten me? I would really appreciate any help.
You can locate the amount like this (data is your HTML string):
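from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')
# Find the <span> whose stripped text is exactly "amount".
span_with_amount = soup.find(lambda tag: tag.name == 'span' and tag.get_text(strip=True) == 'amount')
# Take the first text node after that <span>'s parent (string= is the current name for the older text= argument).
value = span_with_amount.parent.find_next_sibling(string=True)
print(value.strip())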
Prints:
1,56
First we find the <span> whose text is "amount", then we take the text node that follows this <span>'s parent.
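For anyone who wants to try the two steps without the original page, here is a minimal self-contained sketch. The data string is a hypothetical, shortened stand-in for the question's excerpt (view-source markup in which the real tags appear as escaped text), and extract_amount is just an illustrative helper name, not part of Beautiful Soup. It assumes a bs4 version that accepts the string= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the question's excerpt: the page's real tags
# show up as escaped text inside <span class="html-tag"> wrappers.
data = """
<td class="line-content">
<span class="html-tag">&lt;span <span class="html-attribute-name">class</span>='<span class="html-attribute-value">currency</span>'&gt;</span>€<span class="html-tag">&lt;/span&gt;</span>
<span class="html-tag">&lt;span <span class="html-attribute-name">class</span>='<span class="html-attribute-value">amount</span>'&gt;</span>1,56<span class="html-tag">&lt;/span&gt;</span>
</td>
"""


def extract_amount(html):
    """Return the text that follows the 'amount' marker, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: find the <span> whose stripped text is exactly "amount".
    marker = soup.find(
        lambda tag: tag.name == "span" and tag.get_text(strip=True) == "amount"
    )
    if marker is None:
        return None
    # Step 2: take the first text node that follows the marker's parent <span>.
    value = marker.parent.find_next_sibling(string=True)
    return value.strip() if value is not None else None


amount = extract_amount(data)
print(amount)  # 1,56
```

The None checks are an addition of mine: soup.find returns None when nothing matches, so without them a page that lacks the "amount" span would raise an AttributeError instead of returning an empty result.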