Extract 2 pieces of information from HTML in Python
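I need help figuring out how to extract "Grab" and the number after data-b. The full, unmodified web page contains many <tr> rows, and I need to filter them using the "Need" that comes right before </a>. I have been trying to do this with Beautiful Soup, although it looks like lxml might be better. I can get all of the <tr> rows, or only the <a>...</a> line that contains "Need", but not just the <tr> whose <a> line contains "Need".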
<tr >
<td>3</td>
<td><a href="/local/app">Leave</a></td><td><a href="https://www.leave.com/" target="_blank">Useless</a></td>
<td class="text-right"> <span class="float2" data-a="24608000.0" data-b="518" data-n="818">Garbage</span></td>
<td class="text-right"> <span class="Float" data-a="3019" data-b="0.0635664" data-n="283">Garbage2</span></td>
<td class="text-right">7.38%</td>
<td class="text-right " >Recently</td>
</tr>
<tr >
<td>4</td>
<td><a href="/local">Grab</a></td><td><a href="https://grab.com" target="_blank">Need</a></td>
<td class="text-right"> <span class="bloat2" data="22435000.0" data-b="512" data-n="74491.2">More junk</span></td>
<td class="text-right"> <span class="bloat" data-a="301.177" data-b="35.848" data-n="0.5848">More junk2</span></td>
<td class="text-right">Some more</td>
<td class="text-right " >Recently</td>
</tr>
Thanks for your help!
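The question suggests lxml might be a better fit. With lxml, selecting only the <tr> rows whose link reads "Need" can be written as a single XPath predicate. This is a minimal sketch, assuming lxml is installed and that the wanted name is the first link in the matching row; the need_rows function and the trimmed sample markup are hypothetical.

```python
from lxml import html

# Hypothetical sample: the two rows from the question, trimmed to the cells used here.
PAGE = """
<table>
<tr><td>3</td><td><a href="/local/app">Leave</a></td><td><a href="https://www.leave.com/">Useless</a></td>
<td><span class="float2" data-b="518">Garbage</span></td></tr>
<tr><td>4</td><td><a href="/local">Grab</a></td><td><a href="https://grab.com">Need</a></td>
<td><span class="bloat2" data-b="512">More junk</span></td>
<td><span class="bloat" data-b="35.848">More junk2</span></td></tr>
</table>
"""


def need_rows(page):
    """Return (row name, [data-b values]) for each <tr> that has a link reading 'Need'."""
    tree = html.fromstring(page)
    out = []
    # Keep only rows containing a cell whose <a> text is exactly "Need".
    for row in tree.xpath('//tr[td/a[normalize-space(.)="Need"]]'):
        # First link in the row (assumed to hold the name, e.g. "Grab").
        name = row.xpath('string((td/a)[1])').strip()
        # Every data-b attribute on a <span> inside this row.
        values = [str(v) for v in row.xpath('.//span/@data-b')]
        out.append((name, values))
    return out


print(need_rows(PAGE))
```

Because the predicate matches on the link text itself, the selection does not rely on the href or class values of the row.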
from bs4 import BeautifulSoup
data = '''<tr>
<td>3</td>
<td><a href="/local/app">Leave</a></td><td><a href="https://www.leave.com/" target="_blank">Useless</a></td>
<td class="text-right"> <span class="float2" data-a="24608000.0" data-b="518" data-n="818">Garbage</span></td>
<td class="text-right"> <span class="Float" data-a="3019" data-b="0.0635664" data-n="283">Garbage2</span></td>
<td class="text-right">7.38%</td>
<td class="text-right " >Recently</td>
</tr>
<tr>
<td>4</td>
<td><a href="/local">Grab</a></td><td><a href="https://grab.com" target="_blank">Need</a></td>
<td class="text-right"> <span class="bloat2" data="22435000.0" data-b="512" data-n="74491.2">More junk</span></td>
<td class="text-right"> <span class="bloat" data-a="301.177" data-b="35.848" data-n="0.5848">More junk2</span></td>
<td class="text-right">Some more</td>
<td class="text-right " >Recently</td>
</tr>
'''
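soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid a warning
# Text of the link whose href is exactly "/local" (here: Grab)
print(soup.find_all('a', {"href": "/local"})[0].text)
# data-b value of every span whose class is "bloat" or "bloat2"
for span in soup.find_all('span', {"class": ["bloat", "bloat2"]}):
    print(span['data-b'])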
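The snippet above reaches the "Grab" row through href="/local" and the bloat/bloat2 classes, not through the "Need" link the question asks to filter on. A minimal sketch of that filter with Beautiful Soup, assuming every wanted number sits in a data-b attribute inside the same <tr> and that the name is the row's first link; extract_need_rows and the trimmed sample are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the two rows from the question, trimmed to the cells used here.
HTML = """
<table>
<tr><td>3</td><td><a href="/local/app">Leave</a></td><td><a href="https://www.leave.com/">Useless</a></td>
<td><span class="float2" data-b="518">Garbage</span></td></tr>
<tr><td>4</td><td><a href="/local">Grab</a></td><td><a href="https://grab.com">Need</a></td>
<td><span class="bloat2" data-b="512">More junk</span></td>
<td><span class="bloat" data-b="35.848">More junk2</span></td></tr>
</table>
"""


def extract_need_rows(markup):
    """Return (first link text, [data-b values]) for every <tr> with a link reading 'Need'."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for link in soup.find_all("a", string="Need"):
        row = link.find_parent("tr")  # climb from the matching <a> to its row
        if row is None:
            continue
        name = row.find("a").get_text(strip=True)  # first link in the row, e.g. "Grab"
        # attrs={"data-b": True} matches any span that has a data-b attribute
        values = [span["data-b"] for span in row.find_all("span", attrs={"data-b": True})]
        results.append((name, values))
    return results


print(extract_need_rows(HTML))
```

Starting from the "Need" link and climbing with find_parent("tr") confines the search to that single row, which is the step the question reports being stuck on.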