
How to extract text from between the <br> tags in BeautifulSoup

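I want to scrape only the company names from a <td> element that contains multiple <br> tags. FYI, some <td> elements hold one company name, while others hold two. See the <td> element below: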

<td id="MainContent_DisassociatedRegistrationsCell" colspan="2">
<p style="background-color:#CCCCCC;width:100%;text-align:center">
<strong>License #: 
<a href="LicenseDetail.aspx?LicNum=332673">332673</a>
</strong>
</p>
BAY AREA REMODELING CO
<br>
5230 EAST 12TH
<br>
OAKLAND, CA 94601
<br>
<strong>Effective Dates:</strong>
09/16/1982 - 06/30/1984
<p style="background-color:#CCCCCC;width:100%;text-align:center">
<strong>License #: 
<a href="LicenseDetail.aspx?LicNum=377133">377133</a>
</strong>
</p>
SAVAGE ROOFING COMPANY
<br>
3055 ALVARADO STREET
<br>
SAN LEANDRO, CA 94577
<br>
<strong>Effective Dates:</strong>
 07/01/1982 - 03/31/1985
</td>

So from the <td> element above, I want this output:

BAY AREA REMODELING CO
SAVAGE ROOFING COMPANY

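Find each <p> tag, then read its next_sibling: the company name is the text node that immediately follows the closing </p>.

Example: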

from bs4 import BeautifulSoup

html = """<td id="MainContent_DisassociatedRegistrationsCell" colspan="2">
<p style="background-color:#CCCCCC;width:100%;text-align:center">
<strong>License #: 
<a href="LicenseDetail.aspx?LicNum=332673">332673</a>
</strong>
</p>
BAY AREA REMODELING CO
<br>
5230 EAST 12TH
<br>
OAKLAND, CA 94601
<br>
<strong>Effective Dates:</strong>
09/16/1982 - 06/30/1984
<p style="background-color:#CCCCCC;width:100%;text-align:center">
<strong>License #: 
<a href="LicenseDetail.aspx?LicNum=377133">377133</a>
</strong>
</p>
SAVAGE ROOFING COMPANY
<br>
3055 ALVARADO STREET
<br>
SAN LEANDRO, CA 94577
<br>
<strong>Effective Dates:</strong>
 07/01/1982 - 03/31/1985
</td>"""

soup = BeautifulSoup(html, 'html.parser')
# Each license header is a <p>; the text node right after it is the company name.
for p in soup.find_all('p'):
    print(p.next_sibling.strip())

Output:

BAY AREA REMODELING CO
SAVAGE ROOFING COMPANY
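
The loop above assumes every <p> is followed directly by a text node. A slightly more defensive sketch of the same next_sibling idea is shown here: it limits the search to the one <td> and skips any <p> whose next sibling is a tag or whitespace only. The function name company_names, the cell_id default, and the shortened sample_html are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened, hypothetical sample with the same shape as the <td> in the question.
sample_html = """<td id="MainContent_DisassociatedRegistrationsCell" colspan="2">
<p><strong>License #: <a href="LicenseDetail.aspx?LicNum=332673">332673</a></strong></p>
BAY AREA REMODELING CO
<br>
5230 EAST 12TH
<br>
<p><strong>License #: <a href="LicenseDetail.aspx?LicNum=377133">377133</a></strong></p>
SAVAGE ROOFING COMPANY
<br>
3055 ALVARADO STREET
</td>"""


def company_names(html, cell_id="MainContent_DisassociatedRegistrationsCell"):
    """Return the text that directly follows each <p> inside the given <td>."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", id=cell_id)
    if cell is None:
        return []  # the cell is missing from this page

    names = []
    # recursive=False keeps the search to <p> tags that are direct children of the cell.
    for p in cell.find_all("p", recursive=False):
        sibling = p.next_sibling
        # next_sibling can be None or another tag; only text nodes carry a name.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                names.append(text)
    return names


print(company_names(sample_html))
# ['BAY AREA REMODELING CO', 'SAVAGE ROOFING COMPANY']
```

Returning a list instead of printing makes it easier to handle cells with one name and cells with two in the same way.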

Using BeautifulSoup (html is the same string as in the example above):

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, "html.parser")
>>> [p.next_sibling.strip() for p in soup.find_all("p")]
['BAY AREA REMODELING CO', 'SAVAGE ROOFING COMPANY']

