Can't figure out how to web-scrape using BeautifulSoup
I am trying to scrape the following information from some web pages. This is the full markup:
<tr class="owner">
  <td id="P184" class="ownerP" colspan="4">
    <ul>
      <li><span class="detailType">name:</span><span class="detail">merry</span></li>
      <li><a title="sendmessage" class="sendMessageLink" onclick="return openSendMessage('/sendMessage.php',20205" href="" tabindex="0"><span></span>sendmessage</a> <span class="remark_soft">(by pm system)</span></li>
      <li><span class="detailType">phone 1</span><a class="detail" href="tel:0387362531">0387362531</a></li>
      <li><span class="detailType"></span></li>
    </ul>
  </td>
</tr>
I only want to get this piece of information (the phone number):
<a class="detail" href="tel:0387362531">0387362531</a>
This is my code, but it does not work:
for details in soup.find_all(attrs= {"class": "detail"}):
    re_res = re.search(r"tel:\('.*?',(\d+)\)", details['href'])
    print(re_res)
You are very close. Here you go:
import re
from bs4 import BeautifulSoup

html_doc = """
<tr class="owner"><td id="P184" class="ownerP" colspan="4"><ul>
<li><span class="detailType">name:</span><span class="detail">merry</span></li>
<li><a title="sendmessage" class="sendMessageLink" onclick="return openSendMessage('/sendMessage.php',20205" href="" tabindex="0"><span></span>sendmessage</a> <span class="remark_soft">(by pm system)</span></li><li><span class="detailType">phone 1</span><a class="detail" href="tel:0387362531">0387362531</a></li><li><span class="detailType"></span></li>
</ul></td></tr>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the text of every class="detail" tag whose href starts with "tel:".
for details in soup.find_all(attrs={"class": "detail"}):
    if "href" in details.attrs and re.search("^tel:", details.attrs["href"]):
        print(details.text)
Output:
0387362531
I simply go through the list of details you created, and if I find one that has an href whose value starts with tel:, I print its text.
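You should change soup.find_all(attrs={"class": "detail"}) to soup.find_all('a', attrs={"class": "detail"}) so that the span does not end up in details as well. If you only want the first matching tag rather than a loop, index the result with [0].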
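To make the difference between the two calls concrete, here is a minimal sketch run against a trimmed-down copy of the markup from the question. The html_snippet variable and the shortened HTML are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical copy of the markup from the question.
html_snippet = """
<ul>
<li><span class="detailType">name:</span><span class="detail">merry</span></li>
<li><span class="detailType">phone 1</span><a class="detail" href="tel:0387362531">0387362531</a></li>
</ul>
"""

soup = BeautifulSoup(html_snippet, "html.parser")

# Matching on the class alone returns every tag that carries class="detail".
any_tag = [tag.name for tag in soup.find_all(attrs={"class": "detail"})]

# Adding the tag name restricts the result to <a> elements.
only_links = [tag.name for tag in soup.find_all("a", attrs={"class": "detail"})]

print(any_tag)     # ['span', 'a']
print(only_links)  # ['a']
```

The span has no href attribute, so indexing it with ['href'] raises a KeyError. That is one more reason to filter on the tag name before reading the attribute.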
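Your regular expression does not work either; this one should: tel:(\d+). But instead of using a regular expression, why not simply take the text of the a tag with details.text?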
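You have to add the element type a to find_all. Also, your regular expression tel:\('.*?',(\d+)\) tries to match a literal opening and closing parenthesis, \( and \), in the href, and the href contains neither.
You can update the regular expression to tel:(\d+), which matches tel: followed by one or more digits in a capture group (group 1). The digits can then be retrieved with re_res.group(1).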
For example:
for details in soup.find_all('a', attrs={"class": "detail"}):
    re_res = re.search(r"tel:(\d+)", details['href'])
    if re_res:  # skip links whose href is not a tel: URI
        print(re_res.group(1))  # 0387362531
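The two patterns can also be compared directly on the href string, without BeautifulSoup. This is only a sketch: the href, original, and fixed variable names are introduced here for illustration.

```python
import re

# Value of the href attribute from the question's markup.
href = "tel:0387362531"

# Original pattern: requires a literal "(" right after "tel:", so it cannot match.
original = re.search(r"tel:\('.*?',(\d+)\)", href)

# Simplified pattern: "tel:" followed by one or more digits captured in group 1.
fixed = re.search(r"tel:(\d+)", href)

print(original)        # None
print(fixed.group(1))  # 0387362531
```

Note that \d+ only captures a run of digits. A tel: link written with a leading + or with dashes would need a broader pattern.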
You can get the same result without a regular expression. In that case, try the following:
from bs4 import BeautifulSoup

html_doc = """
<tr class="owner"><td id="P184" class="ownerP" colspan="4"><ul>
<li><span class="detailType">name:</span><span class="detail">merry</span></li>
<li><a title="sendmessage" class="sendMessageLink" onclick="return openSendMessage('/sendMessage.php',20205" href="" tabindex="0"><span></span>sendmessage</a> <span class="remark_soft">(by pm system)</span></li><li><span class="detailType">phone 1</span><a class="detail" href="tel:0387362531">0387362531</a></li><li><span class="detailType"></span></li>
</ul></td></tr>
"""
Using .select():
soup = BeautifulSoup(html_doc, 'html.parser')
# Select <a> tags whose href starts with "tel:", then keep those with class "detail".
for telephone in soup.select("a[href^='tel:']"):
    if "detail" in telephone.get('class', []):
        print(telephone.text)
Or using .find_all():
soup = BeautifulSoup(html_doc, 'html.parser')
# Find <a class="detail"> tags, then keep those whose href starts with "tel:".
for telephone in soup.find_all("a", class_="detail"):
    if telephone.get('href', '').startswith('tel:'):
        print(telephone.text)
Both produce the same output:
0387362531
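As a variation on the two approaches above, the class check and the href check can be folded into a single CSS selector, and the number can be read from the href itself instead of the link text. This is a sketch under assumptions: html_snippet is a shortened, hypothetical copy of the markup, and the extra /profile link is invented to show that non-tel links are skipped.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup; the /profile link is made up for this example.
html_snippet = """
<ul>
<li><span class="detail">merry</span></li>
<li><a class="detail" href="/profile">profile</a></li>
<li><a class="detail" href="tel:0387362531">0387362531</a></li>
</ul>
"""

soup = BeautifulSoup(html_snippet, "html.parser")

# a.detail          -> <a> tags with class "detail"
# [href^='tel:']    -> whose href attribute starts with "tel:"
links = soup.select("a.detail[href^='tel:']")

# Drop the "tel:" prefix to get the number from the attribute value.
numbers = [a["href"][len("tel:"):] for a in links]

print(numbers)  # ['0387362531']
```

Reading the number from href rather than from the link text can help when a page displays a formatted label but keeps the raw number in the attribute.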