Web Scraping - unable to print phone numbers using Python and BeautifulSoup
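I'm trying to scrape data from a page that lists real estate agents.
I can get everyone's name and job title, but only a few of the phone numbers.
Here is my code: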
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.raywhite.com/contact/?type=People&target=people&suburb=Sydney%2C+NSW+2000&radius=5&firstname=&lastname=&_so=people'

# opening connection
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class": "card horizontal-split vcard"})

for container in containers:
    agent_name = container.findAll("li", {"class": "agent-name"})
    name = agent_name[0].text
    agent_role = container.findAll("li", {"class": "agent-role"})
    role = agent_role[0].text
    phone = container.find("a").text
    print("name: " + name)
    print("role: " + role)
    print("phone: " + phone)
Here is a sample of the first few printed results. Only the first two agents have their phone numbers listed:
name: Mark Constantine
role: Principal
phone: 0418 222 643
name: Dawn Veloskey
role: Operations Manager
phone: 0418 449 600
name: Yvonne Lau
role: Sales
phone:
name: Anthony Cavallaro
role: Managing Director | Selling Principal
phone:
name: Ciara OConnor
role: Sales Executive
phone:
name: Michael Buium
role: Commercial Sales Manager and Auctioneer
phone:
name: Albert Hui
role: Senior Commercial Property Manager
phone:
name: Jessie Yee
role: Associate Director, Commercial Leasing & Management
phone:
I don't know why the other phone numbers aren't printed. Any suggestions would be appreciated.
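That's because the first two agents don't have a photo. For every other agent, the first "a" tag is the link around the photo, not the phone link, so its text is empty.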
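To see the cause in isolation, here is a minimal sketch that runs `find("a")` over two hypothetical, simplified agent cards. The markup is an assumption for illustration, not copied from the Ray White page, and `SAMPLE_HTML` and `first_link_texts` are illustrative names. It needs the `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (NOT the real Ray White HTML):
# the first card has no photo, the second wraps its photo in a link.
SAMPLE_HTML = """
<div class="card horizontal-split vcard">
  <ul>
    <li class="agent-name">Agent A</li>
    <li class="agent-role">Principal</li>
  </ul>
  <a href="tel:0400000001">0400 000 001</a>
</div>
<div class="card horizontal-split vcard">
  <a href="/agents/b"><img src="b.jpg" alt="Agent B"></a>
  <ul>
    <li class="agent-name">Agent B</li>
    <li class="agent-role">Sales</li>
  </ul>
  <a href="tel:0400000002">0400 000 002</a>
</div>
"""


def first_link_texts(html):
    """Return the text of the FIRST <a> in each card, as the question's code does."""
    soup = BeautifulSoup(html, "html.parser")
    return [card.find("a").text for card in soup.find_all("div", class_="vcard")]


for text in first_link_texts(SAMPLE_HTML):
    print(repr(text))
# '0400 000 001'
# ''
```

In the second card the first `a` contains only an `<img>`, so `.text` is an empty string, which is exactly the blank `phone:` output in the question.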
Replace:
phone = container.find("a").text
with:
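# keep only links whose href starts with "tel", i.e. phone links
filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel")
phones = map(lambda x: x.text, filter(filterfn, container.findAll("a")))
# this loop prints each number itself, so the original print("phone: " + phone) line can be removed
for phone in phones:
    print("phone number: " + phone)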
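The same filtering can also be written with a CSS attribute selector instead of `map`/`filter`. The sketch below assumes bs4 4.7 or later (where `select` is backed by soupsieve) and the class names shown in the question; `parse_agents` and the sample markup are hypothetical. Each card gets a list of numbers, so a card with no `tel:` link simply gets an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one card with a photo link plus a phone link,
# one card with no phone link at all.
SAMPLE_HTML = """
<div class="card horizontal-split vcard">
  <a href="/agents/b"><img src="b.jpg" alt="Agent B"></a>
  <ul>
    <li class="agent-name">Agent B</li>
    <li class="agent-role">Sales</li>
  </ul>
  <a href="tel:0400000002">0400 000 002</a>
</div>
<div class="card horizontal-split vcard">
  <ul>
    <li class="agent-name">Agent C</li>
    <li class="agent-role">Property Manager</li>
  </ul>
</div>
"""


def parse_agents(html):
    """Return one dict per agent card with name, role and every tel: number."""
    soup = BeautifulSoup(html, "html.parser")
    agents = []
    for card in soup.select("div.vcard"):
        name = card.select_one("li.agent-name")
        role = card.select_one("li.agent-role")
        # Attribute prefix selector: only <a> tags whose href starts with "tel:".
        phones = [a.get_text(strip=True) for a in card.select('a[href^="tel:"]')]
        agents.append({
            "name": name.get_text(strip=True) if name else "",
            "role": role.get_text(strip=True) if role else "",
            "phones": phones,
        })
    return agents


for agent in parse_agents(SAMPLE_HTML):
    print(agent["name"], agent["role"], ", ".join(agent["phones"]))
```

Against the live page you would pass the `page_html` read in the question's code; whether the current site still uses these class names is not verified here.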