
Web Scraping - unable to print phone numbers using python and BeautifulSoup

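I'm trying to scrape data about each agent from a real estate agent page.

I can get the names and job titles for everyone, but only a few of the phone numbers.

Here is my code: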

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.raywhite.com/contact/?type=People&target=people&suburb=Sydney%2C+NSW+2000&radius=5&firstname=&lastname=&_so=people'

# opening connection
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div",{"class":"card horizontal-split vcard"})

for container in containers:
    agent_name = container.findAll("li", {"class":"agent-name"})
    name = agent_name[0].text

    agent_role = container.findAll("li", {"class":"agent-role"})
    role = agent_role[0].text

    phone = container.find("a").text

    print("name: " + name)
    print("role: " + role)
    print("phone: " + phone)

Here is a sample of the first few results printed. Only the first two agents have their phone numbers listed:

name: Mark Constantine
role: Principal
phone: 0418 222 643
name: Dawn Veloskey
role: Operations Manager
phone: 0418 449 600
name: Yvonne Lau
role: Sales
phone:

name: Anthony Cavallaro
role: Managing Director | Selling Principal
phone:

name: Ciara OConnor
role: Sales Executive
phone:

name: Michael Buium
role: Commercial Sales Manager and Auctioneer
phone:

name: Albert Hui
role: Senior Commercial Property Manager
phone:

name: Jessie Yee
role: Associate Director, Commercial Leasing & Management
phone:  

I don't know why the other phone numbers aren't printed; any advice would be appreciated.

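That's because the first two agents have no photo. For everyone else, the photo is the first "a" tag, so `container.find("a")` returns the photo link, which has no phone number text, instead of the phone link.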
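To make the cause concrete, here is a minimal sketch on hypothetical markup. The HTML below is an assumption modeled on the description above, not copied from raywhite.com, and the phone numbers in it are placeholders. It shows that `find("a")` simply returns the first `a` tag in document order, so for a card whose first link wraps a photo, the text comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): the first card has no
# photo, the second card has a photo link placed before its phone link.
HTML = """
<div class="card horizontal-split vcard">
  <ul><li class="agent-name">Agent A</li></ul>
  <a href="tel:0400000001">0400 000 001</a>
</div>
<div class="card horizontal-split vcard">
  <a href="/agent/b"><img src="b.jpg" alt="Agent B"></a>
  <ul><li class="agent-name">Agent B</li></ul>
  <a href="tel:0400000002">0400 000 002</a>
</div>
"""

page = BeautifulSoup(HTML, "html.parser")

# find("a") returns the first <a> inside each card, whatever that link contains.
first_links = [card.find("a") for card in page.find_all("div", class_="vcard")]

print([link.text for link in first_links])     # the second entry is empty
print([link["href"] for link in first_links])  # the second entry is the photo link
```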

Replace:

phone = container.find("a").text

with:

# keep only links that have an href starting with "tel" (phone links)
filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel")
phones = map(lambda x: x.text, filter(filterfn, container.findAll("a")))

for phone in phones:
    print("phone number: " + phone)
