
How to scrape data for the page using beautiful soup

This code gives me the address, but it does not give me the phone number:

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.36 Safari/537.36'}
    r = requests.get('https://www.houzz.com/professionals/general-contractor', headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    tra = soup.find_all('div', class_='hz-pro-search-result__info')
    for pro in tra:
        address = pro.find('span', class_='hz-pro-search-result__location-info__text').text
        try:
            phone = pro.select('div.hz-pro-search-result__right-info__contact-info > span > span').text
        except:
            phone = ''
        print(address, phone)

I want to get the Phone Number data from this page: https://www.houzz.com/professionals/general-contractor

<div class="hz-pro-search-result__right-info__contact-info"><span class="hz-pro-search-result__contact-info"><span class="icon-font icon-phone hz-pro-search-result__contact-info__icon" aria-hidden="true"></span>(800) 310-7154</span></div>

<span class="hz-pro-search-result__contact-info"><span class="icon-font icon-phone hz-pro-search-result__contact-info__icon" aria-hidden="true"></span>(800) 310-7154</span>
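As an aside, once the number is actually present in the HTML (as in the snippets above), it can be pulled out with `select_one`. Note that `select()` returns a *list*, so calling `.text` on its result raises an `AttributeError`, which the bare `except` in the question silently swallows. A minimal sketch against the static snippet:

```python
from bs4 import BeautifulSoup

# Static snippet copied from the question; on the live page the number
# only appears in the HTML after 'View Phone Number' is clicked
html = ('<div class="hz-pro-search-result__right-info__contact-info">'
        '<span class="hz-pro-search-result__contact-info">'
        '<span class="icon-font icon-phone hz-pro-search-result__contact-info__icon"'
        ' aria-hidden="true"></span>(800) 310-7154</span></div>')

soup = BeautifulSoup(html, 'html.parser')
# select_one() returns a single Tag (or None), so get_text() works on it;
# select() would return a list, which has no .text attribute
tag = soup.select_one('div.hz-pro-search-result__right-info__contact-info > span')
phone = tag.get_text(strip=True)
print(phone)  # (800) 310-7154
```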

The website needs you to click on 'View Phone Number' to reveal the phone number. You can use Selenium to do this. I am also using pandas, but you can skip it if you want.

Install Selenium:

pip install selenium

Install the Selenium driver for your version of Chrome.

Install pandas:

pip install pandas

Here's the code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

# location of chromedriver.exe
driver = webdriver.Chrome("D:/chromedriver/94/chromedriver.exe")

# opening website
driver.get("https://www.houzz.com/professionals/general-contractor")

# waiting for the DIVs to load
WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.XPATH, '//div[@class="hz-pro-search-result__right-info"]')))

# getting all the relevant DIVs
info_divs = driver.find_elements(By.XPATH,  '//div[@class="hz-pro-search-result__right-info"]')

house_details = {
    "address": [],
    "phone": []
}

for row in info_divs:
    try:
        address = row.find_element(By.CLASS_NAME, "hz-pro-search-result__right-info__full-address")
        phone = row.find_element(By.CLASS_NAME, "hz-pro-search-result__right-info__contact-info")
        phone.click()
        time.sleep(0.5)
        house_details['address'].append(address.text)
        house_details['phone'].append(phone.text)
    except Exception as ex:
        print(f"something went wrong. {ex}")
        
# saving to dataframe
house_df = pd.DataFrame.from_dict(house_details)
# exporting to .csv
house_df.to_csv('house_details.csv', index=False)

print(house_df)
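If you want to sanity-check the pandas step separately from the scraping, the dict-of-lists → DataFrame → CSV part can be run on its own with stand-in values (the addresses and numbers below are made up):

```python
import pandas as pd

# Stand-in rows in the same shape the Selenium loop produces (dummy data)
house_details = {
    "address": ["Austin, TX 78701", "Denver, CO 80202"],
    "phone": ["(555) 010-0000", "(555) 020-1111"],
}

house_df = pd.DataFrame.from_dict(house_details)
house_df.to_csv("house_details.csv", index=False)

print(house_df.shape)  # (2, 2)
```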
