
Scraping a particular link from a web page with BeautifulSoup

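I'm new to scraping and am trying to use Beautiful Soup to scrape real estate agent data from the following page: "https://www.realtor.com/realestateagents/New-Orleans_LA/pg-1".

I'm currently using selectors to return the name and phone number of each realtor on the page and storing them in a dictionary. I'd also like to return an href value so that each realtor's personal page is stored in the dictionary as well.

It looks like there are multiple 'a' tags under the jsx-1448471805 class, and I only need to return one href value per realtor.

The selector I'm currently looking at is: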

link_selectors = "#agent_list_wrapper > div.jsx-372421607.cardWrapper > ul > div:nth-child(1) > div > div > div.jsx-1448471805.agent-list-card-img-wrapper.col-lg-2.col-sm-3.col-xxs-4 > a"

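But I'm having no luck with it.

I'd like to know how to find the correct selector to pull just one of the href values for each realtor, and how to add it to my existing dictionary, "realtors_data".

Here is my current code: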

from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd

realtors_data = {}
pages = np.arange(1, 2, 1)
print("PAGES: ", pages)
names_selector = "ul > div > div > div > div > div > a > div"
phone_selectors = "ul > div > div > div > div > div > div.jsx-1448471805.agent-phone.hidden-xs.hidden-xxs"
for page in pages:
    page = requests.get("https://www.realtor.com/realestateagents/New-Orleans_LA/pg-" + str(page))
    soup = BeautifulSoup(page.text, 'html.parser')
    names = soup.select(names_selector)
    phones = soup.select(phone_selectors)

    realtors = zip(names, phones)
    for name, phone in realtors:
        realtors_data[name.get_text()] = phone.get_text()


# Printing data
print(realtors_data)

Thanks!

Looking at the HTML, it seems much simpler to navigate using the HTML classes:

from bs4 import BeautifulSoup
import requests
url = "https://www.realtor.com/realestateagents/New-Orleans_LA/pg-1"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
names = []
for m in soup.find_all("div", class_="agent-list-card"):
    names.append({"name":m.find("div", class_="agent-name").text,
                  "phone":m.find("div", class_="agent-phone").text,
                  "link":m.find("div", class_="agent-name").parent["href"]
                 })

names

Output:

[{'name': 'Cathy Nunez',
  'phone': '(504) 258-5410',
  'link': '/realestateagents/cathy-nunez___3736136_103289755'},
 {'name': 'Olivia Ford',
  'phone': '(504) 343-1837',
  'link': '/realestateagents/olivia-ford_new-orleans_la_1996916_140289755'},
 {'name': 'Michelle Pennino',
  'phone': '(985) 502-1787',
  'link': '/realestateagents/michelle-pennino_mandeville_la_589632_090714455'},
 {'name': 'Lana Hunt',
  'phone': '(225) 933-6459',
  'link': '/realestateagents/lana-hunt_new-orleans_la_2053719_682189755'},
 {'name': 'Nicole Schlaudecker',
  'phone': '(504) 455-0100',
  'link': '/realestateagents/nicole-schlaudecker_metairie_la_1793628_718289755'},
 {'name': 'Jason Minardi',
  'phone': '(985) 645-1275',
  'link': '/realestateagents/jason-minardi_slidell_la_1817940_385614455'},
 {'name': 'John P. Dixon III',
  'phone': '(504) 657-0820',
  'link': '/realestateagents/john-p.-dixon-iii___3088323_713979755'},
 {'name': 'LIZ ASHE',
  'phone': '(504) 401-4285',
  'link': '/realestateagents/liz-ashe_metairie_la_34409_054499755'},
 {'name': "Steven & Heidi Blount/Heidi's Homes, LLC",
  'phone': '(985) 373-6233',
  'link': "/realestateagents/steven-&-heidi-blount-heidi's-homes,-llc_mandeville_la_1369154_537614455"},
 {'name': 'Lisa Julien',
  'phone': '(504) 247-7306',
  'link': '/realestateagents/lisa-julien_new-orleans_la_2203901_038089755'},
 {'name': 'Bonnie Buras Team',
  'phone': '(504) 392-0022',
  'link': '/realestateagents/bonnie-buras-team_belle-chasse_la_18326_371699755'},
 {'name': 'Emily B. Hoskin',
  'phone': '(504) 392-0022',
  'link': '/realestateagents/emily-b.-hoskin_belle-chasse_la_1151586_725289755'},
 {'name': 'Emily Haynie',
  'phone': '(504) 430-6004',
  'link': '/realestateagents/emily-haynie___1055620_198489755'},
 {'name': 'Patrice Milton Poree',
  'phone': '(504) 372-1100',
  'link': '/realestateagents/patrice-milton-poree_new-orleans_la_786531_025589755'},
 {'name': 'Harry VarnadoreTeam',
  'phone': '(504) 450-6916',
  'link': '/realestateagents/harry-varnadore_new-orleans_la_992038_608489755'},
 {'name': 'Leslie Heindel',
  'phone': '(504) 975-4252',
  'link': '/realestateagents/leslie-heindel_new-orleans_la_2152401_967189755'},
 {'name': 'Heather Shields',
  'phone': '(504) 450-9672',
  'link': '/realestateagents/heather-shields_new-orleans_la_3033967_680089755'},
 {'name': 'Brittany Picolo-Ramos',
  'phone': '(504) 300-5179',
  'link': '/realestateagents/brittany-picolo-ramos_metairie_la_1949330_532289755'},
 {'name': 'Brenda Kiefer',
  'phone': '(504) 441-8171',
  'link': '/realestateagents/brenda-kiefer_covington_la_1985750_774389755'},
 {'name': 'Brenda Newfield',
  'phone': '(504) 228-6500',
  'link': '/realestateagents/brenda-newfield_st.-rose_la_1886770_176289755'}]
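
The question also asked for a working selector and for the link to end up in "realtors_data". The following is a minimal sketch of one way to do that, not a definitive implementation. It uses CSS selectors on the same class names as the code above, keeps exactly one href per card by taking the anchor that wraps the agent name, and turns the relative link into an absolute URL. SAMPLE_HTML, BASE_URL and parse_agents are hypothetical names introduced for illustration, and the sample markup is a simplified stand-in for the real page; only the class names agent-list-card, agent-name and agent-phone come from the code above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: relative hrefs are resolved against the site root.
BASE_URL = "https://www.realtor.com"

# Hypothetical, simplified markup standing in for the real page.
SAMPLE_HTML = """
<ul>
  <div class="agent-list-card">
    <div class="agent-list-card-img-wrapper">
      <a href="/realestateagents/cathy-nunez___3736136_103289755">photo</a>
    </div>
    <a href="/realestateagents/cathy-nunez___3736136_103289755">
      <div class="agent-name">Cathy Nunez</div>
    </a>
    <div class="agent-phone">(504) 258-5410</div>
  </div>
  <div class="agent-list-card">
    <a href="/realestateagents/olivia-ford_new-orleans_la_1996916_140289755">
      <div class="agent-name">Olivia Ford</div>
    </a>
  </div>
</ul>
"""


def parse_agents(html, base_url=BASE_URL):
    """Return {name: {"phone": ..., "link": ...}} for each agent card."""
    soup = BeautifulSoup(html, "html.parser")
    agents = {}
    for card in soup.select("div.agent-list-card"):
        name_tag = card.select_one("div.agent-name")
        if name_tag is None:
            continue  # skip cards that have no name element
        phone_tag = card.select_one("div.agent-phone")
        # The anchor wrapping the name yields a single href per card,
        # even if the card contains other 'a' tags (e.g. around the photo).
        link_tag = name_tag.find_parent("a", href=True)
        agents[name_tag.get_text(strip=True)] = {
            "phone": phone_tag.get_text(strip=True) if phone_tag else None,
            "link": urljoin(base_url, link_tag["href"]) if link_tag else None,
        }
    return agents


realtors_data = parse_agents(SAMPLE_HTML)
print(realtors_data)
```

To run this against the live page, pass requests.get(url).text to parse_agents instead of SAMPLE_HTML. Because the dictionary is keyed by name, two agents with the same name would overwrite each other; a list of dictionaries, as in the code above, avoids that. The site's markup may also differ from what is assumed here.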

