
Scrape sub classes under div class in Python and BeautifulSoup

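I am trying to scrape the houses listed on a page, including the address, beds and baths, and price. The HTML for the relevant information on the page looks like this.

[Image: screenshot of the page's HTML]

I have the following Python code using BeautifulSoup.

I first select the class bottomV2, which I expected to contain all the required information.

However, the price is not a div class but a span class under a div class, and the bed/bath information is listed in three separate div classes.

How should I modify my code to get the price and the beds and baths?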

import requests
from bs4 import BeautifulSoup
import csv

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://www.redfin.com/city/2749/VT/Burlington"
r = requests.get(url, headers=headers)

houses = []
soup = BeautifulSoup(r.content, 'html5lib')

# Each listing's link (with the address in its title) sits in a bottomV2 div.
homes = soup.find_all('div', attrs={'class': 'bottomV2'})

for row in homes:
    # The code below treats the title as "street, city, STATE ZIP".
    title = row.a['title']
    house = {}
    house['url'] = "https://www.redfin.com" + row.a['href']
    house['address'] = title.split(', ')[0]
    house['city'] = title.split(', ')[1]
    house['state'] = title.split(' ')[-2]
    house['zip_code'] = title.split(' ')[-1]
    houses.append(house)

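This script uses the built-in zip() function to get the stats, links, and prices together. It then parses the stats and the link text to extract the remaining fields: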

import requests
from bs4 import BeautifulSoup


url = 'https://www.redfin.com/city/2749/VT/Burlington'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

# One row template shared by the header and the data rows.
row_fmt = '{:<10}{:<11}{:<20}{:<30}{:<15}{:<10}{:<10}{:<15}{}'

print(row_fmt.format('Beds', 'Baths', 'Sq.Ft.', 'Address', 'City', 'State', 'Code', 'Price', 'Link'))
# zip() pairs the three result lists by position, one tuple per listing.
for stats, link, price in zip(soup.select('.HomeStatsV2'),
                              soup.select('.bottomV2 > a'),
                              soup.select('.homecardV2Price')):

    # The three <div>s inside the stats block hold beds, baths and floor area.
    beds, baths, sqft = (d.get_text(strip=True) for d in stats.select('div'))
    # The link text has the form "street, city, STATE ZIP".
    address, city, code = link.text.split(',')
    state, code = code.split()
    price = price.text
    link = 'https://www.redfin.com' + link['href']

    print(row_fmt.format(beds, baths, sqft, address, city, state, code, price, link))

Prints:

Beds      Baths      Sq.Ft.              Address                       City           State     Code      Price          Link
4 Beds    4 Baths    4,970 Sq. Ft.       1 Crescent Beach Dr            Burlington    VT        05401     $1,595,000     https://www.redfin.com/VT/Burlington/1-Crescent-Beach-Dr-05408/home/91176261
2 Beds    1 Bath     864 Sq. Ft.         208 Sandra Cir                 Burlington    VT        05408     $275,000       https://www.redfin.com/VT/Burlington/208-Sandra-Cir-05408/home/91180546
1 Bed     1 Bath     688 Sq. Ft.         91 Hildred Dr                  Burlington    VT        05401     $147,500       https://www.redfin.com/VT/Burlington/91-Hildred-Dr-05401/unit-91/home/91178740
3 Beds    1 Bath     896 Sq. Ft.         77 VENUS Ave                   Burlington    VT        05408     $203,200       https://www.redfin.com/VT/Burlington/77-Venus-Ave-05408/home/91175236
2 Beds    2 Baths    1,423 Sq. Ft.       40 College St Unit 211D        Burlington    VT        05401     $559,900       https://www.redfin.com/VT/Burlington/40-College-St-05401/unit-211D/home/91182093
2 Beds    1 Bath     1,229 Sq. Ft.       191 S Winooski Ave #1          Burlington    VT        05401     $425,000       https://www.redfin.com/VT/Burlington/191-S-Winooski-Ave-05401/unit-1/home/91183775
2 Beds    3 Baths    3,270 Sq. Ft.       15 Eastman Way                 Burlington    VT        05401     $1,950,000     https://www.redfin.com/VT/Burlington/15-Eastman-Way-05401/home/91182784
4 Beds    1.5 Baths  1,520 Sq. Ft.       63 Birch Ct                    Burlington    VT        05408     $399,000       https://www.redfin.com/VT/Burlington/63-Birch-Ct-05408/home/91178132
2 Beds    1.75 Baths 1,680 Sq. Ft.       267 Pearl St Unit A3           Burlington    VT        05401     $365,000       https://www.redfin.com/VT/Burlington/267-Pearl-St-05401/unit-A3/home/91182306
4 Beds    1.75 Baths 1,420 Sq. Ft.       32 Vine St                     Burlington    VT        05408     $429,900       https://www.redfin.com/VT/Burlington/32-Vine-St-05408/home/91181276
4 Beds    4 Baths    4,100 Sq. Ft.       62 Overlake Park               Burlington    VT        05401     $1,545,000     https://www.redfin.com/VT/Burlington/62-Overlake-Park-05401/home/91183380
4 Beds    2.75 Baths 2,008 Sq. Ft.       61 Muirfield Rd                Burlington    VT        05408     $510,000       https://www.redfin.com/VT/Burlington/61-Muirfield-Rd-05408/home/91180706
1 Bed     1 Bath     1,033 Sq. Ft.       40 College St #209             Burlington    VT        05401     $359,500       https://www.redfin.com/VT/Burlington/40-College-St-05401/unit-209/home/171717430
2 Beds    1.5 Baths  690 Sq. Ft.         131 Main St #306               Burlington    VT        05401     $309,000       https://www.redfin.com/VT/Burlington/131-Main-St-05401/unit-306/home/63520198
2 Beds    1 Bath     598 Sq. Ft.         24 Avenue B                    Burlington    VT        05408     $44,900        https://www.redfin.com/VT/Burlington/24-Avenue-B-05408/home/91185051
—Beds     —Baths     —Sq. Ft.            227 S Cove Rd                  Burlington    VT        05401     $300,000       https://www.redfin.com/VT/Burlington/227-S-Cove-Rd-05401/home/171490078
1 Bed     1 Bath     744 Sq. Ft.         131 Main St #504               Burlington    VT        05401     $265,000       https://www.redfin.com/VT/Burlington/131-Main-St-05401/unit-504/home/91183359
4 Beds    2.5 Baths  2,715 Sq. Ft.       75 Brookes Ave                 Burlington    VT        05401     $669,900       https://www.redfin.com/VT/Burlington/75-Brookes-Ave-05401/home/91182294
—Beds     —Baths     10,240 Sq. Ft.      71-73 Elmwood Ave              Burlington    VT        05401     $2,150,000     https://www.redfin.com/VT/Burlington/71-Elmwood-Ave-05401/unit-87-91/home/95080381
—Beds     8 Baths    4,491 Sq. Ft.       57-59 Buell St                 Burlington    VT        05401     $1,325,000     https://www.redfin.com/VT/Burlington/57-Buell-St-05401/home/95080294
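
Because zip() pairs the three result lists purely by position, a listing that lacks one of the elements would shift every later row. A minimal sketch of an alternative is to iterate card by card and look up each nested element with select_one(). It runs on an inline HTML snippet rather than the live site. The wrapper class `HomeCard`, the inner class `stats`, the exact nesting, and the missing price on the second card are assumptions for illustration; only `bottomV2`, `HomeStatsV2` and `homecardV2Price` come from the script above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "HomeCard" wrapper, the "stats" class and the
# nesting are assumptions. The second card has no price on purpose.
SAMPLE_HTML = """
<div class="HomeCard">
  <div class="bottomV2">
    <span class="homecardV2Price">$275,000</span>
    <div class="HomeStatsV2">
      <div class="stats">2 Beds</div>
      <div class="stats">1 Bath</div>
      <div class="stats">864 Sq. Ft.</div>
    </div>
    <a href="/VT/Burlington/208-Sandra-Cir-05408/home/91180546">208 Sandra Cir, Burlington, VT 05408</a>
  </div>
</div>
<div class="HomeCard">
  <div class="bottomV2">
    <div class="HomeStatsV2">
      <div class="stats">1 Bed</div>
      <div class="stats">1 Bath</div>
      <div class="stats">688 Sq. Ft.</div>
    </div>
    <a href="/VT/Burlington/91-Hildred-Dr-05401/unit-91/home/91178740">91 Hildred Dr, Burlington, VT 05401</a>
  </div>
</div>
"""


def parse_cards(html):
    """Return one dict per card; fields missing from a card become None."""
    soup = BeautifulSoup(html, "html.parser")
    houses = []
    for card in soup.select(".HomeCard"):
        # Every lookup is scoped to this card, so rows cannot get misaligned.
        price_tag = card.select_one(".homecardV2Price")
        link_tag = card.select_one(".bottomV2 > a")
        stats = [d.get_text(strip=True) for d in card.select(".HomeStatsV2 > div")]
        has_link = link_tag is not None
        houses.append({
            "price": price_tag.get_text(strip=True) if price_tag is not None else None,
            "stats": stats,
            "address": link_tag.get_text(strip=True) if has_link else None,
            "url": "https://www.redfin.com" + link_tag["href"] if has_link else None,
        })
    return houses


houses = parse_cards(SAMPLE_HTML)
for house in houses:
    print(house)
```

With the real page, the selector passed to soup.select() would need to be whatever element actually wraps a single listing.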

How to scrape a line of text which is under a <div> tag which is in turn under a <div class> tag

<div class="style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR"> <div>Augmentin 625 Duo Tablet</div></div>

I want to scrape the text "Augmentin 625 Duo Tablet", but I can't seem to find the right way to do it.

The code I am currently using is:

import requests
import bs4
import lxml

result = requests.get("https://www.pharmadude.com")
# print((type(result)))
soup = bs4.BeautifulSoup(result.text, "lxml")
# print(soup)
scrape = soup.find_all('div', attrs={'class': 'style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR'})
for div in scrape:
    bar = soup.find_all('div')
    print(bar.text)
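
In the code above, soup.find_all('div') searches the whole document again and returns a list-like ResultSet, which has no .text attribute. One possible approach, as a minimal sketch, is to match the outer <div> on a single one of its classes with a CSS selector and read its direct child <div>. The snippet parses the fragment from the question as an inline string; whether the live page serves this markup in its static HTML (rather than rendering it with JavaScript) is not known here, so that part is an assumption.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, parsed inline instead of fetched.
html = (
    '<div class="style__font-bold___1k9Dl style__font-14px___YZZrf '
    'style__flex-row___2AKyf style__space-between___2mbvn '
    'style__padding-bottom-5px___2NrDR"> <div>Augmentin 625 Duo Tablet</div></div>'
)

soup = BeautifulSoup(html, "html.parser")

# A single class is enough to match an element that carries several classes;
# "> div" restricts the match to direct child <div> elements.
names = [inner.get_text(strip=True)
         for inner in soup.select("div.style__font-bold___1k9Dl > div")]
print(names)
```

Selecting on one class avoids depending on the exact order and spelling of the full class string.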


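Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.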

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM