Scrapy: extracting specific data via CSS query does not work

Question

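I'm trying to implement a super simple scraper that collects apartment prices and square footage from a website. I'm using Python + Scrapy for this, but I have one problem: the section that holds the information I need comes back empty in the response, and nothing inside it (div, span, etc.) can be reached with CSS queries. I can access everything else on the page except the contents of this section.

Here is the website: https://www.251brandon.com/floorplans

This is what my initial spider looks like (in this example it only looks for the class "fp-price"):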

import scrapy


class Brandon251Spider(scrapy.Spider):
    name = "Brandon251"

    def start_requests(self):
        urls = [
            "https://www.251brandon.com/floorplans"
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Collect every element that carries the fp-price class
        price = response.css('.fp-price').extract()

        yield {
            'test': price
        }

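What comes back is an empty SelectorList rather than all the elements with the fp-price class.

(Screenshot: CSS structure of the website)

Thanks for your help. :)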

Answer 1

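You could try XPath instead of a CSS selector: response.xpath('//*[@id="floorplan"]/text()')

See also: https://doc.scrapy.org/en/latest/topics/selectors.html

If @Casper is right and the element in question is loaded by JavaScript, you should check out scrapy-splash (https://github.com/scrapy-plugins/scrapy-splash), which lets you render the JavaScript first and then scrape the page. Good luck!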
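For reference, a minimal sketch of the settings.py entries that the scrapy-splash README asks for. The SPLASH_URL value is an assumption (a Splash instance running locally on port 8050), so adjust it to wherever your Splash server actually lives.

```python
# settings.py fragment for scrapy-splash (values follow the project README).
# Assumption: a Splash server is reachable at this address.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that avoids sending duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Dupefilter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these in place, the spider would yield scrapy_splash.SplashRequest(url, self.parse, args={'wait': 0.5}) instead of scrapy.Request, so that the callback receives the rendered page.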

Answer 2

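@Casper is correct: the page is generated with JavaScript. If you load the page in a browser with JavaScript disabled, the content is not visible. However, when a page is built with JavaScript, the data it needs is usually available as JSON. I searched the network responses for one of the sq ft values and found that all of the data is loaded with the page, in a variable named pageData.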
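The same check can be done outside the browser by searching the raw HTML for a value you expect to see. Below is a rough sketch; find_in_source is a hypothetical helper and RAW is a made-up stand-in for the downloaded page, not the site's real markup.

```python
def find_in_source(page_source, needle, width=40):
    """Return the text surrounding each occurrence of needle in page_source."""
    snippets = []
    pos = page_source.find(needle)
    while pos != -1:
        begin = max(0, pos - width)
        end = min(len(page_source), pos + len(needle) + width)
        snippets.append(page_source[begin:end])
        pos = page_source.find(needle, pos + 1)
    return snippets


# Made-up stand-in for the raw response body: the element is empty,
# but the value still appears inside a script block.
RAW = '<div class="fp-price"></div><script>var pageData = { sqft: 594 };</script>'

for snippet in find_in_source(RAW, "594", width=12):
    print(snippet)
```

Inside a spider, response.text could be passed as page_source. If the value only shows up inside a script block, the element is being filled in by JavaScript after the page loads.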

If you search the page source, you will find the JSON object defined there, with the data ready for building the page.

var pageData = {
  filters: {
    beds: [],
    baths: 0,
    priceRange: {
      low: 0,
      high: 9999
    },
    sqftRange: {
      low: 0,
      high: 9999
    },
    availableDate: "all",
    amenities: []
  },
  hasImages: true,
  amenities: {
    am_0: "Built in USB Ports",
    am_1: "Designer Carpeting and Two-Tone Paint",
    am_2: "Dishwasher",
    am_3: "Double Stainless Steel Sinks",
    am_4: "Gas Range",
    am_5: "Granite Countertops",
    am_6: "Large Patio Or Balcony",
    am_7: "Linen Closet",
    am_8: "Platinum Silver Kitchen Appliances",
    am_9: "Pre-Wired For Technology",
    am_10: "Spacious Closets",
    am_11: "Stackable Washer/Dryer",
    am_12: "Wood Blinds"
  },
  floorplans: [
    {
      id: 2029996,
      name: "1 Bed 1 Bath | 1B",
      amenities: [],
      sqft: 594,
      beds: 1,
      baths: 1.0,
      lowPrice: 2392,
      highPrice: 4208,
      availableCount: 1,
      availableDate: "10/8/2018",
      special: false,
      images: [
        {
          src: "/dmslivecafe/3/234323/1B.png?quality=85",
          alt: "",
          title: "1 Bed 1 Bath | 1B",
          caption: ""
        }
      ],
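
Because the keys in pageData are unquoted, json.loads will likely reject it as written. One possible workaround is a regular expression over the page source. The sketch below assumes that every floorplan entry lists name, sqft, lowPrice and highPrice in that order, as in the excerpt above. extract_floorplans and SAMPLE are hypothetical names used only for illustration, and the live page may be laid out differently.

```python
import re

# Assumption: each floorplan entry contains these four keys in this order.
# If an entry lacks one of them, the match can spill into the next entry.
FLOORPLAN_RE = re.compile(
    r'\bname:\s*"(?P<name>[^"]*)"'
    r'.*?\bsqft:\s*(?P<sqft>\d+)'
    r'.*?\blowPrice:\s*(?P<low>\d+)'
    r'.*?\bhighPrice:\s*(?P<high>\d+)',
    re.DOTALL,
)


def extract_floorplans(page_source):
    """Return one dict per floorplan found after the pageData assignment."""
    start = page_source.find("var pageData")
    if start == -1:
        return []
    plans = []
    for match in FLOORPLAN_RE.finditer(page_source, start):
        plans.append({
            "name": match.group("name"),
            "sqft": int(match.group("sqft")),
            "low_price": int(match.group("low")),
            "high_price": int(match.group("high")),
        })
    return plans


# Shortened copy of the excerpt above, used as test input.
SAMPLE = '''
var pageData = {
  filters: { sqftRange: { low: 0, high: 9999 } },
  floorplans: [
    {
      id: 2029996,
      name: "1 Bed 1 Bath | 1B",
      amenities: [],
      sqft: 594,
      beds: 1,
      baths: 1.0,
      lowPrice: 2392,
      highPrice: 4208,
'''

for plan in extract_floorplans(SAMPLE):
    print(plan)
```

In the spider's parse method, extract_floorplans(response.text) could replace the CSS query, yielding each dict as an item. A regex like this is brittle, so treat it as a starting point rather than a robust parser.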

2 answers

Answer 1
Score 0, accepted, 2018-10-02 08:29:30

Answer 2
Score 0, 2018-10-02 08:54:19
