Using Scrapy: response.xpath() returns no results when extracting data from an HTML table

Question

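I've been building a web scraper in Python 3 with the Scrapy library, but I've run into a problem I don't understand. I have successfully scraped other tables by using Inspect Element on the table to get its XPath. With this table, however, I can't figure out how to extract the data. I'm new to HTML, though not to programming, so apologies if I'm way off here.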

An example page: http://land.elpasoco.com/ResidentialBuilding.aspx?schd=5317443025&bldg=1

Inspecting the page and copying the XPath of the target table gives //*[@id="aspnetForm"]/table/tbody/tr[3]/td[1]/table/tbody/tr[1]/td/table/tbody/tr[3]/td/table

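However, in the scrapy shell, response.xpath(target).extract() with this XPath returns []. Targeting any individual cell gives the same empty result. What I'm hoping to end up with is a DataFrame or dict with entries such as {'Dwelling Units': 1, 'Year Built': 2010 ... }. Any help figuring out where I went wrong, or how to format the data, would be much appreciated. Thanks!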

Answer 1

import scrapy


class ResidentialRecordsSpider(scrapy.Spider):
    name = "residential_records"

    start_urls = [
        'http://land.elpasoco.com/ResidentialBuilding.aspx?schd=5317443025&bldg=1',
    ]

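    def parse(self, response):
        # Select every <td> inside the table(s) whose width attribute is "90%"
        for record in response.xpath('//table[@width="90%"]//td'):
            # Label: the text of the cell's <strong> element ('' if there is none)
            key = record.xpath('./strong/text()').extract_first(default='')
            # Value: the first text node directly inside the <td> ('' if none)
            value = record.xpath('./text()').extract_first(default='')

            # One single-entry dict per cell; keys and values still need cleanup
            yield {key: value}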

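From here, you only need to do some data cleaning, for example stripping surrounding whitespace and skipping cells that yield an empty key.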
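A minimal sketch of that cleanup, merging the single-entry dicts the spider yields into one dict shaped like the expected output in the question. The helper names clean_records and to_number are hypothetical, and the input format (keys like 'Year Built:' with a trailing colon, values padded with whitespace) is a guess at the real page, so the rules may need adjusting.

```python
def to_number(text):
    """Return text as an int when it is a whole number, otherwise unchanged."""
    try:
        return int(text)
    except ValueError:
        return text


def clean_records(records):
    """Merge the single-entry dicts yielded by parse() into one dict.

    Hypothetical helper: assumes keys look like 'Year Built:' and values
    like ' 2010'; adjust the rules to match the real page.
    """
    cleaned = {}
    for record in records:
        for key, value in record.items():
            # Trim whitespace and a trailing colon from the label
            key = key.strip().rstrip(':').strip()
            if not key:
                # Cells without a <strong> label come through with an empty key
                continue
            cleaned[key] = to_number(value.strip())
    return cleaned


# Hypothetical sample of what the spider might yield
raw = [
    {'Dwelling Units:': ' 1'},
    {'Year Built:': ' 2010'},
    {'': ''},
]
print(clean_records(raw))  # {'Dwelling Units': 1, 'Year Built': 2010}
```

If a DataFrame is preferred, pandas.DataFrame([clean_records(raw)]) turns the merged dict into a one-row frame with one column per label.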

Answer 1: accepted, 2018-06-07 05:04:11
