Scraping table data with Scrapy (Python)

Question

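I'm working on a project that involves scraping data from a website with Scrapy. We previously used Selenium, but now we have to use Scrapy. I have no experience with Scrapy, but I'm learning it now. One of the challenges is scraping data that the site presents as a table; there is a link for downloading this data, but it doesn't work in my case.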

Below is the HTML structure of the table:

All of my data is under tbody, with one tr per record.

The pseudocode I've written so far is:

def parse_products(self, response):
    rows=response.xpath('//*[@id="records_table"]/tbody/')
    for i in rows:
      item = table_item()
      item['company'] = i.xpath('td[1]//text()').extract_first()
      item['naic'] = i.xpath('td[2]//text()').extract_first()
      yield item

Am I accessing the table body correctly with XPath? I'm not sure whether the XPath I specified is correct.

Answer 1

It would be better to write:

def parse_products(self, response):
    for row in response.css('table#records_table tr'):
      item = table_item()
      item['company'] = row.xpath('.//td[1]/text()').get()
      item['naic'] = row.xpath('.//td[2]/text()').get()
      yield item

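Here you iterate over the rows of the table and then extract the data from each cell. The XPath in the question ends with a trailing slash, which is not a valid expression, and even without the slash it would select the single tbody element rather than the individual tr rows, so the loop would run only once.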

Answer 1 · 0 · 2018-10-31 15:15:26
