Scraping table data using Scrapy (python)

I am working on a project that involves scraping data from a website using Scrapy. Earlier we were using Selenium, but now we have to use Scrapy. I don't have any experience with Scrapy and am learning it right now. One of the challenges is scraping data that is structured in tables. There are links to download the data, but they are not working in my case.

Below is the HTML structure of the table.

All my data is under tbody, with each record in its own tr.

The pseudocode I have written so far is:

def parse_products(self, response):
    rows=response.xpath('//*[@id="records_table"]/tbody/')
    for i in rows:
      item = table_item()
      item['company'] = i.xpath('td[1]//text()').extract_first()
      item['naic'] = i.xpath('td[2]//text()').extract_first()
      yield item

Am I accessing the table body correctly with this XPath? I'm not sure whether the XPath I specified is correct.

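The XPath in the question ends with a trailing slash, which is not a valid expression, and it targets the tbody rather than the tr rows inside it. It is better to select the rows directly: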

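def parse_products(self, response):
    # Select every row of the table; this does not depend on a tbody element.
    for row in response.css('table#records_table tr'):
        item = table_item()  # the Item class from the question
        # Paths are relative to the current row.
        item['company'] = row.xpath('.//td[1]/text()').get()
        item['naic'] = row.xpath('.//td[2]/text()').get()
        yield item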

This iterates over the rows of the table and then reads the cell data relative to each row.
