How to scrape data from a table with Scrapy using ItemLoader?

Question

I am trying to extract data from the website https://www.brickworkratings.com/CreditRatings.aspx. The page contains a table whose data I can easily extract in the Scrapy shell.

I want to use ItemLoaders because they are powerful and make the code cleaner.

Here is my code:

def start_requests(self):
    yield Request("https://www.brickworkratings.com/CreditRatings.aspx", self.parse_credit_rating_response)

def parse_credit_rating_response(self, response):
    table_rows = response.xpath('//*[@id="ContentPlaceHolder1_gvData"]//tr')
    for table_row in table_rows:
        loader = ItemLoader(SampleItem(), response=response)
        try:
            loader.get_xpath(table_row.xpath("td[1]//a/text()")[0].extract())
            # loader.add_value('company_name', 'test')
        except Exception as e:
            print(e)
        item = loader.load_item()
        print(item)
        yield item

I am getting this error:

"XPath error: Invalid expression in (Name of the Company)".

I believe my XPath is right, but I don't think this is the correct way to use it. How do I use it correctly? I need to extract data from the table and want to use the more powerful ItemLoaders.

Any help will be appreciated; I have been stuck on this for a long time.

Answer 1

You need to specify the initial/parent selector when constructing the loader. It is then unnecessary to provide the response as well. You then need to pass an XPath string to add_xpath instead of using get_xpath. Refer to the documentation.

Assuming your XPath is correct, here is an example:

# All added selectors will now be relative to table_row.
loader = ItemLoader(SampleItem(), selector=table_row)
# Just give it the XPath here.
loader.add_xpath("field_name", "td[1]//a/text()")

If you need to do additional processing, look at input/output processors.

Answer 1: accepted (2019-02-27 05:34:51)