
How to fix “IndexError: list index out of range”?

I am scraping a directory with Python 3 and Scrapy. The scraped data is added to a MySQL database through pipelines.py.

I get this error message very often: "IndexError: list index out of range".

In this case, it happens when I scrape the URL of a link: sometimes the directory publishes the item's website, sometimes it doesn't.

I didn't find any solution on Stack Overflow. I tried converting the value to a string, but that doesn't work.

This is the line of code that raises the error:

items['startup_website'] = response.xpath("//div[@class='listing-detail- section-content-wrapper']//a/@href")[0].get() or ''

Does anyone know how I can fix this error?

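The indexing is unnecessary; you should skip it altogether. When a listing has no website link, the XPath matches nothing and .xpath() returns an empty SelectorList, so [0] raises IndexError before .get() (or the fallback to '') ever runs. That is also why converting the value to a string doesn't help.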

.xpath() returns a SelectorList, which has a .get() method of its own. Calling it directly, with a default value, gets you the result you want:

>>> fetch('http://example.com')
2019-08-14 14:28:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
>>> response.xpath('//a/@href').get('')
'http://www.iana.org/domains/example'
>>> response.xpath('//fake/a/@href').get('')
''

[0] is unnecessary here; use response.xpath("//selector").get() or '' instead.
