Beautiful Soup: list index out of range
I looked at the site's HTML source and found what I need for namePlayer: it is in the 4th column, inside an 'a' tag. So I tried to extract it in answers.append with 'namePlayer': cols[3].a.text.

But when I run it, I get an IndexError. Then I tried changing the index to 2, 3, 4 and 5, but nothing worked.

Question: why do I get IndexError: list index out of range when everything looks fine (I think :D)?
Source:
```python
#!/usr/bin/env python3
import re
import urllib.request
from bs4 import BeautifulSoup

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

def get_html(url):
    opener = AppURLopener()
    response = opener.open(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html)
    table = soup.find(id='answers')
    answers = []
    for row in table.find_all('div')[16:]:
        cols = row.find_all('div')
    answers.append({
        'namePlayer': cols[3].a.text
    })
    for answer in answers:
        print(answers)

def main():
    parse(get_html('http://jaze.ru/forum/topic?id=50&page=1'))

if __name__ == '__main__':
    main()
```
It does sound like you are asking for an index at which no list element exists. Remember that indexing starts at 0, so a four-element list has indices 0, 1, 2, 3. If I ask for element 10, I get an IndexError.
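A quick illustration of zero-based indexing, using a hypothetical four-element list in place of the real cols:

```python
# Hypothetical stand-in for cols: four elements, so the valid indices are 0..3
cols = ['col0', 'col1', 'col2', 'col3']

last = cols[3]  # the fourth element is the highest valid index
print(last)  # col3

try:
    cols[10]  # there is no element 10
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # list index out of range
```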
You are overwriting cols during your loop: because answers.append comes after the loop, only the last value of cols is used. That last cols has length zero, hence your error.
```python
for row in table.find_all('div')[16:]:
    cols = row.find_all('div')
    print(len(cols))
```
Run the above and you will see that cols ends up with length 0.
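The overwriting effect does not depend on Beautiful Soup. Here is a minimal pure-Python sketch, where the hypothetical rows list stands in for what row.find_all('div') returns on each iteration:

```python
# Hypothetical results of row.find_all('div') for three rows; the last row has no <div> children
rows = [['d0', 'd1', 'd2', 'd3'], ['d0', 'd1'], []]

for row in rows:
    cols = row  # cols is rebound on every iteration

# After the loop only the final value survives, so cols is empty here
print(len(cols))  # 0
```

Any cols[3] placed after such a loop therefore raises IndexError, whatever the earlier rows contained.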
This might also happen elsewhere in the loop, so you should test the length and also decide whether your logic needs updating. You also need to account for whether there is a child a tag.
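One way to apply both checks is a small helper. This is only a sketch: name_from_cols is a hypothetical name, and the demo uses types.SimpleNamespace objects as stand-ins for bs4 tags, relying only on the fact that tag.a on a bs4 Tag returns the first a descendant or None.

```python
from types import SimpleNamespace


def name_from_cols(cols, index=3):
    """Return the text of the <a> inside cols[index], or None if either is missing."""
    if len(cols) <= index:  # not enough columns for this index
        return None
    link = cols[index].a  # on a bs4 Tag, .a is the first <a> descendant, or None
    if link is None:
        return None
    return link.text


# Demo with stand-in objects instead of real bs4 Tags (hypothetical data)
no_link = SimpleNamespace(a=None)
with_link = SimpleNamespace(a=SimpleNamespace(text='PlayerOne'))

print(name_from_cols([]))                          # None: too few columns
print(name_from_cols([no_link] * 4))               # None: column 3 has no <a>
print(name_from_cols([no_link] * 3 + [with_link])) # PlayerOne
```

Inside the real loop you would call it on row.find_all('div') and append only when the result is not None.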
So, for example, you might do the following (bs4 4.7.1+ required):
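```python
answers = []
for row in table.find_all('div')[16:]:
    # CSS selector: <div> elements with a direct <a> child (CSS needs select(), not find_all())
    cols = row.select('div:has(> a)')
    if len(cols) > 3:  # cols[3] exists only when there are at least 4 matches
        answers.append({
            'namePlayer': cols[3].a.text
        })
```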
Note that answers.append has been properly indented inside the loop, so you are working with each cols value. This may not fit your exact use case, as I am unsure what your desired result is. If you state the desired output, I will update accordingly.
EDIT:
Player names:
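```python
from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')  # the lxml parser must be installed separately
# all elements whose id starts with "answer_"
answer_blocks = soup.select('[id^=answer_]')
# link texts inside .left-side within those elements, in page order (may contain repeats)
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
# the same texts as a set: de-duplicated, but order is not preserved
unique_names = {i.text.strip() for i in soup.select('[id^=answer_] .left-side a')}
```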
You can preserve order and de-duplicate with OrderedDict (this approach is by @Michael; there are other solutions in that Q&A):
```python
from bs4 import BeautifulSoup as bs
import requests
from collections import OrderedDict

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
answer_blocks = soup.select('[id^=answer_]')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
# keys keep first-insertion order, so duplicates are dropped without reordering
unique_names = OrderedDict.fromkeys(names).keys()
```
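As an aside, on Python 3.7 and later a plain dict also preserves insertion order, so dict.fromkeys gives the same order-preserving de-duplication without the import. A sketch with hypothetical names in place of the scraped ones:

```python
# Hypothetical scraped names with duplicates
names = ['Alice', 'Bob', 'Alice', 'Carol', 'Bob']

# dict keys are unique and keep first-insertion order (Python 3.7+)
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['Alice', 'Bob', 'Carol']
```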
Why do you use a for loop to find all the div tags?

```python
for row in table.find_all('div')[16:]:
    cols = row.find_all('div')
```

By using this instead, you get all the tags you want:

```python
cols = table.find_all('div')[16:]
```

So just change your code to this and you will get your answer.