
Beautiful Soup: list index out of range

I looked at the site's HTML source and found what I need for namePlayer: it is in the 4th column, inside an 'a' tag. I tried to get it in answers.append with 'namePlayer': cols[3].a.text

But when I run it, I get an IndexError. I then tried changing the index to 2, 3, 4 and 5, but nothing helped.

Issue: why do I get IndexError: list index out of range when everything looks fine (I think :D)?

Source:

#!/usr/bin/env python3

import re
import urllib.request
from bs4 import BeautifulSoup

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"


def get_html(url):
    opener = AppURLopener()
    response = opener.open(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html)
    table = soup.find(id='answers')

    answers = []

    for row in table.find_all('div')[16:]:
        cols = row.find_all('div')

    answers.append({
        'namePlayer': cols[3].a.text
    })


    for answer in answers:
        print(answers)


def main():
    parse(get_html('http://jaze.ru/forum/topic?id=50&page=1'))

if __name__ == '__main__':
    main()

It sounds like you are using an index for which no list element exists. Remember that indexing starts at 0, so a list of four elements has the indices 0, 1, 2, 3. If I ask such a list for element 10, I get an IndexError.
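As a small illustration of the point above, here is a minimal sketch of guarded indexing. The helper name get_or_default and the sample list are hypothetical and are not part of the question's code.

```python
def get_or_default(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


cols = ["a", "b", "c"]  # three elements, so the valid indices are 0, 1 and 2

print(get_or_default(cols, 2))             # last valid index
print(get_or_default(cols, 3, "missing"))  # cols[3] itself would raise IndexError
```

Whether a default value or an explicit length check is the better fit depends on what should happen to rows that lack the expected column.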

You are overwriting cols on every iteration of your loop, and answers.append sits outside the loop, so it only ever sees the last value. The last cols has length zero, hence your error.

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')
    print(len(cols))

Run the above and you will see that cols ends up with length 0.
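The same effect can be reproduced without any network access. In this sketch, plain lists stand in for the result of row.find_all('div'); the sample rows are made up for illustration, with the last one empty as in the question.

```python
# Hypothetical stand-in for the parsed rows: each inner list plays the role
# of row.find_all('div'). The final row is empty.
rows = [["id", "date", "rank", "Alice"], ["id", "date", "rank", "Bob"], []]

# Pattern from the question: cols is used only after the loop has finished,
# so only the value from the final iteration is ever seen.
for row in rows:
    cols = row
print(len(cols))  # 0, the length of the last row only

# Work moved inside the loop: every row is examined, and rows that are too
# short are skipped instead of raising IndexError.
names = []
for row in rows:
    cols = row
    if len(cols) > 3:
        names.append(cols[3])
print(names)
```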

This might also occur elsewhere in the loop, so you should test the length and decide whether your logic needs updating. You also need to account for whether there is a child a tag.
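Both checks can be tried on a small inline document before running against the live page. This is only a sketch: the HTML, the class name row and the function name player_names are assumptions made for illustration and do not reflect the real structure of the forum page.

```python
from bs4 import BeautifulSoup

# Made-up markup: two complete rows and one row with too few columns.
HTML = """
<div id="answers">
  <div class="row">
    <div>1</div><div>2</div><div>3</div><div><a href="#">Alice</a></div>
  </div>
  <div class="row">
    <div>only one column</div>
  </div>
  <div class="row">
    <div>1</div><div>2</div><div>3</div><div><a href="#">Bob</a></div>
  </div>
</div>
"""


def player_names(html):
    """Collect the link text of the 4th column for every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id="answers")
    names = []
    for row in table.find_all("div", class_="row"):
        # Only direct child divs count as columns of this row.
        cols = row.find_all("div", recursive=False)
        # Guard both the index and the presence of a child <a> tag.
        if len(cols) > 3 and cols[3].a is not None:
            names.append(cols[3].a.text.strip())
    return names


print(player_names(HTML))
```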

So you might, for example, do the following (bs4 4.7.1+ required):

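answers = []

for row in table.find_all('div')[16:]:
    # select() accepts CSS selectors; find_all() would treat the string as a tag name
    cols = row.select('div:has(> a)')
    # cols[3] needs at least four elements
    if len(cols) >= 4:
        answers.append({
            'namePlayer': cols[3].a.text
        })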

Note that answers.append has been properly indented so that you are working with each cols value. This may not fit your exact use case, as I am unsure what your desired result is. If you state the desired output, I will update accordingly.


EDIT:

playerNames

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
answer_blocks = soup.select('[id^=answer_]')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
unique_names = {i.text.strip() for i in soup.select('[id^=answer_] .left-side a')}

You can preserve order and de-duplicate with OrderedDict (this is by @Michael; there are other solutions in that Q&A):

from bs4 import BeautifulSoup as bs
import requests
from collections import OrderedDict

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
answer_blocks = soup.select('[id^=answer_]')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
unique_names = OrderedDict.fromkeys(names).keys()
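On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea works without the extra import. A minimal sketch with made-up names rather than data scraped from the page:

```python
names = ["Alice", "Bob", "Alice", "Carol", "Bob"]  # made-up sample data

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_names = list(dict.fromkeys(names))
print(unique_names)
```

Wrapping the result in list() gives an indexable sequence rather than a keys view.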

Why do you use a for loop to find all the div tags?

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')

With this, you get all the tags you want:

cols = table.find_all('div')[16:]

So just replace your code with this, and you have your answer.


