IndexError: list index out of range when building a list with a variable as the index, but it works fine with print. Why?

Question

Python shows this error when I append to a list, although print works. I am web scraping a list of college names and sites. I use a regex to pick out the sites and append them to the college_site list, but the error says: list index out of range, even though the loop starts at the beginning and ends at the end. Where do I need to change it?

My code is:

import requests
from bs4 import BeautifulSoup
import json
import re


URL = 'http://doors.stanford.edu/~sr/universities.html'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

college_site = []


def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    site = "\w+\.+\w+\)"

    for ol in soup.find_all('ol'):
        for num in range(len((ol.get_text()))):
            line = ol.get_text().split()
            if (re.search(site, line[num])):
                college_site.append(line[num])
# works if i put: print(line[num])


    with open('E:\Python\mails for college\\test2\sites.json', 'w') as sites:
        json.dump(college_site, sites)


if __name__ == '__main__':
    college()

Answer 1

To get a list of colleges and their links, you can use this example:

import requests
from bs4 import BeautifulSoup
import json


URL = 'http://doors.stanford.edu/~sr/universities.html'

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

college_sites = []

def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    for li in soup.select('ol li'):
        college_name = li.a.get_text(strip=True)
        college_link = li.a.find_next_sibling(text=True).strip()
        print(college_name, college_link)

        college_sites.append((college_name, college_link))

    with open('data.json', 'w') as sites:
        json.dump(college_sites, sites, indent=4)


if __name__ == '__main__':
    college()

Prints:

Abilene Christian University (acu.edu)
Adelphi University (adelphi.edu)
Agnes Scott College (scottlan.edu)
Air Force Institute of Technology (afit.af.mil)
Alabama A&M University (aamu.edu)
Alabama State University (alasu.edu)
Alaska Pacific University 
Albertson College of Idaho (acofi.edu)
Albion College (albion.edu)
Alderson-Broaddus College 
Alfred University (alfred.edu)
Allegheny College (alleg.edu)

...

and saves data.json:

[
    [
        "Abilene Christian University",
        "(acu.edu)"
    ],
    [
        "Adelphi University",
        "(adelphi.edu)"
    ],
    [
        "Agnes Scott College",
        "(scottlan.edu)"
    ],

...
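
The saved pairs keep the parentheses around each site, and colleges without a site get an empty string. As a rough sketch of one way to tidy that up afterwards (the function name clean_pairs and the dict layout are assumptions for illustration, not part of the answer), using only the standard library:

```python
import json

def clean_pairs(pairs):
    """Convert [name, "(site)"] pairs to dicts; an empty site becomes None."""
    cleaned = []
    for name, site in pairs:
        # Drop surrounding whitespace, then the enclosing parentheses.
        site = site.strip().strip("()")
        cleaned.append({"name": name, "site": site or None})
    return cleaned

# Hypothetical sample shaped like the data.json excerpt; the second entry has no site.
raw = '[["Abilene Christian University", "(acu.edu)"], ["Alaska Pacific University", ""]]'
records = clean_pairs(json.loads(raw))
print(json.dumps(records, indent=4))
```

Note that json.dump writes Python tuples as JSON arrays, so the pairs come back as lists when the file is loaded again.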

Answer 2

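The problem is this part: for num in range(len((ol.get_text()))). You want to iterate over the items in line, but your loop runs once for every character of the text, so num eventually goes past the end of line. The fix is simple.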

Change:

        for num in range(len((ol.get_text()))):
            line = ol.get_text().split()

To:

        line = ol.get_text().split()
        for num in range(len(line)):
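
To see the mismatch without hitting the network, here is a minimal sketch that uses a short made-up string in place of ol.get_text() (the sample text and the helper first_bad_index are assumptions for illustration):

```python
# Hypothetical stand-in for ol.get_text(); the real page is much longer.
text = "Abilene Christian University (acu.edu) Adelphi University (adelphi.edu)"

line = text.split()

char_count = len(text)   # what the original loop ranged over
token_count = len(line)  # what the loop should range over

def first_bad_index(tokens, upper):
    """Return the first index in range(upper) that is invalid for tokens, or None."""
    for num in range(upper):
        try:
            tokens[num]
        except IndexError:
            return num
    return None

print(char_count, token_count)
print(first_bad_index(line, char_count))   # index at which IndexError is raised
print(first_bad_index(line, token_count))  # None: every index is valid
```

Every index below token_count is valid, which may be why the print call appeared to work: it printed all the items before the loop ran past the end of the list.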

Full example:

import requests
from bs4 import BeautifulSoup
import json
import re


URL = 'http://doors.stanford.edu/~sr/universities.html'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

college_site = []


def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    site = r"\w+\.+\w+\)"  # raw string, so the backslashes reach the regex unchanged

    for ol in soup.find_all('ol'):
        line = ol.get_text().split()
        for num in range(len(line)):
            if (re.search(site, line[num])):
                college_site.append(line[num])


    with open(r'E:\Python\mails for college\test2\sites.json', 'w') as sites:
        json.dump(college_site, sites)


if __name__ == '__main__':
    college()
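
The index variable is not strictly needed. As a sketch of an alternative (not the answer's own code), the loop can walk the split items directly, which removes any chance of an out-of-range index. The function name extract_sites and the sample text are assumptions; the pattern is the one from the question:

```python
import re

# Same pattern as in the question, written as a raw string and compiled once.
SITE = re.compile(r"\w+\.+\w+\)")

def extract_sites(text):
    """Return the whitespace-separated items of text that match SITE."""
    return [token for token in text.split() if SITE.search(token)]

# Hypothetical stand-in for ol.get_text().
sample = (
    "Abilene Christian University (acu.edu) "
    "Alaska Pacific University "
    "Adelphi University (adelphi.edu)"
)
print(extract_sites(sample))
```

Inside college(), the same idea would amount to calling college_site.extend(extract_sites(ol.get_text())) for each ol element.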
