IndexError: list index out of range when appending to a list using a variable as the index, but print works fine. Why?
Python raises this error when I append to the list, although printing works. I am web scraping a list of college names and sites. I use a regex to pick out the sites and append them to the college_site list, but the error says "list index out of range", even though the loop starts at the beginning and should stop at the end. What should I change?
My code is:
import requests
from bs4 import BeautifulSoup
import json
import re
URL = 'http://doors.stanford.edu/~sr/universities.html'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
college_site = []
def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    site = "\w+\.+\w+\)"
    for ol in soup.find_all('ol'):
        for num in range(len((ol.get_text()))):
            line = ol.get_text().split()
            if (re.search(site, line[num])):
                college_site.append(line[num])
                # works if i put: print(line[num])
    with open('E:\Python\mails for college\\test2\sites.json', 'w') as sites:
        json.dump(college_site, sites)
if __name__ == '__main__':
    college()
To get the list of colleges and their links, you can use the following example:
import requests
from bs4 import BeautifulSoup
import json
URL = 'http://doors.stanford.edu/~sr/universities.html'
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
college_sites = []
def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    for li in soup.select('ol li'):
        college_name = li.a.get_text(strip=True)
        college_link = li.a.find_next_sibling(text=True).strip()
        print(college_name, college_link)
        college_sites.append((college_name, college_link))
    with open('data.json', 'w') as sites:
        json.dump(college_sites, sites, indent=4)
if __name__ == '__main__':
    college()
Prints:
Abilene Christian University (acu.edu)
Adelphi University (adelphi.edu)
Agnes Scott College (scottlan.edu)
Air Force Institute of Technology (afit.af.mil)
Alabama A&M University (aamu.edu)
Alabama State University (alasu.edu)
Alaska Pacific University
Albertson College of Idaho (acofi.edu)
Albion College (albion.edu)
Alderson-Broaddus College
Alfred University (alfred.edu)
Allegheny College (alleg.edu)
...
and saves data.json:
[
    [
        "Abilene Christian University",
        "(acu.edu)"
    ],
    [
        "Adelphi University",
        "(adelphi.edu)"
    ],
    [
        "Agnes Scott College",
        "(scottlan.edu)"
    ],
...
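If you would rather store bare domains than strings like "(acu.edu)", the saved pairs can be post-processed. This is only a minimal sketch: `to_records` is a hypothetical helper, the inline JSON stands in for the contents of data.json, and it assumes that a college with no link is stored with an empty string.

```python
import json

def to_records(pairs):
    """Turn [name, "(site)"] pairs into dicts with the parentheses removed."""
    records = []
    for name, link in pairs:
        site = link.strip().strip("()")   # "(acu.edu)" -> "acu.edu"
        # Assumption: a missing link was saved as "", so map it to None.
        records.append({"name": name, "site": site or None})
    return records

# Inline JSON standing in for the contents of data.json (hypothetical sample).
raw = '[["Abilene Christian University", "(acu.edu)"], ["Alaska Pacific University", ""]]'
records = to_records(json.loads(raw))
print(records)
```

Using None for a missing site makes those entries easy to filter out later.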
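The problem is in this part: for num in range(len((ol.get_text()))). You want to iterate over the items in line (the whitespace-separated tokens produced by split()), but your loop runs once per character of the text. The text has far more characters than tokens, so num eventually exceeds the last valid index of line and line[num] raises IndexError. The fix is simple.
Change:
for num in range(len((ol.get_text()))):
    line = ol.get_text().split()
to:
line = ol.get_text().split()
for num in range(len(line)):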
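The mismatch is easy to reproduce without any network access. In this sketch the string `text` is a made-up stand-in for ol.get_text() and `first_bad_index` is a hypothetical helper that reports where the original loop would fail.

```python
# Hypothetical stand-in for ol.get_text(); not taken from the real page.
text = "Abilene Christian University (acu.edu)\nAdelphi University (adelphi.edu)"

line = text.split()   # whitespace-separated tokens
print(len(text))      # number of characters
print(len(line))      # number of tokens (7 here)

def first_bad_index(text):
    """Return the first index at which the original loop raises IndexError, or None."""
    line = text.split()
    for num in range(len(text)):   # the bug: one iteration per character
        try:
            line[num]
        except IndexError:
            return num
    return None

print(first_bad_index(text))   # 7: the first index past the last token
```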
Full example:
import requests
from bs4 import BeautifulSoup
import json
import re
URL = 'http://doors.stanford.edu/~sr/universities.html'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
college_site = []
def college():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    # Raw string, so the backslashes reach the regex engine unchanged
    site = r"\w+\.+\w+\)"
    for ol in soup.find_all('ol'):
        # Split once, then loop over the token indices
        line = ol.get_text().split()
        for num in range(len(line)):
            if (re.search(site, line[num])):
                college_site.append(line[num])
    # Raw string: same path as before, without relying on invalid escape sequences
    with open(r'E:\Python\mails for college\test2\sites.json', 'w') as sites:
        json.dump(college_site, sites)
if __name__ == '__main__':
    college()
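Since the index is only used to read line[num], you can also iterate over the tokens directly, which removes any chance of an out-of-range index. This is a sketch rather than a drop-in replacement: `extract_sites` is a hypothetical helper and `sample` is made-up text standing in for ol.get_text().

```python
import re

# Same pattern as in the question, written as a raw string.
SITE = re.compile(r"\w+\.+\w+\)")

def extract_sites(text):
    """Return every whitespace-separated token that matches SITE."""
    return [token for token in text.split() if SITE.search(token)]

# Hypothetical sample standing in for ol.get_text().
sample = (
    "Abilene Christian University (acu.edu)\n"
    "Alaska Pacific University\n"
    "Adelphi University (adelphi.edu)"
)
print(extract_sites(sample))   # ['(acu.edu)', '(adelphi.edu)']
```

Inside college() this would amount to calling college_site.extend(extract_sites(ol.get_text())) for each ol.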