
Strange BeautifulSoup NoneType error

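I wrote a scraper that fetches all the courses from my university (so I can filter them later). It works well, but sometimes a strange error suddenly appears, such as "AttributeError: 'NoneType' object has no attribute 'findAll'". If I run it against another long page, I get a similar error.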

My code:

from bs4 import BeautifulSoup
import urllib2
import datetime
import httplib
from math import floor
from random import randrange
import cPickle as pickle
[...irrelevant code...]
urls = ["http://locus.vub.ac.be/reporting/spreadsheet?identifier=DA&submit=toon%20de%20gegevens%20-%20show%20the%20teaching%20activities&idtype=name&template=Mod%2bSS&objectclass=module%2bgroup", "http://locus.vub.ac.be/reporting/spreadsheet?identifier=AL+tot+AP&submit=toon+de+gegevens+-+show+the+teaching+activities&idtype=name&template=Mod%2BSS&objectclass=module%2Bgroup"]
for url in urls:
    url = urllib2.urlopen(url).read()
    soup = BeautifulSoup(url)
    begins = soup.findAll("span", {"class" : "label-1-0-0"})
    for begin in begins:
        table = begin.findNext("table", {"class" : "spreadsheet"})
        #if table is not None:
        gegevens = table.findAll("tr")
        for i in range (1, len(gegevens)):
            naam = gegevens[i].td
            dag = naam.find_next_sibling("td")
            beginuur = dag.find_next_sibling("td")
            einduur = beginuur.find_next_sibling("td")
            duur = einduur.find_next_sibling("td")
            weken = duur.find_next_sibling("td")
            titularis = weken.find_next_sibling("td")
            lokaal = titularis.find_next_sibling("td")
            print naam.text + " " + dag.text + " " + beginuur.text + " " + einduur.text + " " + weken.text + " " + titularis.text + " " + lokaal.text

My output for link 1:

[...]
Discrete wiskunde (HOC) ma 18:00 21:00 4, 8, 11, 13 CARA PHILIPPE F.4.111
Discrete wiskunde (WPO2) ma 13:00 15:00 3-6, 8, 10-12, 14 Deneckere Tom E.0.12
Discrete wiskunde (HOC) wo 9:00 11:00 2-3, 6, 8-9, 11-14 CARA PHILIPPE E.0.07
Traceback (most recent call last):
  File "Untitled 7.py", line 24, in <module>
    titularis = weken.find_next_sibling("td")
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

My output for link 2:

[...]
Algemeen boekhouden - WPO - TEW - groep 5 (E-M) ma 9:00 11:00 5-6 VANDENHAUTE Marie-Laure D.3.04
Algemeen boekhouden - WPO - HI - groep 1 (A-D) di 14:00 16:00 3-14 VANDENHAUTE Marie-Laure D.2.09
Algemeen boekhouden - WPO - HI - groep 3 (Q-Z) ma 9:00 11:00 3-8, 10-14 CEUSTERMANS Stefanie D.2.10
Algemeen boekhouden - WPO - HI - groep 2 (E-P) di 9:00 11:00 3-8, 10-11, 13-14 VANDENHAUTE Marie-Laure D.3.05
Approaches to language teaching & learning for multilingual education HOC- wo 10:00 12:00 2-9, 11-14 VAN DE CRAEN PIERRE E.3.05
Traceback (most recent call last):
  File "Untitled 7.py", line 16, in <module>
    gegevens = table.findAll("tr")
AttributeError: 'NoneType' object has no attribute 'findAll'

Edit: Replacing soup = BeautifulSoup(url) with soup = BeautifulSoup(url, "xml") (and importing the lxml library) solved the problem. I don't know why.
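Both tracebacks come from calling a method on None: find_next (findNext) and find_next_sibling return None when nothing matches. Different parsers can also build different trees from invalid markup, which may be why switching parsers changed the result here, though that is only a guess. The following is a minimal sketch of guarding those lookups so that a heading without a table, or a row with too few cells, is skipped instead of crashing. The HTML sample and the helper name parse_rows are made up for illustration and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for one timetable block. The last data row
# is deliberately short, and the last heading has no table after it.
SAMPLE = """
<span class="label-1-0-0">Discrete wiskunde</span>
<table class="spreadsheet">
  <tr><td>Activity</td><td>Day</td><td>Start</td></tr>
  <tr><td>Discrete wiskunde (HOC)</td><td>ma</td><td>18:00</td></tr>
  <tr><td>Discrete wiskunde (WPO2)</td><td>ma</td></tr>
</table>
<span class="label-1-0-0">Heading without a table</span>
"""


def parse_rows(html, columns=8, parser="html.parser"):
    """Return (rows, skipped): complete rows as lists of cell text,
    plus the number of headings or rows that had to be skipped."""
    soup = BeautifulSoup(html, parser)  # name the parser explicitly
    rows, skipped = [], 0
    for begin in soup.find_all("span", {"class": "label-1-0-0"}):
        table = begin.find_next("table", {"class": "spreadsheet"})
        if table is None:  # no table follows this heading
            skipped += 1
            continue
        for tr in table.find_all("tr")[1:]:  # skip the header row
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if len(cells) < columns:  # short row: would have raised before
                skipped += 1
                continue
            rows.append(cells[:columns])
    return rows, skipped


# The sample only has three columns, so override the default of eight.
rows, skipped = parse_rows(SAMPLE, columns=3)
```

Collecting all cells of a row with find_all("td") and checking the count once replaces the chain of find_next_sibling calls, so there is a single place where a missing cell is detected.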

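The error seems to come from urllib2.urlopen. You should make sure the page you want can actually be fetched from the server, or handle exceptions properly.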
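As a rough sketch of what handling the fetch failure could look like, the function below returns None when the request fails instead of letting the exception propagate. The name fetch and its opener parameter are hypothetical (the parameter exists only so the function can be tried without a network connection), and the import fallback assumes the code may run on either Python 2 or Python 3.

```python
try:
    from urllib2 import urlopen, URLError  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.error import URLError


def fetch(url, opener=urlopen):
    """Return the response body, or None if the request fails.

    `opener` is a hypothetical hook for offline testing; by default the
    real urlopen is used.
    """
    try:
        response = opener(url)
        return response.read()
    except URLError as exc:  # HTTPError is a subclass of URLError
        print("could not fetch %s: %s" % (url, exc))
        return None
```

A caller would then check the result before parsing, for example by skipping the URL when fetch returns None. Note that this only covers failed requests; a page that downloads successfully but parses into an unexpected tree still needs the None checks on the BeautifulSoup side.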

