
Beautiful Soup does not return anything

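I'm new to Python, and I'm working on a web-scraping assignment using Beautiful Soup. The user is asked to enter a number of course units, and I'm supposed to extract the relevant information about the matching courses: course title, time, students registered, and instructor.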

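I started by finding the course table that contains all the course information; each course is in its own table under the course-table tag. Then I want to loop over each course and pull out the relevant information. But the code I wrote doesn't give me anything.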

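Can anyone take a look at my code? Which part am I doing wrong? Thanks in advance. The HTML link is http://classes.usc.edu/term-20181/classes/itp/. Below is my code, which asks for user input; I'm trying to use the find and find_all functions to find the course title, time, students registered, and instructor.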

from bs4 import BeautifulSoup
import urllib.request


url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)

soup=BeautifulSoup(page.read(),"html.parser")

# ask for user input for course units 
choiceUnits=input("Enter")
#trying to find the tag that contain all the courses information
coursesTable=soup.find("div",class_="course-table")  
#trying to find each course table under the course-table tag  
courses=coursesTable.find_all("div",class_="course-info expanded")

for course in courses:
    # trying to find the course units
    unitsTag=courses.find("span",class_="units")
    units=unitsTag.text
    #compare the course units with the user input. If they are the same, find out the course title,time,students registered and instruction 
    if units==choiceUnits:
        #find the title of the course
        titleTag=courses.find("a",class_="courselink")
        title=titleTag.text
        #find the time of the course
        timeTag=courses.find_all("td",class_="time")
        time=timeTag.text
        #find the number of students registered 
        registerTag=courses.find_all("td",class_="registered")
        register=registerTag.text
        #find the instructor 
        instructorTag=courses.find_all("td",class_="instructor")
        instructor=instructorTag.text
        #print out the result to verify 
        print(title)
        print(time,register,instructor)

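There are a few things in your code that won't work:

First, the class you use to find each course, course-info expanded, doesn't exist until a course is clicked, so you have to use course-info expandable instead.

Second, you ask the user for a number of units, but the units text you extract has the format (#.# units), so you need to account for that as well.

Finally, inside the for loop you need to access attributes of the course object, not courses.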
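The first and last points can be reproduced offline. The following is a minimal sketch that runs against a made-up HTML fragment; the markup and the course names "Course A" and "Course B" are assumptions for illustration and are much simpler than the real page. It suggests why the original script printed nothing: find_all with a class string that matches no element returns an empty list, so the loop body never runs and the courses.find(...) mistake is never even reached.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page source).
html = """
<div class="course-table">
  <div class="course-info expandable">
    <a class="courselink">Course A</a><span class="units">(2.0 units)</span>
  </div>
  <div class="course-info expandable">
    <a class="courselink">Course B</a><span class="units">(4.0 units)</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", class_="course-table")

# A multi-word class_ string is compared with the whole class attribute value,
# so "course-info expanded" finds nothing in this markup.
expanded = table.find_all("div", class_="course-info expanded")
expandable = table.find_all("div", class_="course-info expandable")

# Matching on one class only is less sensitive to the expanded/expandable state.
by_single_class = table.find_all("div", class_="course-info")

# find() belongs to each element, not to the list returned by find_all().
try:
    expandable.find("span", class_="units")
    list_has_find = True
except AttributeError:
    list_has_find = False

titles = [course.find("a", class_="courselink").text for course in expandable]
print(len(expanded), len(expandable), list_has_find, titles)
```

Searching on the single class course-info is one possible way to avoid depending on whether a course is expanded; whether that is appropriate for the real page is not confirmed by the answer.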

The following corrected script gives the output you want:

from bs4 import BeautifulSoup
import urllib.request


url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")

# ask for user input
choiceUnits = float(input("Enter number of units:"))
choiceUnits = "(" + str(choiceUnits) + " units)"

#trying to find the tag that contain all the courses information
coursesTable = soup.find("div",class_="course-table")

#trying to find each course table under the course-table tag
courses = coursesTable.find_all("div",class_="course-info expandable") 
for course in courses:
    # trying to find the course units
    units = course.find("span",class_="units").text

    #compare the course units with the user input.
    #If they are the same, find out the course title,time,students registered and instruction
    if units == choiceUnits:
        title = course.find("a",class_="courselink").text 
        print("Course - {}:".format(title))

        for row in course.find("table", class_="sections responsive").find_all("tr")[1:]:
            time = row.find("td",class_="time").text
            register = row.find("td",class_="registered").text
            instructor = row.find("td",class_="instructor").text

            print(time,register,instructor)
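The script above compares strings, so it depends on the label being formatted exactly as (#.# units). As an alternative sketch, not part of the original answer, the number can be pulled out of the label and compared numerically. The helper names parse_units and units_match and the regular expression are assumptions made for illustration.

```python
import re

# Matches a number followed by "unit" or "units", e.g. "(4.0 units)".
UNITS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*units?", re.IGNORECASE)


def parse_units(label):
    """Return the unit count found in a label as a float, or None if absent."""
    match = UNITS_PATTERN.search(label)
    return float(match.group(1)) if match else None


def units_match(label, user_input):
    """Return True when the label's unit count equals the number the user typed."""
    try:
        wanted = float(user_input)
    except ValueError:
        # The user typed something that is not a number.
        return False
    found = parse_units(label)
    return found is not None and found == wanted


print(units_match("(4.0 units)", "4"))
print(units_match("(2.0 units)", "4"))
```

In the loop, the check if units == choiceUnits could then be written as if units_match(units, raw_input_text), where raw_input_text is a hypothetical name for the unconverted string returned by input().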


Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original source.

 