Beautiful Soup does not return anything
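I'm new to Python and am working on a web-scraping assignment using Beautiful Soup. The user is asked to enter a number of course units, and I'm supposed to extract the relevant information for the matching courses: course title, time, students registered, and instructor.
I started by finding the course table that contains all the course information; each course sits in its own block under the course-table tag. Then I want to loop through each course and pull out the relevant information. But the code I wrote doesn't give me anything.
Could anyone take a look at my code? Which part am I doing wrong? Thanks in advance. The page is http://classes.usc.edu/term-20181/classes/itp/ . Below is my code, which asks for user input and tries to use the find and find_all functions to locate the course title, time, students registered, and instructor.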
from bs4 import BeautifulSoup
import urllib.request
url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")
# ask for user input for course units
choiceUnits=input("Enter")
#trying to find the tag that contain all the courses information
coursesTable=soup.find("div",class_="course-table")
#trying to find each course table under the course-table tag
courses=coursesTable.find_all("div",class_="course-info expanded")
for course in courses:
    # trying to find the course units
    unitsTag=courses.find("span",class_="units")
    units=unitsTag.text
    #compare the course units with the user input. If they are the same, find out the course title,time,students registered and instruction
    if units==choiceUnits:
        #find the title of the course
        titleTag=courses.find("a",class_="courselink")
        title=titleTag.text
        #find the time of the course
        timeTag=courses.find_all("td",class_="time")
        time=timeTag.text
        #find the number of students registered
        registerTag=courses.find_all("td",class_="registered")
        register=registerTag.text
        #find the instructor
        instructorTag=courses.find_all("td",class_="instructor")
        instructor=instructorTag.text
        #print out the result to verify
        print(title)
        print(time,register,instructor)
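There are a few things in your code that don't work properly:
First, the class you use to find each course block, "course-info expanded", doesn't exist until a course has been clicked, so you have to use "course-info expandable" instead.
Second, you ask the user for a number of units, but the units text you extract has the format (#.# units), so you need to account for that as well.
Finally, inside the for loop you need to access the properties of the course object, not courses.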
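The first and last points can be checked offline. Here is a minimal sketch that runs against a small hand-written HTML fragment instead of the live page. The fragment and the course names in it are made up for illustration and only borrow the class names mentioned above, so the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, loosely modeled on the class names discussed above;
# the real page's markup may differ.
HTML = """
<div class="course-table">
  <div class="course-info expandable">
    <a class="courselink">ITP 101</a>
    <span class="units">(4.0 units)</span>
  </div>
  <div class="course-info expandable">
    <a class="courselink">ITP 102</a>
    <span class="units">(2.0 units)</span>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", class_="course-table")

# A class_ string containing a space is compared against the whole attribute
# value, so the "expanded" variant matches nothing in the static HTML.
expanded = table.find_all("div", class_="course-info expanded")
expandable = table.find_all("div", class_="course-info expandable")
print(len(expanded), len(expandable))

# find_all returns a list-like ResultSet; find is only available on the
# individual elements inside it, not on the ResultSet itself.
try:
    expandable.find("span", class_="units")
    result_set_has_find = True
except AttributeError:
    result_set_has_find = False

units = [c.find("span", class_="units").text for c in expandable]
print(result_set_has_find, units)
```

The same reasoning applies to the find_all calls for the time, registered, and instructor cells in the question: each returns a ResultSet, so .text has to be read from the individual elements.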
The following gives the output you want:
from bs4 import BeautifulSoup
import urllib.request
url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")
# ask for user input
choiceUnits = float(input("Enter number of units:"))
choiceUnits = "(" + str(choiceUnits) + " units)"
#trying to find the tag that contain all the courses information
coursesTable = soup.find("div",class_="course-table")
#trying to find each course table under the course-table tag
courses = coursesTable.find_all("div",class_="course-info expandable")
for course in courses:
    # trying to find the course units
    units = course.find("span",class_="units").text
    #compare the course units with the user input.
    #If they are the same, find out the course title,time,students registered and instruction
    if units == choiceUnits:
        title = course.find("a",class_="courselink").text
        print("Course - {}:".format(title))
        for row in course.find("table", class_="sections responsive").find_all("tr")[1:]:
            time = row.find("td",class_="time").text
            register = row.find("td",class_="registered").text
            instructor = row.find("td",class_="instructor").text
            print(time,register,instructor)
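The script above compares strings, so it depends on the page text being exactly "(4.0 units)". A possible alternative is to compare numbers instead, as sketched below. The helper names parse_units and same_units are hypothetical, not part of Beautiful Soup, and the sketch assumes the text contains a single number followed by "units"; a range of units, if the page has any, would need its own handling.

```python
import re

# Hypothetical helper pattern: a number followed by "unit" or "units".
UNITS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*units?", re.IGNORECASE)

def parse_units(text):
    """Return the unit count found in text as a float, or None if absent."""
    match = UNITS_RE.search(text)
    return float(match.group(1)) if match else None

def same_units(page_text, user_text):
    """Compare the page's units text with raw user input numerically."""
    try:
        wanted = float(user_text)
    except ValueError:
        # Input that is not a number cannot match any course.
        return False
    return parse_units(page_text) == wanted

print(same_units("(4.0 units)", "4"))
print(same_units("(2.0 units)", "4.0"))
```

With this approach the user can type 4 or 4.0, and extra whitespace around the units text no longer affects the comparison.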