Beautiful Soup does not return anything
I am new to Python and I am using Beautiful Soup for a web-scraping assignment. The user is asked to input the number of units of a course, and I should pull out the relevant information about each matching course, including the course title, time, registered students and instructor.
I started by finding the course table, which contains all the course information; each course is in its own table under the course-table tag. Then I want to loop through each course to extract that information. But the code I wrote does not output anything.
Could anyone take a look at my code? Which part did I do wrong? Thank you in advance. The page is http://classes.usc.edu/term-20181/classes/itp/. The following is my code, which asks for user input and then tries to use the find and find_all functions to find the class title, time, students registered and instructor.
from bs4 import BeautifulSoup
import urllib.request
url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")
# ask for user input for course units
choiceUnits=input("Enter")
#trying to find the tag that contains all the courses information
coursesTable=soup.find("div",class_="course-table")
#trying to find each course table under the course-table tag
courses=coursesTable.find_all("div",class_="course-info expanded")
for course in courses:
    # trying to find the course units
    unitsTag=courses.find("span",class_="units")
    units=unitsTag.text
    #compare the course units with the user input. If they are the same, find out the course title, time, students registered and instructor
    if units==choiceUnits:
        #find the title of the course
        titleTag=courses.find("a",class_="courselink")
        title=titleTag.text
        #find the time of the course
        timeTag=courses.find_all("td",class_="time")
        time=timeTag.text
        #find the number of students registered
        registerTag=courses.find_all("td",class_="registered")
        register=registerTag.text
        #find the instructor
        instructorTag=courses.find_all("td",class_="instructor")
        instructor=instructorTag.text
        #print out the result to verify
        print(title)
        print(time,register,instructor)
There are several things in your code that are not working:
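First, the class you use to find the course tables, "course-info expanded", does not exist until you click on a course, so you have to use "course-info expandable". Because no div in the HTML that urllib downloads has the class "course-info expanded", find_all returns an empty list and the loop body never runs, which is why nothing is printed.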
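To see the class mismatch without hitting the live page, here is a small sketch that runs both searches against a hypothetical HTML snippet (the markup is invented and only reuses the class names mentioned above). When class_ is given a string containing a space, Beautiful Soup compares it with the whole value of the class attribute, so "course-info expanded" finds nothing here. A CSS selector passed to select() checks each class separately, in any order, which may be a more robust alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only reuses the class names discussed in the answer.
html = """
<div class="course-table">
  <div class="course-info expandable">
    <span class="units">(4.0 units)</span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
courses_table = soup.find("div", class_="course-table")

# A class_ string containing a space is compared with the whole class attribute value.
expanded = courses_table.find_all("div", class_="course-info expanded")
expandable = courses_table.find_all("div", class_="course-info expandable")
# A CSS selector checks each class separately, regardless of their order.
selected = courses_table.select("div.course-info.expandable")

print(len(expanded), len(expandable), len(selected))  # 0 1 1
```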
Second, you ask the user for the number of units, but the units text you extract is in the format (#.# units), so you need to account for that before comparing the two.
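As an alternative to rebuilding the "(#.# units)" string from the input, one could pull the number out of the units text and compare numerically, which also accepts inputs such as "4" or "4.0". This is a minimal sketch using the standard re module; the pattern and the helper names parse_units and units_match are hypothetical, and it assumes the units text looks like the format described above.

```python
import re
from typing import Optional

# Hypothetical helper pattern: a number followed by "unit" or "units",
# e.g. the 4.0 in "(4.0 units)".
UNITS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*units?\b")


def parse_units(text: str) -> Optional[float]:
    """Return the number of units in text such as "(4.0 units)", or None."""
    match = UNITS_RE.search(text)
    return float(match.group(1)) if match else None


def units_match(units_text: str, user_input: str) -> bool:
    """Compare numerically, so "4", "4.0" and "(4.0 units)" all agree."""
    try:
        wanted = float(user_input)
    except ValueError:
        return False  # the input was not a number
    return parse_units(units_text) == wanted


print(units_match("(4.0 units)", "4"))  # True
print(units_match("(2.0 units)", "4"))  # False
```

Unlike float(input(...)) on its own, this returns False for non-numeric input instead of raising ValueError.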
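Lastly, in your for loop you need to access the properties of the course object, not courses. courses is the list returned by find_all, so it has no find method of its own. For the same reason, find_all on the time, registered and instructor cells returns a list, so you cannot read .text from it directly; read it from each element instead.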
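The difference between a single tag and the list returned by find_all can be seen on a toy table (the markup and times are hypothetical). find_all returns a ResultSet, which is a list subclass, so asking it for .text (or .find, as the question's courses.find(...) does) raises AttributeError, while reading .text from each element works.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two time cells.
html = """
<table>
  <tr><td class="time">10:00-11:50am</td></tr>
  <tr><td class="time">2:00-3:50pm</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", class_="time")  # a ResultSet, i.e. a list of Tag objects

try:
    cells.text  # the list itself has no .text
except AttributeError as err:
    print("AttributeError:", err)

times = [cell.text for cell in cells]  # read .text from each Tag instead
print(times)  # ['10:00-11:50am', '2:00-3:50pm']
```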
This gives the output you want:
from bs4 import BeautifulSoup
import urllib.request
url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")
# ask for user input
choiceUnits = float(input("Enter number of units:"))
# turn e.g. 4 into "(4.0 units)" so it matches the text on the page
choiceUnits = "(" + str(choiceUnits) + " units)"
#trying to find the tag that contains all the courses information
coursesTable = soup.find("div",class_="course-table")
#trying to find each course table under the course-table tag
courses = coursesTable.find_all("div",class_="course-info expandable")
for course in courses:
    # trying to find the course units
    units = course.find("span",class_="units").text
    #compare the course units with the user input.
    #If they are the same, find out the course title, time, students registered and instructor
    if units == choiceUnits:
        title = course.find("a",class_="courselink").text
        print("Course - {}:".format(title))
        # [1:] skips the first row (the table header) of the sections table
        for row in course.find("table", class_="sections responsive").find_all("tr")[1:]:
            time = row.find("td",class_="time").text
            register = row.find("td",class_="registered").text
            instructor = row.find("td",class_="instructor").text
            print(time,register,instructor)
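To try the corrected loop offline, here is a sketch that wraps the same logic in a function taking an HTML string and runs it on hypothetical markup. The markup only mimics the class names the code above relies on; the course titles, times, counts and instructors are invented, and find_sections is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only mimics the class names used by the answer's code.
# Course titles, times, counts and instructors are invented.
SAMPLE_HTML = """
<div class="course-table">
  <div class="course-info expandable">
    <a class="courselink">Example Course A</a>
    <span class="units">(4.0 units)</span>
    <table class="sections responsive">
      <tr><th>Time</th><th>Registered</th><th>Instructor</th></tr>
      <tr><td class="time">10:00-11:50am</td><td class="registered">25 of 30</td><td class="instructor">A. Example</td></tr>
      <tr><td class="time">2:00-3:50pm</td><td class="registered">12 of 30</td><td class="instructor">B. Example</td></tr>
    </table>
  </div>
  <div class="course-info expandable">
    <a class="courselink">Example Course B</a>
    <span class="units">(2.0 units)</span>
    <table class="sections responsive">
      <tr><th>Time</th><th>Registered</th><th>Instructor</th></tr>
      <tr><td class="time">1:00-2:50pm</td><td class="registered">10 of 20</td><td class="instructor">C. Example</td></tr>
    </table>
  </div>
</div>
"""


def find_sections(html, units):
    """Return (title, time, registered, instructor) tuples for every section
    of every course whose units text equals "(<units> units)"."""
    wanted = "(" + str(float(units)) + " units)"  # same formatting as the answer
    soup = BeautifulSoup(html, "html.parser")
    results = []
    courses_table = soup.find("div", class_="course-table")
    for course in courses_table.find_all("div", class_="course-info expandable"):
        if course.find("span", class_="units").text != wanted:
            continue
        title = course.find("a", class_="courselink").text
        # [1:] skips the header row of the sections table
        rows = course.find("table", class_="sections responsive").find_all("tr")[1:]
        for row in rows:
            results.append((
                title,
                row.find("td", class_="time").text,
                row.find("td", class_="registered").text,
                row.find("td", class_="instructor").text,
            ))
    return results


for section in find_sections(SAMPLE_HTML, 4):
    print(section)
```

Keeping the parsing in a function that takes the HTML as an argument lets you test it without the network; against the live site you would pass page.read() instead of SAMPLE_HTML.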