
Beautiful Soup does not return anything

I am new to Python and am using Beautiful Soup for a web-scraping assignment. The user is asked to enter a number of course units, and I should pull out the relevant information for each matching course: the course title, time, number of registered students, and instructor.

I started by finding the course table, which contains all the course information; each course is in its own table under the course-table tag. I then want to loop through each course to extract the information, but the code I wrote does not print anything.

Could anyone take a look at my code and tell me which part I got wrong? Thank you in advance. The page is http://classes.usc.edu/term-20181/classes/itp/ and the following is my code. It asks for user input and tries to use find and find_all to get the course title, time, registered students, and instructor.

from bs4 import BeautifulSoup
import urllib.request


url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)

soup=BeautifulSoup(page.read(),"html.parser")

# ask for user input for course units 
choiceUnits=input("Enter")
#trying to find the tag that contain all the courses information
coursesTable=soup.find("div",class_="course-table")  
#trying to find each course table under the course-table tag  
courses=coursesTable.find_all("div",class_="course-info expanded")

for course in courses:
    # trying to find the course units
    unitsTag=courses.find("span",class_="units")
    units=unitsTag.text
    #compare the course units with the user input. If they are the same, find out the course title,time,students registered and instruction 
    if units==choiceUnits:
        #find the title of the course
        titleTag=courses.find("a",class_="courselink")
        title=titleTag.text
        #find the time of the course
        timeTag=courses.find_all("td",class_="time")
        time=timeTag.text
        #find the number of students registered 
        registerTag=courses.find_all("td",class_="registered")
        register=registerTag.text
        #find the instructor 
        instructorTag=courses.find_all("td",class_="instructor")
        instructor=instructorTag.text
        #print out the result to verify 
        print(title)
        print(time,register,instructor)

Several things in your code are not working:

First, the class you use to find the course tables, course-info expanded, does not exist until you click on a course, so you have to use course-info expandable instead. With the wrong class, find_all returns an empty list, the for loop never runs, and nothing is printed.

Second, you ask the user for a number of units, but the units text you extract from the page has the format (#.# units), so you need to account for that too.

Lastly, inside your for loop you need to access the properties of the course object, not courses.
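To see the class-name and loop-variable points in isolation, here is a small sketch that runs against an inline HTML snippet instead of the live page. The markup and the course names in it are made up for illustration and only loosely follow the structure described above; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real page.
SAMPLE_HTML = """
<div class="course-table">
  <div class="course-info expandable">
    <a class="courselink">ITP 101</a>
    <span class="units">(4.0 units)</span>
  </div>
  <div class="course-info expandable">
    <a class="courselink">ITP 102</a>
    <span class="units">(2.0 units)</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string is compared against the whole class attribute,
# so a class that is not in the served HTML matches nothing.
expanded = soup.find_all("div", class_="course-info expanded")
expandable = soup.find_all("div", class_="course-info expandable")

# A single class name matches any element that carries that class.
any_course = soup.find_all("div", class_="course-info")

# Call find() on each element (course), not on the list of results.
titles = [course.find("a", class_="courselink").text for course in any_course]

print(len(expanded), len(expandable), len(any_course))
print(titles)
```

Searching on the single class course-info is one way to avoid depending on whether a course is currently expanded or not.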

The following corrected script gives the output you want:

from bs4 import BeautifulSoup
import urllib.request


url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)
soup=BeautifulSoup(page.read(),"html.parser")

# ask for user input
choiceUnits = float(input("Enter number of units:"))
choiceUnits = "(" + str(choiceUnits) + " units)"

#trying to find the tag that contain all the courses information
coursesTable = soup.find("div",class_="course-table")

#trying to find each course table under the course-table tag
courses = coursesTable.find_all("div",class_="course-info expandable") 
for course in courses:
    # trying to find the course units
    units = course.find("span",class_="units").text

    #compare the course units with the user input.
    #If they are the same, find out the course title,time,students registered and instruction
    if units == choiceUnits:
        title = course.find("a",class_="courselink").text 
        print("Course - {}:".format(title))

        for row in course.find("table", class_="sections responsive").find_all("tr")[1:]:
            time = row.find("td",class_="time").text
            register = row.find("td",class_="registered").text
            instructor = row.find("td",class_="instructor").text

            print(time,register,instructor)
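The script above matches units by building the exact string "(4.0 units)" from the user's input. As a rough alternative sketch, the conversion can be pulled into small helpers and the comparison done on numbers instead of strings. The helper names format_units, parse_units, and matches_units are hypothetical, and the code assumes every label has the plain (#.# units) form mentioned above; a label in any other form would need extra handling.

```python
def format_units(raw):
    """Turn user input such as "4" into the "(4.0 units)" form."""
    return "({:.1f} units)".format(float(raw))


def parse_units(label):
    """Turn a label such as "(4.0 units)" into the float 4.0.

    Assumes the "(#.# units)" format; anything else raises ValueError.
    """
    return float(label.strip().strip("()").split()[0])


def matches_units(label, raw):
    """Compare a page label with user input numerically."""
    return parse_units(label) == float(raw)


print(format_units("4"))
print(parse_units("(4.0 units)"))
print(matches_units("(2.0 units)", "2"))
```

Comparing numbers means that inputs such as 4, 4.0, and 4.00 all match the same course.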


 