
Web scraping IMDb in Python

I'm going through an old Harvard CS 109 class and can't get the ratings of the 250 most-voted-on movies in the database. I think my problem is that there are two td.ratingColumn cells: one with the rating and another, right after it, that asks you to rate the movie. The second td.ratingColumn contains no <strong> element. Would that cause my error? How do I adjust the code to get all of the ratings? The code prints 9.2, the first of the 250, and then fails. Thanks.

dom = web.Element(r.text)

for movie in dom.by_tag('td.ratingColumn'):
    rating = movie.by_tag('strong')[0].content
    print rating

9.2
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-ca9164c76716> in <module>()
      2 
      3 for movie in dom.by_tag('td.ratingColumn'):
----> 4     rating = movie.by_tag('strong')[0].content
      5     print rating

IndexError: list index out of range

As you pointed out, the second td.ratingColumn cell contains no <strong> tag, so by_tag('strong') returns an empty list for it, and indexing that list with [0] raises the IndexError.

This should work:

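for movie in dom.by_tag('td.ratingColumn'):
    strong = movie.by_tag('strong')
    if strong:  # empty list for the "rate this" cell, so it is skipped
        print strong[0].content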

Let me know if I missed something.
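
If you want to try the "check before indexing" idea without pattern or a network request, here is a minimal sketch. It is not the pattern.web API: it swaps in the standard library's xml.etree.ElementTree, and SAMPLE_ROWS and extract_ratings are hypothetical names with made-up markup that only mimics the two kinds of td.ratingColumn cells described in the question.

```python
# Sketch only: standard library instead of pattern.web, and an invented
# HTML fragment standing in for the real IMDb page.
import xml.etree.ElementTree as ET

SAMPLE_ROWS = """
<table>
  <tr>
    <td class="ratingColumn"><strong>9.2</strong></td>
    <td class="ratingColumn"><div>Rate this</div></td>
  </tr>
  <tr>
    <td class="ratingColumn"><strong>9.1</strong></td>
    <td class="ratingColumn"><div>Rate this</div></td>
  </tr>
</table>
"""

def extract_ratings(markup):
    """Return the text of each <strong> that sits directly in a ratingColumn cell."""
    root = ET.fromstring(markup.strip())
    ratings = []
    for cell in root.iter("td"):
        if cell.get("class") != "ratingColumn":
            continue
        strong = cell.findall("strong")  # empty list when the cell has no <strong>
        if strong:                       # guard before indexing with [0]
            ratings.append(strong[0].text)
    return ratings

print(extract_ratings(SAMPLE_ROWS))
```

Cells without a <strong> child are skipped instead of raising IndexError, which is the same behaviour the guard gives you in the pattern loop.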
