I have a problem scraping some basic info about movies from imdb.com. I want my program to get the title and description of a movie from a given URL. The title part works, but I can't figure out how to get the description. Here's my code:
import requests
from bs4 import BeautifulSoup as bs

def get_data(url):
    r = requests.get(url, headers={'Accept-Language': 'en-US,en;q=0.5'})
    if not r or 'https://www.imdb.com/title' not in url:
        return print('Invalid movie page!')
    return r.content

if __name__ == '__main__':
    # print('Input the URL:')
    # link = input()
    link = 'https://www.imdb.com/title/tt0111161'
    data = get_data(link)
    soup = bs(data, 'html.parser')

    title = ' '.join(soup.find('h1').text.split()[:-1])
    desc = soup.find('p', {'data-testid':"plot", 'class':"GenresAndPlot__Plot-cum89p-8 kmrpno"}).text
    movie_info = {'title': title, 'description': desc}
    print(movie_info)
When I run it I get an error:
Exception has occurred: AttributeError
'NoneType' object has no attribute 'text'
File "movie-scraper.py", line 18, in <module>
desc = soup.find('p', {'data-testid':"plot", 'class':"GenresAndPlot__Plot-cum89p-8 kmrpno"}).text
How do I access the description properly?
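The AttributeError means soup.find() returned None: no element in the HTML you received matched that selector. To get the plot summary, change the selector to find the div with class="plot_summary":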
import requests
from bs4 import BeautifulSoup as bs

def get_data(url):
    r = requests.get(url, headers={"Accept-Language": "en-US,en;q=0.5"})
    if not r or "https://www.imdb.com/title" not in url:
        return print("Invalid movie page!")
    return r.content

if __name__ == "__main__":
    link = "https://www.imdb.com/title/tt0111161"
    data = get_data(link)
    soup = bs(data, "html.parser")

    title = " ".join(soup.find("h1").text.split()[:-1])
    desc = soup.find("div", class_="plot_summary").get_text(strip=True)  # <-- change this to find class="plot_summary"
    movie_info = {"title": title, "description": desc}
    print(movie_info)
Prints:
{'title': 'The Shawshank Redemption', 'description': 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.Director:Frank DarabontWriters:Stephen King(short story "Rita Hayworth and Shawshank Redemption"),Frank Darabont(screenplay)Stars:Tim Robbins,Morgan Freeman,Bob Gunton|See full cast & crew»'}
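Since the page markup can differ from what a selector expects, it may help to guard against find() returning None and to try more than one lookup. The following is only a rough sketch: extract_description is a hypothetical helper, and the summary_text class is an assumption about an older page layout, not something confirmed above. It works on an HTML string, so it can be tried without a network request.

```python
from bs4 import BeautifulSoup


def extract_description(html):
    """Return the plot text from a movie page, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")

    # Lookups tried in order. All of them depend on the page markup,
    # which can change, so treat them as assumptions to adjust.
    candidates = [
        {"attrs": {"data-testid": "plot"}},          # attribute used in the question
        {"name": "div", "class_": "summary_text"},   # assumed: inner div holding only the plot
        {"name": "div", "class_": "plot_summary"},   # class used in the answer
    ]
    for kwargs in candidates:
        tag = soup.find(**kwargs)
        if tag is not None:
            # Join text fragments with a space and trim surrounding whitespace.
            return tag.get_text(" ", strip=True)
    return None  # caller decides what to do when no selector matches


if __name__ == "__main__":
    sample = '<div class="plot_summary"><div class="summary_text"> Two men bond. </div></div>'
    print(extract_description(sample))
```

Matching on data-testid alone avoids the generated class name from the question (GenresAndPlot__Plot-cum89p-8 kmrpno), and returning None instead of calling .text directly lets the caller handle a missing description without an AttributeError.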