Web scraping with BS4

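I'm having trouble scraping some basic information about a movie from imdb.com. I want my program to get the title and description of a movie from a given URL. The title part works, but I can't figure out how to get the description. Here is my code: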

import requests

from bs4 import BeautifulSoup as bs

def get_data(url):
    r = requests.get(url, headers={'Accept-Language': 'en-US,en;q=0.5'})
    if not r or 'https://www.imdb.com/title' not in url:
        return print('Invalid movie page!')
    return r.content

if __name__ == '__main__':
    # print('Input the URL:')
    # link = input()
    link = 'https://www.imdb.com/title/tt0111161'
    data = get_data(link)
    soup = bs(data, 'html.parser')
    title = ' '.join(soup.find('h1').text.split()[:-1])
    desc = soup.find('p', {'data-testid':"plot", 'class':"GenresAndPlot__Plot-cum89p-8 kmrpno"}).text
    movie_info = {'title': title, 'description': desc}
    print(movie_info)

When I run it, I get an error:

Exception has occurred: AttributeError
'NoneType' object has no attribute 'text'
  File "movie-scraper.py", line 18, in <module>
    desc = soup.find('p', {'data-testid':"plot", 'class':"GenresAndPlot__Plot-cum89p-8 kmrpno"}).text

How do I access the description correctly?

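The error means soup.find() found no matching element and returned None, so your selector does not match anything in the HTML that requests received. To get the plot summary, change the selector to find class="plot_summary":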

import requests
from bs4 import BeautifulSoup as bs


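def get_data(url):
    # Reject anything that is not an IMDb title URL before making a request.
    if "https://www.imdb.com/title" not in url:
        raise ValueError("Invalid movie page!")
    r = requests.get(url, headers={"Accept-Language": "en-US,en;q=0.5"})
    # A Response is falsy for 4xx/5xx status codes.
    if not r:
        raise ValueError("Invalid movie page!")
    return r.content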


if __name__ == "__main__":
    link = "https://www.imdb.com/title/tt0111161"
    data = get_data(link)
    soup = bs(data, "html.parser")
    title = " ".join(soup.find("h1").text.split()[:-1])
    desc = soup.find("div", class_="plot_summary").get_text(strip=True)  # <-- change this to find class="plot_summary"
    movie_info = {"title": title, "description": desc}
    print(movie_info)

Prints:

{'title': 'The Shawshank Redemption', 'description': 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.Director:Frank DarabontWriters:Stephen King(short story "Rita Hayworth and Shawshank Redemption"),Frank Darabont(screenplay)Stars:Tim Robbins,Morgan Freeman,Bob Gunton|See full cast & crew»'}
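
As the output above shows, taking the text of the whole plot_summary block also pulls in the director, writer, and cast lines. A minimal sketch of a narrower, more defensive lookup follows. It runs against a small hand-written HTML sample rather than the live page, so the markup, the inner "summary_text" class, and the extract_text helper are all assumptions made for illustration; check the real page source before relying on any of these selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real IMDb page differs
# and changes over time.
SAMPLE_HTML = """
<html><body>
  <h1>The Shawshank Redemption (1994)</h1>
  <div class="plot_summary">
    <div class="summary_text">Two imprisoned men bond over a number of years.</div>
    <div class="credit_summary_item">Director: Frank Darabont</div>
  </div>
</body></html>
"""


def extract_text(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    if node is None:
        # No match: report it to the caller instead of failing on .text
        return None
    return node.get_text(" ", strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Selector that matches the sample: only the inner summary element.
description = extract_text(soup, "div.plot_summary div.summary_text")

# Selector that matches nothing in the sample: returns None, no AttributeError.
missing = extract_text(soup, "p[data-testid='plot']")

print(description)
print(missing)
```

Checking for None before reading the text turns a selector that stops matching into a value you can test for, rather than an AttributeError in the middle of the script.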
