
I am trying to scrape a website using the beautifulsoup4 and requests libraries

I want to extract the movie name, year, and length from this website.

Here is the code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www4.f2movies.to'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

#Trending Movies
Movies = []
Year = []
Length = []

for a in soup.findAll('a', href=True, attrs={'class':"film-detail film-detail-fix"}):
    name=data.find('div', href=True, attrs={'class':'film-name'})
    year=data.find('span', href=True, attrs={'class':'fdi-item'})
    length=data.find('span', href=True, attrs={'class':'fdi-item fdi-duration'})
    Movies.append(name.text)
    Year.append(year.text)
    Length.append(length.text)

print(Movies)
print(Year)
print(Length)

The result I get looks like this:

(Projects) anildhage@xxx-MacBook-Air WebScrape % python scrape.py
[]
[]
[]
(Projects) anildhage@xxx-MacBook-Air WebScrape % 

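Can anyone suggest where I went wrong? TIA

Some of the selectors you pass to find() and findAll() are incorrect: the film container is a div rather than an a tag, the title sits in an h3 rather than a div, and href=True excludes every tag that has no href attribute. To get all the data, use the following example: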

import requests
from bs4 import BeautifulSoup

URL = "https://www4.f2movies.to"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Trending Movies
Movies = []
Year = []
Length = []

# Each film container is a <div>, not an <a>, so search for div elements.
for data in soup.find_all("div", attrs={"class": "film-detail film-detail-fix"}):
    name = data.find("h3", attrs={"class": "film-name"})
    year = data.find("span", attrs={"class": "fdi-item"})
    length = data.find("span", attrs={"class": "fdi-item fdi-duration"})
    # Skip containers that have no duration element.
    if not length:
        continue

    Movies.append(name.text.strip())
    Year.append(year.text)
    Length.append(length.text)


print(Movies)
print(Year)
print(Length)

Output:

["Tom Clancy's Without Remorse", 'The Mitchells vs. The Machines', 'Mortal Kombat', 'Things Heard & Seen', 'Demon Slayer the Movie: Mugen Train', 'Voyagers', 'Tom & Jerry', 'Godzilla vs. Kong', 'Justice Society: World War II', 'Nomadland', 'The Virtuoso', 'Shadow in the Cloud', 'Nobody', 'Skylines', "Zack Snyder's Justice League", 'Stowaway', '22 vs. Earth', 'The Marksman', 'The Little Things', 'Wonder Woman 1984', 'Raya and the Last Dragon', 'The Father', 'SAS: Red Notice', 'Come True', 'The Lockdown Hauntings', 'The Bike Thief', 'Generation Por Que', 'Adolescents of Chymera', 'The Darkness', 'The Rise of Sir Longbottom', 'Mexican Moon', "She was the Deputy's Wife", '100m Criminal Conviction', 'Percy', 'The Mitchells vs. The Machines', 'Zombie with a Shotgun', 'Things Heard & Seen', 'Golden Arm', 'Bang! Bang!', 'Colors of Love', 'Three Pints and a Rabbi', 'Eat Wheaties!', "Before I'm Dead", '22 vs. Earth', 'The Outside Story', 'Voyagers', 'Ape vs. Monster', 'Pipeline']
['2021', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2019', '2021', '2021', '2020', '2021', '2021', '2020', '0000', '2021', '2021', '2021', '2021', '2021']
['109m', '113m', '110m', '121m', '117m', '108m', '90m', '113m', 'N/A', '108m', '105m', '83m', '92m', '110m', '242m', '116m', '5m', '108m', '127m', '151m', '112m', '97m', '120m', '105m', '101m', '79m', 'N/A', '81m', 'N/A', '73m', '84m', '95m', '92m', '109m', '113m', '79m', '121m', '90m', '71m', '110m', '85m', 'N/A', '83m', '5m', '85m', '108m', '90m', '85m']
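
To see why the selectors in the question returned empty lists, here is a minimal offline sketch. The HTML string is hypothetical: it only borrows the class names used in the code above, and the real page's markup may differ (the inner link and the fd-infor wrapper are assumptions). It compares the question's selectors with the corrected ones.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used above;
# the structure of the real page is an assumption.
HTML = """
<div class="film-detail film-detail-fix">
  <h3 class="film-name"><a href="/movie/example-1">Example Film</a></h3>
  <div class="fd-infor">
    <span class="fdi-item">2021</span>
    <span class="fdi-item fdi-duration">109m</span>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Outer selector from the question: no <a> tag carries these classes.
as_anchor = soup.find_all(
    "a", href=True, attrs={"class": "film-detail film-detail-fix"}
)

# Outer selector from the answer: the container is a <div>.
as_div = soup.find_all("div", attrs={"class": "film-detail film-detail-fix"})

card = as_div[0]

# href=True only matches tags that have an href attribute; this <span> has none.
with_href = card.find("span", href=True, attrs={"class": "fdi-item"})
without_href = card.find("span", attrs={"class": "fdi-item"})

print(len(as_anchor), len(as_div))
print(with_href)
print(without_href.text)
```

Because the outer findAll() in the question matches nothing, the loop body never runs, which is also why the undefined name data inside it never raises a NameError.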

