Unable to scrape images from IMDB using selenium due to load-late

I am trying to scrape images from IMDB, but I cannot get their URLs. IMDB loads its image URLs late (load-late), and I don't know how to proceed. Can you help me?

from bs4 import BeautifulSoup
import requests
import urllib
from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

mimg = []

imdb_link = "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&count=250"
opts = Options()
opts.add_argument("--headless")
opts.binary_location = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
chrome_driver = r'C:\Project\chromedriver.exe'
driver = webdriver.Chrome(options=opts, executable_path=chrome_driver)
element = WebDriverWait(driver, 3)
driver.get(imdb_link)
time.sleep(2)

rmsoup = driver.page_source
time.sleep(2)
time.sleep(2)
time.sleep(2)
relsoup = BeautifulSoup(rmsoup, features='lxml')
driver.close()

for img in relsoup.findAll('img'):
    mimg.append(img.get('src'))
print(mimg)

Answer:

import time

from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys  # Additional

target_url = "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&count=250"

c_options = Options()
c_options.add_argument("--start-maximized")
# Selenium 4 style; on Selenium 3 pass executable_path='chromedriver.exe' instead.
browser = Chrome(service=Service('chromedriver.exe'), options=c_options)
browser.get(target_url)

# Scroll down the page one screen at a time so the late-loading images get fetched.
# This is a very crude method, but for now I didn't want to go through JavaScript.
body = browser.find_element(By.CSS_SELECTOR, 'body')
for _ in range(101):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)

# Soup logic: parse the scrolled page and collect one image URL per list entry.
html = browser.page_source
browser.quit()
soup = BeautifulSoup(html, 'html.parser')
img_links = []
for div in soup.find_all('div', attrs={'class': 'lister-item mode-advanced'}):
    img = div.find('img')
    # Skip entries that have no <img> tag or no src attribute.
    if img is not None and img.get('src'):
        img_links.append(img['src'])
print(img_links)

This worked for me. Does it help you?
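
The answer mentions that scrolling could also be done through JavaScript. A minimal sketch of that idea is below, assuming the images load as they enter the viewport. The helper name `scroll_page_by_viewport` and its `pause` and `max_steps` parameters are hypothetical; only `driver.execute_script` is the real Selenium WebDriver call. It has not been verified against the current IMDB page.

```python
import time


def scroll_page_by_viewport(driver, pause=0.5, max_steps=200):
    """Scroll down one viewport at a time until the bottom is reached.

    `driver` is any object with Selenium's execute_script(script) method.
    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps:
        # Move down by exactly one window height.
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        steps += 1
        # Assumption: a short pause is enough for late-loading images to be requested.
        time.sleep(pause)
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset"
            " >= document.body.scrollHeight - 1;"
        )
        if at_bottom:
            break
    return steps
```

With the answer's code, this would be called as `scroll_page_by_viewport(browser)` in place of the `PAGE_DOWN` loop, before reading `browser.page_source`. Stopping at the bottom of the page avoids the fixed 101 iterations, and `max_steps` guards against pages that keep growing.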

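Another possible approach, sketched here under a stated assumption, is to read the late-loaded URL straight from the markup instead of waiting for `src` to be filled in. The sketch assumes the real URL sits in an `<img>` attribute named `loadlate` (the name suggested by the question title; the actual markup may differ or may have changed). `ImageURLCollector`, `extract_image_urls` and the sample HTML are hypothetical, and the code uses only the standard library so it runs without extra packages.

```python
from html.parser import HTMLParser


class ImageURLCollector(HTMLParser):
    """Collect one URL per <img>, preferring the lazy-load attribute over src."""

    def __init__(self, lazy_attr="loadlate"):
        super().__init__()
        self.lazy_attr = lazy_attr  # assumed attribute name, not confirmed
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Fall back to src when the lazy-load attribute is missing or empty.
        url = attr_map.get(self.lazy_attr) or attr_map.get("src")
        if url:
            self.urls.append(url)


def extract_image_urls(html, lazy_attr="loadlate"):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageURLCollector(lazy_attr)
    parser.feed(html)
    parser.close()
    return parser.urls


# Made-up sample markup for illustration only.
sample_html = """
<div class="lister-item mode-advanced">
  <img src="https://example.com/placeholder.png" loadlate="https://example.com/poster-1.jpg">
</div>
<div class="lister-item mode-advanced">
  <img src="https://example.com/poster-2.jpg">
</div>
"""
print(extract_image_urls(sample_html))
```

With BeautifulSoup, as used in the question, the same lookup would be `img.get('loadlate') or img.get('src')` inside the loop over the `img` tags.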
