Stop page loading with Selenium WebDriver
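My script checks a list of URLs to see whether each web page contains any of about 5 different types of keywords. For each type, it outputs "ok" if a keyword of that type is found and "no" otherwise.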
I use set_page_load_timeout(30) to stop a URL from loading forever.
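Problem: some web pages never finish loading before the timeout, even with a "very" long timeout. Yet when I run the browser visibly (not headless), I can see that the page has loaded. The script could at least check the page for the keywords, but it doesn't. After the timeout it prints "Fail", and that site, which should get "no" results, never appears in the final output.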
So I don't want to hit an except after 30 seconds. I want the page to stop loading after 30 seconds and the script to work with whatever has loaded.
My code:
# coding=utf-8
import re
sites=[]
keywords_1=[]
keywords_2=[]
keywords_3=[]
keywords_4=[]
keywords_5=[]
import sys
from selenium import webdriver
import csv
import urllib.parse
from datetime import datetime
from datetime import date
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
def reader3(filename):
    with open(filename, 'r') as csvfile:
        # creating a csv reader object
        csvreader = csv.reader(csvfile)
        # extracting field names through first row
        # extracting each data row one by one
        for row in csvreader:
            sites.append(str(row[0]).lower())
try:
    reader3("data/script/filter_domain_OUTPUT.csv")
except Exception as e:
    print(e)
    sys.exit()
exc=[]
def reader3(filename):
    with open(filename, 'r') as csvfile:
        # creating a csv reader object
        csvreader = csv.reader(csvfile)
        # extracting field names through first row
        # extracting each data row one by one
        for row in csvreader:
            exc.append(str(row[0]).lower())
try:
    reader3("data/script/checking_EXCLUDE.csv")
except Exception as e:
    print(e)
    sys.exit()
def reader2(filename):
    with open(filename, 'r') as csvfile:
        # creating a csv reader object
        csvreader = csv.reader(csvfile)
        # extracting field names through first row
        # extracting each data row one by one
        for row in csvreader:
            keywords_1.append(str(row[0]).lower())
            keywords_2.append(str(row[1]).lower())
            keywords_3.append(str(row[2]).lower())
            keywords_4.append(str(row[3]).lower())
            keywords_5.append(str(row[4]).lower())
try:
    reader2("data/script/checking_KEYWORD.csv")
except Exception as e:
    print(e)
    sys.exit()
chrome_options = Options()
chrome_options.page_load_strategy = 'none'
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--lang=en')
chrome_options.add_argument('--disable-notifications')
#chrome_options.headless = True
chrome_options.add_argument('start-maximized')
chrome_options.add_argument('enable-automation')
chrome_options.add_argument('--disable-infobars')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-browser-side-navigation')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)
for site in sites:
    try:
        status_1 = "no"
        status_2 = "no"
        status_3 = "no"
        status_4 = "no"
        status_5 = "no"
        now = datetime.now()
        current_time = now.strftime("%H:%M:%S")
        today = date.today()
        print("[" + current_time + "] " + str(site))
        if 'http' in site:
            driver.get(site)
        else:
            driver.get("http://" + site)
        r=str(driver.page_source).lower()
        driver.set_page_load_timeout(30)
        for keyword_1 in keywords_1:
            if keyword_1 in r:
                status_1="ok"
                print("home -> " +str(keyword_1))
                break
        for keyword_2 in keywords_2:
            if keyword_2 in r:
                status_2="ok"
                print("home -> " +str(keyword_2))
                break
        for keyword_3 in keywords_3:
            if keyword_3 in r:
                status_3="ok"
                print("home -> " +str(keyword_3))
                break
        for keyword_4 in keywords_4:
            if keyword_4 in r:
                status_4="ok"
                print("home -> " +str(keyword_4))
                break
        for keyword_5 in keywords_5:
            if keyword_5 in r:
                status_5="ok"
                print("Home ->" +str(keyword_5))
                break
        with open('data/script/checking_OUTPUT.csv', mode='a') as employee_file:
            employee_writer = csv.writer(employee_file, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL,lineterminator='\n')
            write=[site,status_1,status_2,status_3,status_4,status_5]
            employee_writer.writerow(write)
    except Exception as e:
        #driver.delete_all_cookies()
        print("Fail")
driver.quit()
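The three CSV readers and the five copy-pasted keyword loops above can be collapsed into a few helpers. This is a minimal sketch, assuming `checking_KEYWORD.csv` holds one keyword group per column as the question's `reader2` implies. The names `read_keyword_groups`, `keyword_statuses` and `normalize_url` are hypothetical. The sketch also skips blank cells. A blank cell is read as `""`, and `"" in r` is always `True`, so any group containing a blank cell would always report "ok".

```python
import csv


def read_keyword_groups(csv_lines, n_groups=5):
    """Return n_groups lists of lowercase keywords, one list per CSV column.

    Cells are whitespace-trimmed; blank cells and missing columns are skipped,
    because "" is a substring of every page and would always match.
    """
    groups = [[] for _ in range(n_groups)]
    for row in csv.reader(csv_lines):
        for i, cell in enumerate(row[:n_groups]):
            keyword = cell.strip().lower()
            if keyword:
                groups[i].append(keyword)
    return groups


def keyword_statuses(html, keyword_groups):
    """Return one "ok"/"no" per group, like status_1 .. status_5 in the script."""
    text = html.lower()
    return ["ok" if any(k in text for k in group) else "no"
            for group in keyword_groups]


def normalize_url(site):
    """Prefix http:// only when the site has no http(s) scheme.

    The script's `'http' in site` test also matches bare domains such as
    "httpbin.org", which would then be passed to driver.get() without a scheme.
    """
    if site.startswith(("http://", "https://")):
        return site
    return "http://" + site
```

With a real file, pass the open file object, e.g. `with open("data/script/checking_KEYWORD.csv", newline="") as f: groups = read_keyword_groups(f)`. Each output row then becomes `[site] + keyword_statuses(driver.page_source, groups)`.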
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER);
WebDriver driver = new ChromeDriver(chromeOptions);
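Use the EAGER page load strategy so that WebDriver only waits until the initial HTML has loaded. You can also use none, but then make sure you have explicit/implicit waits for your elements in case you run into timing issues.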
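With `none`, `driver.get()` returns early, so the script has to supply the explicit wait itself. The sketch below is one way to get the behaviour the question asks for. It waits up to a deadline for `document.readyState` to become `"complete"`. If that never happens, it calls the DOM's `window.stop()` (the same as pressing the browser's Stop button) and returns whatever HTML is present. It assumes the driver was created with `page_load_strategy = 'none'`. The function name `load_with_deadline` and its parameters are hypothetical. From Selenium it only uses `driver.get()`, `driver.execute_script()` and `driver.page_source`.

```python
import time


def load_with_deadline(driver, url, deadline_s=30.0, poll_s=0.5):
    """Navigate to url and give the page up to deadline_s seconds to finish.

    Assumes page_load_strategy = 'none', so driver.get() does not block.
    Returns (page_source, fully_loaded). When the deadline passes, pending
    loads are stopped with window.stop() and the partial HTML is returned.
    """
    driver.get(url)
    end = time.monotonic() + deadline_s
    fully_loaded = False
    while time.monotonic() < end:
        # "loading" -> "interactive" -> "complete" as the page progresses.
        if driver.execute_script("return document.readyState") == "complete":
            fully_loaded = True
            break
        time.sleep(poll_s)
    if not fully_loaded:
        # Stop the remaining network activity instead of raising an error.
        driver.execute_script("window.stop();")
    return driver.page_source, fully_loaded
```

For example, `html, complete = load_with_deadline(driver, url, deadline_s=30)`, then run the keyword checks on `html` whether or not `complete` is true. If you keep the default `normal` strategy instead, the equivalent is to call `driver.set_page_load_timeout(30)` before `driver.get()` and catch `selenium.common.exceptions.TimeoutException`. Then run `window.stop()` and read `page_source` in the handler instead of treating the timeout as a failure.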
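from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities.CHROME.copy()  # copy so the shared default dict is not mutated
# caps["pageLoadStrategy"] = "normal"  # Waits for full page load
caps["pageLoadStrategy"] = "none"
options = Options()
# Selenium 3 style: desired_capabilities is deprecated in Selenium 4 and removed in later 4.x releases
driver = webdriver.Chrome(desired_capabilities=caps, options=options)
url = 'https://www.gm-trucks.com/'
driver.get(url)
print(driver.title)
print("hi")  # printed as soon as get() returns, without waiting for the full load
input()  # keep the browser open until Enter is pressed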
Or:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.set_capability("pageLoadStrategy", "none")
driver = webdriver.Chrome(options=options)
The documentation has been updated for selenium 4.0.0-alpha-7, so either use the solution above or upgrade to Selenium v4 to be future-proof:
pip install selenium==4.0.0.a7
Bug report:
https://github.com/SeleniumHQ/seleniumhq.github.io/issues/627
First of all, set_page_load_timeout() and page_load_strategy = 'none' should ideally not be used together.
set_page_load_timeout() sets how long to wait for a page load to complete before throwing an error.
You can find a detailed discussion in How to set the timeout of 'driver.get' for python selenium 3.8.0?
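page_load_strategy = 'none' makes Selenium return as soon as the initial page content has been received (i.e. the HTML has been downloaded).
You can find a detailed discussion in How to set the timeout of 'driver.get' for python selenium 3.8.0?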