Selenium won't work unless I actually look at the web page (perhaps an anti-crawler mechanism implemented in JavaScript?)

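The following code works correctly only while I am actually looking at the web page (that is, the Chrome tab being driven by Selenium).

Is there a way to make it work while I am browsing another tab or window?

(I wonder how the website knows that I am actually looking at the page...)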

import time

import selenium.webdriver

# This is a job website in Japanese
login_url = "https://mypage.levtech.jp/"

driver = selenium.webdriver.Chrome("./chromedriver")

#Account and password are required to log in.
#I logged in and got to the following page, which displays a list of companies that I have applied for:
#https://mypage.levtech.jp/recruits/screening

#Dictionary to store company names and their job postings
jobs = {} 


for i, company in enumerate(company_names):    
    time.sleep(1)
   
    element = driver.find_elements_by_class_name("ScreeningRecruits_ListItem")[i]
    # The while loops and time.sleep() calls are there because the page seems to take a while to load
    while element.text == "":
        time.sleep(0.1)
        element = driver.find_elements_by_class_name("ScreeningRecruits_ListItem")[i]
    
    td = element.find_element_by_tag_name("td")
    while td.text == "":
        time.sleep(0.1)
        td = element.find_element_by_tag_name("td")
   
    if td.text == company:
        td.click()
        
        time.sleep(1)
        
        # get_job_desc checks HTML tags and extracts info from certain elements
        jobs[company] = get_job_desc(driver)
        
        time.sleep(1)
        
        driver.back()
    
        time.sleep(1)
    
print(jobs)

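By the way, I have already tried adding a user agent and scrolling down the page with the following code, hoping the web page would believe I was "looking at it". Well, that failed :(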

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

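I think the answer to your question lies in window_handles. Whenever a new tab opens, the focus changes on us, and because the focus is now on another page we need to call driver.switch_to.window(handle_here). That way we can switch to the correct tab. To demonstrate, I found a website with similar functionality (also in Japanese / kanji?) that may help you.

Main program - for reference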
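To make the "switch to the correct tab" step concrete, here is a minimal sketch of how the new tab's handle could be picked out by comparing the handle list before and after a click. The helper name find_new_handle is my own (it is not part of Selenium); it only works on plain lists of handle strings such as those returned by driver.window_handles.

```python
def find_new_handle(handles_before, handles_after):
    """Return the one handle that appears in handles_after but not in handles_before.

    Hypothetical helper for illustration; both arguments are lists of
    window-handle strings, e.g. snapshots of driver.window_handles.
    """
    known = set(handles_before)
    new_handles = [h for h in handles_after if h not in known]
    if len(new_handles) != 1:
        raise RuntimeError(
            f"Expected exactly one new window, found {len(new_handles)}"
        )
    return new_handles[0]
```

With a real driver this would be used roughly as: take before = driver.window_handles, click the link, then call driver.switch_to.window(find_new_handle(before, driver.window_handles)). Comparing the two snapshots avoids assuming the new tab is always at index 1.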

from selenium import webdriver
from selenium.webdriver.chrome.webdriver import WebDriver as ChromeDriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as DriverWait
from selenium.webdriver.support import expected_conditions as DriverConditions
from selenium.common.exceptions import WebDriverException
import time


def get_chrome_driver():
    """This sets up our Chrome Driver and returns it as an object"""
    # Raw string so the backslashes in the Windows path are not treated as escape sequences
    path_to_chrome = r"F:\Selenium_Drivers\Windows_Chrome85_Driver\chromedriver.exe"
    chrome_options = webdriver.ChromeOptions() 
    
    # Browser is displayed in a custom window size
    chrome_options.add_argument("window-size=1500,1000")
    
    return webdriver.Chrome(executable_path = path_to_chrome,
                            options = chrome_options)

    
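def wait_displayed(driver: ChromeDriver, xpath: str, timeout: int = 5):
    """Wait up to `timeout` seconds for the element at `xpath` to be present in the DOM."""
    try:
        DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located((By.XPATH, xpath))
        )
    except WebDriverException as exc:
        # TimeoutException is a subclass of WebDriverException, so timeouts land here
        raise WebDriverException(f'Timeout: Failed to find {xpath}') from exc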


# Gets our chrome driver and opens our site
chrome_driver = get_chrome_driver()
chrome_driver.get("https://freelance.levtech.jp/project/search/?keyword=&srchbtn=top_search")
wait_displayed(chrome_driver, "//div[@class='l-contentWrap']//ul[@class='asideCta']")
wait_displayed(chrome_driver, "//div[@class='l-main']//ul[@class='prjList']")
wait_displayed(chrome_driver, "//div[@class='l-main']//ul[@class='prjList']//li[contains(@class, 'prjList__item')][1]")

# Click on the first item title link
titleLinkXpath = "(//div[@class='l-main']//ul[@class='prjList']//li[contains(@class, 'prjList__item')][1]//a[contains(@href, '/project/detail/')])[1]"
chrome_driver.find_element(By.XPATH, titleLinkXpath).click()
time.sleep(2)

# Get the currently displayed window handles
tabs_open = chrome_driver.window_handles
if len(tabs_open) != 2:
    raise Exception("Failed to click on our Link's Header")
else:
    print(f'You have: {len(tabs_open)} tabs open')

# Switch to the 2nd tab and then close it
chrome_driver.switch_to.window(tabs_open[1])
chrome_driver.close()

# Check how many tabs we have open
tabs_open = chrome_driver.window_handles
if len(tabs_open) != 1:
    raise Exception("Failed to close our 2nd tab")
else:
    print(f'You have: {len(tabs_open)} tabs open')

# Switch back to our main tab, then shut down the browser.
# quit() also stops the chromedriver service, so a separate service.stop() is not needed.
chrome_driver.switch_to.window(tabs_open[0])
chrome_driver.quit()
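
The program above uses a fixed time.sleep(2) after the click and then checks the tab count. As a rough alternative, the sketch below polls the handle list until the expected number of tabs appears or a timeout expires. The function name wait_for_tab_count and its parameters are my own invention for illustration; it relies only on the driver exposing a window_handles attribute. Selenium's expected_conditions module also provides number_of_windows_to_be, which can be used with WebDriverWait for the same purpose.

```python
import time


def wait_for_tab_count(driver, expected, timeout=5.0, poll=0.1):
    """Poll driver.window_handles until it holds `expected` handles.

    Hypothetical helper: `driver` is any object with a `window_handles`
    attribute (such as a Selenium WebDriver). Returns the list of handles,
    or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        handles = list(driver.window_handles)
        if len(handles) == expected:
            return handles
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Expected {expected} tabs, still have {len(handles)} "
                f"after {timeout} seconds"
            )
        time.sleep(poll)
```

In the program above, tabs_open = wait_for_tab_count(chrome_driver, 2) could stand in for the sleep plus the manual length check, returning as soon as the second tab exists.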

For scrolling, you can use this method:

def scroll_to_element(driver: ChromeDriver, xpath: str, timeout: int = 5):
    """Wait for the element at `xpath` to be present, then scroll it into view."""
    try:
        web_element = DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located((By.XPATH, xpath))
        )
        driver.execute_script("arguments[0].scrollIntoView();", web_element)
    except WebDriverException as exc:
        raise WebDriverException(
            f'Timeout: Failed to find element using xpath {xpath}\nResult: Could not scroll'
        ) from exc
