Getting error while navigating to next pages and scraping all the data using selenium from the website?

Hi, I am trying to scrape all teacher jobs from https://www.naukri.com/. I want the data from every page, but I only get one page of data and then receive this error:

Traceback (most recent call last):
  File "naukri.py", line 48, in <module>
    driver.execute_script("arguments.click();", next_page)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: arguments.click is not a function
  (Session info: chrome=80.0.3987.116)

The code I wrote is:

import selenium.webdriver

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url ='https://www.naukri.com/'
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")


driver.get(url)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#qsbClick > span.blueBtn'))).click()
driver.find_element_by_xpath('//*[@id="skill"]/div[1]/div[2]/input').send_keys("teacher")
driver.find_element_by_xpath('//*[@id="qsbFormBtn"]').click()

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
for r in result:
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    result = WebDriverWait(data, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for r in result:
        try:
            title=r.find_element_by_class_name("desig").text
            print('title:',title)
            school=r.find_element_by_class_name("org").text
            print('school:',school)
            location=r.find_element_by_class_name("loc").text
            print("location:",location)
            salary=r.find_element_by_class_name("salary").text
            print("salary:",salary)
        except:
            pass
            print('-------')
    next_page = r.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[1]/div[59]/a/button")
    driver.execute_script("arguments.click();", next_page)

Can anyone please help me? Thanks in advance!

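Since the element index of the "Next" button changes from 59 on the first page to 60 on the following pages, you can instead find all elements on the page with the class "grayBtn" and click the one at index [-1] of the returned list, as this will always be the Next button. I also removed some unnecessary parts of the code, such as the duplicate imports and the unneeded button clicks. Rather than typing "teacher" into the search field on the home page, I go directly to the page that lists the teacher results. That leaves the following: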

from selenium import webdriver
import time
import re
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Category = input("Category?")
# Build the URL slug from the raw input first, then percent-encode the spaces
Type = re.sub(" ", "-", Category.lower())
Category = re.sub(" ", "%20", Category)

url ='https://www.naukri.com/' + Type + '-jobs?k=' + Category
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
driver.get(url)

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
# This loop runs once per row found on the first page, so that count caps the number of pages visited
for res in result:
    # Re-locate the rows on each pass, since elements from the previous page are no longer valid
    data = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    jobs = WebDriverWait(data, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for job in jobs:
        try:
            title=job.find_element_by_class_name("desig").text
            print('title:',title)
            school=job.find_element_by_class_name("org").text
            print('school:',school)
            location=job.find_element_by_class_name("loc").text
            print("location:",location)
            salary=job.find_element_by_class_name("salary").text
            print("salary:",salary)
        except (NoSuchElementException, StaleElementReferenceException):
            # Some rows lack a field (for example the salary); print what was found and move on
            pass
        print('-------')
    # The last "grayBtn" element on the page is the Next button
    Button = driver.find_elements_by_class_name("grayBtn")[-1]
    time.sleep(1)
    # Scroll down so the button is in view before clicking it
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
    Button.click()
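
For reference, the JavascriptException in the question has a cause of its own, separate from the changing button index. Inside execute_script, `arguments` is the array-like object holding the values passed after the script string, so the element has to be addressed as `arguments[0]`; `arguments.click` does not exist. In addition, find_elements_by_xpath returns a list, so one element has to be picked from it before it is passed in. A minimal sketch of a click helper along those lines is shown here; `js_click` is a hypothetical name and not part of Selenium, and it assumes `driver` is a Selenium WebDriver.

```python
def js_click(driver, elements):
    """Click the last element of a list through JavaScript.

    `driver` is assumed to be a Selenium WebDriver and `elements` the list
    returned by a find_elements_* call. `js_click` is a hypothetical helper
    name used for illustration only.
    """
    if not elements:
        # Nothing matched the locator, for example when there is no next page
        return False
    # arguments[0] is the first value passed after the script string
    driver.execute_script("arguments[0].click();", elements[-1])
    return True
```

With the locator used above, this could be called as js_click(driver, driver.find_elements_by_class_name("grayBtn")). A False return value can then serve as the signal to stop paging.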

As requested, here is the modified code that appends the data to a pandas DataFrame and exports the DataFrame to Excel:

from selenium import webdriver
import time
import re
import pandas as pd
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

df = pd.DataFrame(columns = ['Title', 'School', 'Location', 'Salary'])

Category = input("Category?")
# Build the URL slug from the raw input first, then percent-encode the spaces
Type = re.sub(" ", "-", Category.lower())
Category = re.sub(" ", "%20", Category)

url ='https://www.naukri.com/' + Type + '-jobs?k=' + Category
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
driver.get(url)

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
i = 0  # index of the next DataFrame row
# This loop runs once per row found on the first page, so that count caps the number of pages visited
for res in result:
    # Re-locate the rows on each pass, since elements from the previous page are no longer valid
    data = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    jobs = WebDriverWait(data, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for job in jobs:
        try:
            title=job.find_element_by_class_name("desig").text
            print('title:',title)
            school=job.find_element_by_class_name("org").text
            print('school:',school)
            location=job.find_element_by_class_name("loc").text
            print("location:",location)
            salary=job.find_element_by_class_name("salary").text
            print("salary:",salary)
            # Only rows with all four fields reach this point and are stored
            df.loc[i] = [title, school, location, salary]
            i += 1
        except (NoSuchElementException, StaleElementReferenceException):
            # Some rows lack a field (for example the salary); they are not stored
            pass
        print('-------')
    # The last "grayBtn" element on the page is the Next button
    Button = driver.find_elements_by_class_name("grayBtn")[-1]
    time.sleep(1)
    # Scroll down so the button is in view before clicking it
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
    Button.click()
# Writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel("all_results.xlsx")
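
One thing to be aware of in the code above: the outer loop over `result` only acts as a page counter, so it stops after as many pages as there were rows on the first page, whether or not more pages exist. A possible alternative, sketched here under assumptions, is to keep reading until no next page is available, with an upper bound as a safety net. The names `scrape_pages`, `read_rows`, `go_next` and `max_pages` are hypothetical; in a real run `read_rows` would wrap the Selenium row lookup and `go_next` would click the last "grayBtn" element and return False once that is no longer possible. The page data below is made-up placeholder data so the sketch runs without a browser.

```python
def scrape_pages(read_rows, go_next, max_pages=50):
    """Collect rows page by page until go_next() reports no further page."""
    rows = []
    for _ in range(max_pages):
        # Read everything on the current page first
        rows.extend(read_rows())
        # Then try to advance; stop as soon as there is nowhere to go
        if not go_next():
            break
    return rows


# Placeholder pages standing in for the browser (not real listings)
pages = [
    [{"Title": "Math Teacher", "Location": "Delhi"}],
    [{"Title": "Science Teacher", "Location": "Pune"}],
]
position = {"page": 0}


def read_rows():
    # Return the rows of the page we are currently "on"
    return pages[position["page"]]


def go_next():
    # Advance to the next placeholder page, or report that none is left
    if position["page"] + 1 >= len(pages):
        return False
    position["page"] += 1
    return True


all_rows = scrape_pages(read_rows, go_next)
print(len(all_rows))
```

Because the rows are plain dictionaries, they can be handed to pd.DataFrame(all_rows) in a single call at the end instead of being assigned to df.loc one row at a time.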
