
Getting an error while navigating to the next pages and scraping all the data from a website using Selenium

Hi, I am trying to scrape all the teacher jobs from https://www.naukri.com/. I want the data from every results page, but I only get the data from one page and then this error:

Traceback (most recent call last):
  File "naukri.py", line 48, in <module>
    driver.execute_script("arguments.click();", next_page)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: arguments.click is not a function
  (Session info: chrome=80.0.3987.116)

The code I wrote is:

import selenium.webdriver

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url ='https://www.naukri.com/'
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")


driver.get(url)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#qsbClick > span.blueBtn'))).click()
driver.find_element_by_xpath('//*[@id="skill"]/div[1]/div[2]/input').send_keys("teacher")
driver.find_element_by_xpath('//*[@id="qsbFormBtn"]').click()

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
for r in result:
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    result = WebDriverWait(data, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for r in result:
        try:
            title=r.find_element_by_class_name("desig").text
            print('title:',title)
            school=r.find_element_by_class_name("org").text
            print('school:',school)
            location=r.find_element_by_class_name("loc").text
            print("location:",location)
            salary=r.find_element_by_class_name("salary").text
            print("salary:",salary)
        except:
            pass
            print('-------')
    next_page = r.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[1]/div[59]/a/button")
    driver.execute_script("arguments.click();", next_page)

Can anyone please help me? Thanks in advance!

Since the element index of the 'next' button changes from 59 on the first page to 60 on the following pages, the absolute XPath only matches on the first page. Instead, you can find all elements on the page that have the class "grayBtn" and click the one at index [-1] of the returned list, as this will always be the next button. I also removed some unnecessary parts of your code, such as the repeated imports and the unnecessary button clicks. Instead of entering "teacher" into the search field on the home page, I go directly to the page containing the list of results for teachers. I was left with the following:

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
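import re

# Read the search term, e.g. "teacher" or "math teacher".
Category = input("Category?")
# Build the URL slug from the raw input first ("math teacher" -> "math-teacher"),
# then percent-encode the spaces for the query string ("math%20teacher").
Type = re.sub(" ", "-", Category.lower())
Category = re.sub(" ", "%20", Category)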

url ='https://www.naukri.com/' + Type + '-jobs?k=' + Category
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
driver.get(url)

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
for res in result:
    data = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    jobs = WebDriverWait(data, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for job in jobs:
        try:
            title=job.find_element_by_class_name("desig").text
            print('title:',title)
            school=job.find_element_by_class_name("org").text
            print('school:',school)
            location=job.find_element_by_class_name("loc").text
            print("location:",location)
            salary=job.find_element_by_class_name("salary").text
            print("salary:",salary)
        except:
            pass
            print('-------')
    Button = driver.find_elements_by_class_name("grayBtn")[-1]
    time.sleep(1)
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
    Button.click()
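
For reference, the JavascriptException in the question comes from the script string itself. Inside execute_script, the extra Python arguments are exposed to the script as the JavaScript `arguments` array, so the element has to be addressed as `arguments[0]`; `arguments.click` does not exist. In addition, `find_elements_by_xpath` returns a list, while the script needs a single element. The following is a minimal sketch of a JavaScript click that handles both points. `js_click` is a hypothetical helper name, and `driver` is assumed to be a Selenium WebDriver (anything with an `execute_script` method).

```python
def js_click(driver, elements):
    """Click the last element of `elements` via JavaScript.

    Returns True if a click was issued, False if the list was empty.
    """
    if not elements:
        # Nothing matched (for example, no further page), so do not click.
        return False
    # Extra arguments passed to execute_script show up in the JavaScript
    # `arguments` array, so the element is arguments[0].
    driver.execute_script("arguments[0].click();", elements[-1])
    return True
```

With the code above, this could be called as `js_click(driver, driver.find_elements_by_class_name("grayBtn"))`. A JavaScript click does not depend on the button being scrolled into view, which may make the `scrollTo` call unnecessary.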

As requested, here is the modified code, which appends the data to a pandas DataFrame and writes the DataFrame to an Excel file:

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import re
import pandas as pd

df = pd.DataFrame(columns = ['Title', 'School', 'Location', 'Salary'])

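# Read the search term, e.g. "teacher" or "math teacher".
Category = input("Category?")
# Build the URL slug from the raw input before encoding the spaces; otherwise
# no spaces remain and the slug ends up containing "%20" instead of "-".
Type = re.sub(" ", "-", Category.lower())
Category = re.sub(" ", "%20", Category)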

url ='https://www.naukri.com/' + Type + '-jobs?k=' + Category
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
driver.get(url)

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
i = 0
for res in result:
    data = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    jobs = WebDriverWait(data, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for job in jobs:
        try:
            title=job.find_element_by_class_name("desig").text
            print('title:',title)
            school=job.find_element_by_class_name("org").text
            print('school:',school)
            location=job.find_element_by_class_name("loc").text
            print("location:",location)
            salary=job.find_element_by_class_name("salary").text
            print("salary:",salary)
            df.loc[i] = [title, school, location, salary]
            i += 1
        except:
            pass
            print('-------')
    Button = driver.find_elements_by_class_name("grayBtn")[-1]
    time.sleep(1)
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
    Button.click()
df.to_excel("all_results.xlsx")
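
Assigning to `df.loc[i]` one row at a time enlarges the DataFrame on every iteration, which can get slow for many rows, and the bare `except` silently drops a whole job when a single field (often the salary) is missing. One possible alternative, sketched below, is to collect each job as a plain dict and build the DataFrame once at the end. `FIELDS`, `extract_job` and `extract_jobs` are hypothetical names; the sketch assumes each `job` offers Selenium's `find_element(by, value)` method and that the class names are the ones used in the code above.

```python
# Column name -> CSS class, taken from the answer's code above.
FIELDS = {"Title": "desig", "School": "org", "Location": "loc", "Salary": "salary"}


def extract_job(job):
    """Return one result row as a dict; missing fields become None."""
    row = {}
    for column, class_name in FIELDS.items():
        try:
            # "class name" is the string value of selenium's By.CLASS_NAME.
            row[column] = job.find_element("class name", class_name).text
        except Exception:
            # Assumption: any lookup failure means the field is absent.
            row[column] = None
    return row


def extract_jobs(jobs):
    """Convert a list of row elements into a list of dicts."""
    return [extract_job(job) for job in jobs]
```

The rows from every page could be gathered in one list (`rows.extend(extract_jobs(jobs))`) and written out once with `pd.DataFrame(rows, columns=list(FIELDS)).to_excel("all_results.xlsx")`, assuming pandas and an Excel writer such as openpyxl are installed.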


 