
Pop up warning message when scraping

I am using Selenium to scrape this website: https://www.fedsdatacenter.com/federal-pay-rates/index.php?y=all&n=&l=&a=&o=

My code works well: it keeps clicking Next and parsing the table, until a warning message comes up:

DataTables warning: table id=table-example - Invalid JSON response.

and my code stops because of this error. Even manually, clicking Next gives me the same warning.

Here is my code. What can I do about it? And if there is any way to improve my code, please help me.

from selenium import webdriver
from selenium.common.exceptions import (NoSuchElementException,
                                        TimeoutException,
                                        ElementNotVisibleException,
                                        StaleElementReferenceException)
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import time


def has_class_onclick(tag):

    return tag.has_attr('onclick')


def extract_table_content_into_rows(website_lists):

    # Extract the table content from each page's HTML and collect it into a list of rows.

    list_of_row = []

    for table_page in website_lists:
        soup_page = BeautifulSoup(table_page, "html.parser")
        soup_table_raw = soup_page.find("table")
        if soup_table_raw:
            soup_table = soup_table_raw.find("tbody")
            for soup_row in soup_table.find_all("tr"):
                row_content = []
                for soup_column in soup_row.find_all("td"):
                    if not soup_column.contents:
                        row_content.append(".")
                    else:
                        column_content = soup_column.contents[0].strip()
                        row_content.append(column_content)
                list_of_row.append(row_content)
        else:
            continue

    return list_of_row


def csv_writer(lists_of_row):

    # Append the table rows to a csv file.

    with open("federal.csv", "at", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for row_to_write in lists_of_row:
            writer.writerow(row_to_write)


driver = webdriver.Chrome('chromedriver')  # Optional argument, if not specified will search path.
driver.get('https://www.fedsdatacenter.com/federal-pay-rates/index.php?y=all&n=&l=&a=&o=')
driver.find_element_by_xpath('//*[@id="table-example_length"]/label/select').click()
time.sleep(3)
driver.find_element_by_xpath('//*[@id="table-example_length"]/label/select/option[4]').click()
time.sleep(3)


page_num = 1

while page_num > 0 and page_num <= 5:
    html = driver.page_source
    website_list = [html]
    row_list = extract_table_content_into_rows(website_list)
    print(row_list)
    csv_writer(row_list)
    driver.find_element_by_xpath('//*[@id="table-example_next"]/a').click()
    time.sleep(3)
    print(page_num)
    page_num += 1

while page_num > 5:
    html = driver.page_source
    website_list = [html]
    row_list = extract_table_content_into_rows(website_list)
    print(row_list)
    csv_writer(row_list)
    driver.find_element_by_xpath('//*[@id="table-example_next"]/a').click()
    not_find = 1
    while not_find == 1:
        try:
            driver.find_element_by_xpath('//*[@id="table-example_paginate"]/ul/li[6]/a')
            while driver.find_element_by_xpath(
                    '//*[@id="table-example_paginate"]/ul/li[6]/a').text != str(page_num + 2):
                time.sleep(0.1)
            not_find = 0
        except StaleElementReferenceException:
            continue
    print(page_num)
    page_num += 1
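For reference, the parsing logic can be sanity-checked offline, without a browser. The snippet below inlines the same per-page extraction as extract_table_content_into_rows and runs it against a made-up HTML fragment (the fragment is illustrative only, not the real page's markup):

```python
from bs4 import BeautifulSoup

# A made-up fragment mimicking the table structure on the page.
sample = """
<table id="table-example">
  <tbody>
    <tr><td>SMITH JOHN</td><td>AGENCY A</td><td></td></tr>
    <tr><td>DOE JANE</td><td>AGENCY B</td><td>GS-12</td></tr>
  </tbody>
</table>
"""

def extract_rows(html):
    """Same logic as extract_table_content_into_rows, for a single page."""
    rows = []
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table:
        for tr in table.find("tbody").find_all("tr"):
            row = []
            for td in tr.find_all("td"):
                # Empty cells become "." so the CSV columns stay aligned.
                row.append(td.contents[0].strip() if td.contents else ".")
            rows.append(row)
    return rows

print(extract_rows(sample))
# → [['SMITH JOHN', 'AGENCY A', '.'], ['DOE JANE', 'AGENCY B', 'GS-12']]
```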

One approach is to disable all alerts on the page with some JavaScript:

driver.execute_script('window.alert = function() {};')
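Suppressing the alert stops the crash, but the loop will still click past the last page. DataTables 1.x normally adds a disabled class to the Next paginator control on the last page, so the loop could check for that before clicking. A sketch of such a check, parsing the current page source with BeautifulSoup (the id matches the question's table; the exact paginator markup below is an assumption):

```python
from bs4 import BeautifulSoup

def next_is_disabled(page_source):
    """Return True when the DataTables 'Next' control carries the
    'disabled' class, i.e. we are on the last page."""
    soup = BeautifulSoup(page_source, "html.parser")
    nxt = soup.find(id="table-example_next")
    return nxt is not None and "disabled" in nxt.get("class", [])

# Inside the scraping loop, before clicking Next:
#   if next_is_disabled(driver.page_source):
#       break

# Offline check against made-up paginator fragments:
last = '<li class="paginate_button next disabled" id="table-example_next"><a>Next</a></li>'
more = '<li class="paginate_button next" id="table-example_next"><a>Next</a></li>'
print(next_is_disabled(last), next_is_disabled(more))
# → True False
```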
