
Python: How can I break the loop and append the last page of results?

I've made a scraper that works except that it won't scrape the last page. The URL doesn't change, so I set it up to run in an infinite loop.

I've set the loop up to break when it can't click the next button anymore (on the last page), but it seems the script ends before it appends the last page of results.

How can I append the last page to the list?

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep
import itertools


url = "https://example.com"

driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver")
driver.get(url)

inputElement = driver.find_element_by_id("txtBusinessName")
inputElement.send_keys("ship")

inputElement.send_keys(Keys.ENTER)

df2 = pd.DataFrame()

for i in itertools.count():
    element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, "grid_businessList")))
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find('table', id="grid_businessList")
    rows = table.findAll("tr")

    columns = [v.text.replace('\xa0', ' ') for v in rows[0].find_all('th')]

    df = pd.DataFrame(columns=columns)

    for row in rows[1:]:  # skip the header row (avoids shadowing the outer loop variable i)
        tds = row.find_all('td')

        if len(tds) == 5:
            values = [tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text]
        else:
            values = [td.text for td in tds]

        df = df.append(pd.Series(values, index=columns), ignore_index=True)

    try:
        next_button = driver.find_element_by_css_selector("li.next:nth-child(9) > a:nth-child(1)")
        driver.execute_script("arguments[0].click();", next_button)
        sleep(5)

    except NoSuchElementException:
        break

    df2 = df2.append(df)
    df2.to_csv(r'/home/user/Documents/test/' + 'gasostest.csv', index=False)

The problem is that the `except` clause breaks the loop before you have appended the last page.

What you can do is use a `finally` clause in your try/except statement. The code in a `finally` block always runs, even when the `except` branch breaks out of the loop; see https://docs.python.org/3/tutorial/errors.html#defining-clean-up-actions
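A minimal sketch of this behavior, with the scraping replaced by hypothetical page numbers: the `finally` block still runs on the iteration where `except` executes `break`, so the "last page" is not lost.

```python
# Demonstrate that `finally` runs even when `except` breaks out of the loop.
pages_saved = []

for page in range(1, 10):
    try:
        if page == 3:                 # simulate "next button not found" on the last page
            raise LookupError("no next button")
    except LookupError:
        break                         # leave the loop on the last page
    finally:
        pages_saved.append(page)      # runs on every iteration, including the breaking one

print(pages_saved)                    # → [1, 2, 3]
```

Because `finally` executes before the `break` takes effect, page 3 is appended even though the loop exits on that iteration.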

Your code can be rewritten like this:

    try:
        next_button = driver.find_element_by_css_selector("li.next:nth-child(9) > a:nth-child(1)")
        driver.execute_script("arguments[0].click();", next_button)
        sleep(5)

    except NoSuchElementException:
        break

    finally:
        df2 = df2.append(df)
        df2.to_csv(r'/home/user/Documents/test/' + 'gasostest.csv', index=False)
