
Pause Python For Loop for a day

My situation is this: I use the Selenium WebDriver to scrape a webpage. First it reads total_page_items, which is the easy part because the page shows that number in a box at the top.

What I want to do now is interact with just 200 of these items each day. Say the page has 5 million items: how would I go about clicking 200 of them a day, possibly saving the button state to a list, and then continuing with the next 200 items the following day? I know about the timing functions and how to run the script daily at a certain time, but I don't know where to go from there. Is this a situation where I would use a nested loop?

Here is the for loop that I have so far; I hope it makes sense:

    import time

    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `driver` and `total_page_items` are assumed to be set up earlier in the script.
    daily_items = 200
    counter = 0
    # XPath template; {} is replaced by the 1-based position of the list item.
    button_xpath = (
        "//div[@id='content']/div/div/div[2]/div/div/ul/li[{}]"
        "/div/div[3]/button[contains(text(), 'Click')]"
    )

    for item in range(1, daily_items + 1):
        xpath = button_xpath.format(item)
        button = WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        # move_to_element only takes effect once perform() is called.
        ActionChains(driver).move_to_element(button).perform()

        if button.is_displayed():
            button.click()
            counter += 1
            print(counter, "New Buttons Clicked")
        else:
            driver.execute_script("arguments[0].scrollIntoView();", button)
        time.sleep(2)

    if item == total_page_items:
        print("You're done here")

You can use Celery to create the task: http://www.celeryproject.org/

I suggest using APScheduler. I have built something similar, a scraper that needed to run once every morning. APScheduler is simple to use:

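    from apscheduler.schedulers.background import BackgroundScheduler

    # BackgroundScheduler runs jobs in a separate thread, so the main
    # program has to stay alive for the jobs to fire.
    scheduler = BackgroundScheduler()
    scheduler.start()
    # your_routine is the function that does one day's scraping.
    scheduler.add_job(your_routine, 'interval', days=1)

You can also express the interval in hours or minutes:

    scheduler.add_job(your_routine, 'interval', hours=24)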

docs: https://apscheduler.readthedocs.org/en/latest/
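
Scheduling alone does not tell the job where it stopped the day before. One possible way to connect the two, sketched here under assumptions rather than taken from the question, is to store the number of items already processed in a small file and derive each day's range from it. The file name `progress.json` and the helpers `load_offset`, `save_offset`, `next_batch` and `run_daily` are hypothetical names chosen for illustration, and the sketch assumes the page lists its items in the same order on every visit. The Selenium clicking would go where the comment indicates.

```python
import json
import os

PROGRESS_FILE = "progress.json"  # hypothetical file name
DAILY_ITEMS = 200


def load_offset(path=PROGRESS_FILE):
    """Return how many items were already processed (0 on the first run)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["done"]


def save_offset(done, path=PROGRESS_FILE):
    """Persist the number of processed items for the next run."""
    with open(path, "w") as f:
        json.dump({"done": done}, f)


def next_batch(done, total_items, batch_size=DAILY_ITEMS):
    """Return the 1-based item positions to process in this run."""
    start = done + 1
    stop = min(done + batch_size, total_items)
    return range(start, stop + 1)


def run_daily(total_items, path=PROGRESS_FILE):
    """Process one day's batch, saving progress after every item."""
    done = load_offset(path)
    for item in next_batch(done, total_items):
        # Click the button for li[item] here (Selenium code omitted).
        save_offset(item, path)
    return load_offset(path)
```

Saving after every item means an interrupted run resumes at the item where it failed instead of repeating the whole batch. With this approach no nested loop is needed: the scheduler supplies the "once a day" part and `run_daily` supplies the "next 200" part, for example by passing `run_daily` as the job function in place of `your_routine`.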
