My situation is like this: I use Selenium WebDriver to scrape a webpage. First it gets total_page_items, which is the easy part because the page has a number box at the top.
What I want to do now is interact with just 200 of these items each day. Say, for example, the page has 5 million items: how would I go about clicking 200 of these items a day, possibly saving the button state to a list, and then continuing with the next 200 items the next day? I know about the timing function and how to run the script daily at a certain time, but I don't know how to move on from there. Is this a situation where I would use a nested loop?
Here is the for loop that I have so far; I hope it makes sense:
daily_items = 200
start_index = 1  # 1-based index of the first li to process; load this from storage to resume where yesterday ended
counter = 0
for i in range(start_index, start_index + daily_items):
    # Build the XPath for the i-th list item's button (the original
    # list-plus-join approach incremented the index before first use,
    # skipping li[1], and had a stray space before "[contains(...)]")
    button_xpath = (
        "//div[@id='content']/div/div/div[2]/div/div/ul/li[%d]"
        "/div/div[3]/button[contains(text(), 'Click')]" % i
    )
    button = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.XPATH, button_xpath))
    )
    # Scroll the button into view before checking visibility; note that
    # testing `"Click" in button_xpath` is always true, since the XPath
    # itself contains the word "Click" - check the element instead
    driver.execute_script("arguments[0].scrollIntoView();", button)
    if button.is_displayed():
        button.click()
        counter += 1
        print(counter, "new buttons clicked")
    time.sleep(2)
    if i == total_page_items:
        print("You're done here")
        break
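To continue the next day with the next 200 items, the simplest option is to persist the next starting index to a small file between runs. A minimal sketch, assuming a JSON file whose name and helper names are free choices (hypothetical, not from any library):

```python
import json
import os

PROGRESS_FILE = "progress.json"  # hypothetical file name

def load_start_index():
    # Return the 1-based index to resume from, defaulting to 1 on the first run.
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)["next_index"]
    return 1

def save_start_index(next_index):
    # Record where tomorrow's run should pick up.
    with open(PROGRESS_FILE, "w") as f:
        json.dump({"next_index": next_index}, f)

start = load_start_index()     # e.g. 1 on day one
# ... click items start .. start + 199 here ...
save_start_index(start + 200)  # tomorrow's run resumes at item 201
```

This avoids keeping any per-button state: the loop just derives each day's XPaths from the stored index.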
You can create scheduled tasks with Celery: http://www.celeryproject.org/
I suggest you use APScheduler. I have made something similar: a scraper that needed to run once every morning. APScheduler is simple to use:
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
scheduler.start()
scheduler.add_job(your_routine, 'interval', days=1)
You can also use hour and minute intervals:
scheduler.add_job(your_routine, 'interval', hours=24)
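Independently of which scheduler you pick, the daily routine only needs to compute which window of items to handle and where to resume tomorrow, stopping cleanly at the end of the page. A small helper for that windowing logic (the function name and signature are illustrative, not from APScheduler):

```python
def daily_batch(total_items, next_index, batch_size=200):
    # Return the 1-based item indices to click today and the index
    # to resume from tomorrow, never running past the last item.
    end = min(next_index + batch_size - 1, total_items)
    return list(range(next_index, end + 1)), end + 1

# Day one on a 5-million-item page: items 1..200, resume at 201 tomorrow
indices, resume_at = daily_batch(5_000_000, 1)
```

The scheduled job would call this once per run, feed each index into the XPath template, and persist `resume_at` for the next day.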