
How to add a cron job/scheduler for Python scripts on AWS EC2?

I have a question regarding one of my React apps that I recently developed. It's basically a landing page with a React frontend and a Node+Express backend, and it scrapes data from various pages (the scrapers are written in Python).

Right now, the React app itself is hosted on Heroku and the scrapers run correctly, but they are not scheduled automatically. The current setup is the following:

  • EC2 for the Python scrapers
  • AWS RDS MySQL for the database that the EC2 scrapers write their data to

I have created a separate file that executes all the other scrapers.

main.py

import time
import schedule
import os
from pathlib import Path
print('python script executed')


# check the current working directory so the scraper paths can be set correctly
path = os.getcwd()
print(path)
#exec(open("/home/ec2-user/testing_python/lhvscraper.py").read())


filenames = [
    #output table: fundsdata
    Path("/home/ec2-user/testing_python/lhvscraper.py"),
    Path("/home/ec2-user/testing_python/luminorscrapertest.py"),
    Path("/home/ec2-user/testing_python/sebscraper.py"),
    Path("/home/ec2-user/testing_python/swedscraper.py"),
    Path("/home/ec2-user/testing_python/tulevascraper.py"),

    #output table: feesdata
    Path("/home/ec2-user/testing_python/feesscraper.py"),

    #output table: yield_y1_data
    Path("/home/ec2-user/testing_python/yield_1y_scraper.py"),

    #output table: navdata
    #Path("/home/ec2-user/testing_python/navscraper.py"),
]

def main_scraper_scheduler(): 
    print("scheduler is working")

    for filename in filenames:
        print(filename)
        with open(filename) as infile:
            exec(infile.read())

    time.sleep(11)

schedule.every(10).seconds.do(main_scraper_scheduler)

while True:
    schedule.run_pending()
    time.sleep(1)

I have successfully established a connection between MySQL and EC2 and tested it over PuTTY -
which means that if I execute my main.py, all of the scrapers work, insert new data into the MySQL database tables, and then repeat (see the code above). The only problem is that when I close PuTTY (kill the SSH connection), main.py stops running.

So my question is: how do I set it up so that main.py runs on its own (let's say, once a day at 12 PM) without me executing it manually?

I understand this involves setting up a cron job or scheduler (or something similar), but I haven't managed to set it up yet, so your help is very much needed.

Thanks in advance!

To avoid making crontab files overly long, Linux has canned entries that run things hourly, daily, weekly, or monthly. You don't have to modify any crontab to use this. ANY executable script located in /etc/cron.hourly will automatically be run once an hour. ANY executable script located in /etc/cron.daily will automatically be run once per day (usually at 6:30 AM), and so on. Just make sure to include a #! (shebang) line for Python, and chmod +x to make it executable. Remember that it will run as root, and you can't necessarily predict which directory it will start in. Make no assumptions.
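As a sketch (assuming the file layout from the question), an executable script dropped into /etc/cron.daily could look like this; the name run-scrapers is made up, and the scraper paths are copied from main.py:

#!/usr/bin/env python3
# Hypothetical /etc/cron.daily/run-scrapers (no .py extension needed).
# Needs the #! line above and `chmod +x /etc/cron.daily/run-scrapers`.
# Runs as root from an unpredictable working directory, so use absolute paths only.
from pathlib import Path

scrapers = [
    Path("/home/ec2-user/testing_python/lhvscraper.py"),
    Path("/home/ec2-user/testing_python/feesscraper.py"),
    # ...the remaining scraper files from main.py
]

for script in scrapers:
    print(f"running {script}")
    with open(script) as infile:
        exec(infile.read())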

The alternative is to add a line to your own personal crontab. You can list your crontab with crontab -l, and you can edit it with crontab -e. To run something once a day at noon, you might add:

0 12 * * *  /home/user/src/my_daily_script.py
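
Note that main.py as posted never exits (the while True loop keeps it alive for the schedule library), so pointing cron directly at it would leave a new copy running every day. For a crontab schedule you would run a loop-free variant instead, such as the /etc/cron.daily sketch above saved as an ordinary file. Assuming that variant is saved as run_scrapers_once.py (a hypothetical name), and since cron usually runs with a minimal PATH, an entry with an absolute interpreter path and a log redirect is safer:

0 12 * * *  /usr/bin/python3 /home/ec2-user/testing_python/run_scrapers_once.py >> /home/ec2-user/testing_python/cron.log 2>&1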
