How to add cronjob/scheduler for Python scripts on EC2 AWS?

I have a question about a React app I recently developed. It is basically a landing page with a React frontend and a Node + Express backend, and it scrapes data from various pages (the scrapers are written in Python).

Right now, the React app itself is hosted on Heroku, and the scrapers run correctly, but they are not scheduled automatically. The current setup is the following:

  • EC2 for the Python scrapers
  • AWS RDS MySQL for the database that the EC2 scrapers write their data to

I have created a separate file that executes all the other scrapers.

main.py

import time
import schedule
import os
from pathlib import Path
print('python script executed')


# make sure, what is the current working directory to add the right paths to scrapers
path = os.getcwd()
print(path)
#exec(open("/home/ec2-user/testing_python/lhvscraper.py").read())


filenames = [
    #output table: fundsdata
    Path("/home/ec2-user/testing_python/lhvscraper.py"),
    Path("/home/ec2-user/testing_python/luminorscrapertest.py"),
    Path("/home/ec2-user/testing_python/sebscraper.py"),
    Path("/home/ec2-user/testing_python/swedscraper.py"),
    Path("/home/ec2-user/testing_python/tulevascraper.py"),

    #output table: feesdata
    Path("/home/ec2-user/testing_python/feesscraper.py"),

    #output table: yield_y1_data
    Path("/home/ec2-user/testing_python/yield_1y_scraper.py"),

    #output table: navdata
    #Path("/home/ec2-user/testing_python/navscraper.py"),
]

def main_scraper_scheduler(): 
    print("scheduler is working")

    for filename in filenames:
        print(filename)
        with open(filename) as infile:
            exec(infile.read())

    time.sleep(11)

schedule.every(10).seconds.do(main_scraper_scheduler)

while True:
    schedule.run_pending()
    time.sleep(1)

I have successfully established a connection between MySQL and EC2 and tested it over PuTTY: if I execute main.py, all of the scrapers run, insert new data into the MySQL database tables, and then repeat (see the code above). The only problem is that when I close PuTTY (ending the connection), main.py stops running.

So my question is: how do I set this up so that main.py runs automatically (say, once a day at 12 PM) without me executing it?

I understand this is about setting up a cron job or a scheduler (or something like that), but I haven't managed to set it up so far, so your help is very much needed.

Thanks in advance!

To avoid making crontab files overly long, Linux has canned entries that run things hourly, daily, weekly, or monthly. You don't have to modify any crontab to use them. Any executable script located in /etc/cron.hourly is automatically run once an hour, any executable script located in /etc/cron.daily is automatically run once per day (usually early in the morning, around 6:30 AM; the exact time depends on the distribution), and so on. Just make sure to include a #! line for Python, and chmod +x the script to make it executable. Remember that it will run as root, and you can't necessarily predict which directory it will start in. Make no assumptions.
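As a rough sketch of what such a script could look like when placed in /etc/cron.daily: it starts with a #! line, refers to the scrapers by absolute path, and sets the working directory explicitly instead of relying on wherever cron starts it. The run_scrapers helper and the two-entry SCRAPERS list are illustrative assumptions (the paths are taken from the question), and the schedule loop from main.py is dropped on the assumption that cron now provides the timing.

```python
#!/usr/bin/env python3
"""Run each scraper once and exit; cron supplies the schedule."""
import subprocess
import sys
from pathlib import Path

# Directory taken from the question; adjust to your own layout.
SCRAPER_DIR = Path("/home/ec2-user/testing_python")

# Illustrative subset of the question's scraper files, as absolute paths.
SCRAPERS = [
    SCRAPER_DIR / "lhvscraper.py",
    SCRAPER_DIR / "feesscraper.py",
]


def run_scrapers(paths):
    """Run each script in its own interpreter; return {path: exit code}.

    Missing files are skipped and recorded with an exit code of None.
    """
    results = {}
    for path in paths:
        path = Path(path)
        if not path.is_file():
            print(f"skipping missing file: {path}")
            results[str(path)] = None
            continue
        # Set cwd explicitly: cron does not start in a predictable directory.
        completed = subprocess.run([sys.executable, str(path)], cwd=path.parent)
        results[str(path)] = completed.returncode
    return results


if __name__ == "__main__":
    for script, code in run_scrapers(SCRAPERS).items():
        print(f"{script}: exit code {code}")
```

Running each scraper in its own interpreter, rather than through exec, means one failing scraper does not stop the rest. Because the job runs as root, the python3 it finds may not have the packages that were installed for ec2-user, so that is worth checking before relying on it.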

The alternative is to add a line to your own personal crontab. You can list your crontab with crontab -l, and you can edit it with crontab -e. To run something once a day at noon, you might add:

0 12 * * *  /home/user/src/my_daily_script.py
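
For reference, the five leading fields in that entry are minute, hour, day of month, month, and day of week, followed by the command to run. The small helper below, split_crontab_line (a hypothetical name, not part of any cron tooling), only labels the fields of the example line; it does not validate or schedule anything.

```python
from typing import Dict

# Order of the time fields in a user crontab entry.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_line(line: str) -> Dict[str, str]:
    """Split a user-crontab entry into its five time fields and the command."""
    # At most 5 splits: five time fields, then the rest of the line.
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five time fields followed by a command")
    entry = dict(zip(FIELD_NAMES, parts[:5]))
    entry["command"] = parts[5].strip()
    return entry


entry = split_crontab_line("0 12 * * *  /home/user/src/my_daily_script.py")
for name, value in entry.items():
    print(f"{name:>12}: {value}")
```

Minute 0 and hour 12, with * in the remaining three fields, means every day at 12:00. Since the command here is the script path itself, the script still needs its #! line and executable bit.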
