I have written a very small program consisting mainly of Scrapy scrapers. It is packaged in a Docker container, and the scrapers need to be run by cron.
My docker-compose file is:
version: '2'
services:
  admin-panel:
    env_file: ./Admin-Panel/.env
    build: ./Admin-Panel/
    volumes:
      - ./Admin-Panel/app:/code/app
      - ./Admin-Panel/flaskadmin.py:/code/flaskadmin.py
    ports:
      - "5000:5000"
  scraper:
    env_file: ./Admin-Panel/.env
    build: ./Scraper/
    volumes:
      - ./Scraper/spiders:/spiders
My Scraper Dockerfile is:
FROM ubuntu:latest
ENV TERM xterm
RUN apt-get update
RUN apt-get install -y python3-pip python3.5-dev build-essential
RUN apt-get install -y libssl-dev nano cron libpq-dev libffi-dev curl
ADD ./requirements /requirements
ADD crontab /etc/cron.d/scrapers
RUN pip3 install --upgrade pip
RUN pip3 install -r /requirements/base.txt
RUN touch /var/log/cron.log
CMD cron && tail -f /var/log/cron.log
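Note: on Debian/Ubuntu, cron silently ignores files in /etc/cron.d that are writable by group or others, so it may be worth pinning the mode of the crontab file right after the ADD. A sketch, using the /etc/cron.d/scrapers path from the Dockerfile above:

```dockerfile
# cron skips /etc/cron.d files that are group/other writable; 0644 is safe
RUN chmod 0644 /etc/cron.d/scrapers
```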
My crontab is (with a trailing new line):
* * * * * root /usr/local/bin/scrapy runspider /spiders/myspider.py
* * * * * root /bin/date >> /tmp/cron_output
This works perfectly well when running locally on my Mac (Sierra), but when I put it on an Amazon EC2 instance running the Amazon Linux AMI, the cron jobs are never run. I used FileZilla to transfer the files from my Mac to the EC2 instance.
AWS EC2:
Docker version 1.12.6, build 7392c3b/1.12.6
My MacBook:
Docker version 17.03.0-ce, build 60ccb22
If I add the line
* * * * * root /bin/date >> /tmp/cron_output
using crontab -e, nothing happens either. The file cron.log stays empty.
UPDATE:
I installed rsyslog and then started it:
service rsyslog start
Now /var/log/syslog contains:
Mar 25 21:49:01 4406b0e05b9f CRON[464]: Cannot make/remove an entry for the specified session
I finally found a solution thanks to https://github.com/sameersbn/docker-gitlab/issues/173
I commented out the following line in /etc/pam.d/cron:
session required pam_loginuid.so
Just need to work out how to do this automatically on docker-compose up.
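One way to do it automatically is to bake the edit into the Scraper Dockerfile with sed, so every image built by docker-compose already has the line commented out. Below is a minimal, self-contained sketch of what that sed does, run against a sample file rather than the real /etc/pam.d/cron (the sample contents are assumed from the Ubuntu default):

```shell
# Simulate the relevant part of /etc/pam.d/cron (assumed Ubuntu default contents)
printf 'session    required     pam_loginuid.so\nsession    required     pam_env.so readenv=1\n' > /tmp/pam_cron_demo

# Comment out the pam_loginuid.so line -- the same edit made by hand above
sed -i '/pam_loginuid\.so/s/^/#/' /tmp/pam_cron_demo

cat /tmp/pam_cron_demo
```

In the Dockerfile the equivalent one-liner would be `RUN sed -i '/pam_loginuid\.so/s/^/#/' /etc/pam.d/cron`, placed after the apt-get line that installs cron.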
Try adding permissions in the Dockerfile, e.g.,
RUN chmod 0744 /spiders/myspider.py /etc/cron.d/scrapercron
and change the location of the crontab
ADD scrapercron /etc/cron.d/
Then in your crontab...
HOME=/spiders
* * * * * root /spiders/myspider.py >> /tmp/cron_output 2>&1
And to test, tail that tmp file from the container's CMD instead:
CMD cron && tail -f /tmp/cron_output
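One more note on the crontab -e test from the question: the six-field format with root only applies to /etc/cron.d files and /etc/crontab. A per-user crontab installed via crontab -e must omit the user field, otherwise cron treats root as the command name and fails silently. A quick sketch of how cron splits a cron.d line (the line is copied from the crontab above):

```shell
# /etc/cron.d format: five schedule fields, then the user, then the command
line='* * * * * root /usr/local/bin/scrapy runspider /spiders/myspider.py'

echo "$line" | awk '{
    print "user=" $6                      # 6th field: user (cron.d/system crontab only)
    $1=$2=$3=$4=$5=$6=""; sub(/^ +/, "")  # drop schedule + user, leaving the command
    print "cmd=" $0
}'
```

With crontab -e, the same entry would be `* * * * * /bin/date >> /tmp/cron_output` (no root field).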