
Creating a processing queue in Python

I have an email account set up that triggers a Python script whenever it receives an email. The script runs through several functions, which can take about 30 seconds, and writes an entry into a MySQL database.

Everything runs smoothly until a second email arrives less than 30 seconds after the first. The second email is processed correctly, but the first email creates a corrupted entry in the database.

I'm looking to hold the email data,

msg = email.message_from_file(sys.stdin)

in a queue if the script has not finished processing the prior email.

I'm using Python 2.5. Can anyone recommend a package/script that would accomplish this?

I would look into http://celeryproject.org/

I'm fairly certain that will meet your needs exactly.
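For example, here is a minimal sketch of how the trigger script could hand the message off to a Celery task. The module name, task name, and broker setup are assumptions, not part of the question, and the celery.decorators import matches the Celery releases of the Python 2.5 era (newer versions define tasks on an app instance instead):

# tasks.py -- hypothetical module; assumes a configured broker such as
# RabbitMQ and a Celery release contemporary with Python 2.5.
from celery.decorators import task

@task
def process_message(raw_message):
    # The existing ~30-second parsing and MySQL insert goes here.
    pass

# handler.py -- the script the mail server triggers: enqueue and exit.
import sys
from tasks import process_message

process_message.delay(sys.stdin.read())

Running the worker with a single process (e.g. celeryd --concurrency=1) executes the queued jobs one at a time, so two e-mails can never be processed concurrently.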

I find this a simple way to avoid running a cron job while the previous one is still running:

fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB) 

If the lock is already held, this raises an IOError, which I then handle by having the process kill itself.

See http://docs.python.org/library/fcntl.html#fcntl.lockf for more info.
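For example, a minimal sketch of that non-blocking pattern (the lock file path is an arbitrary choice):

import fcntl
import sys

fd = open('/tmp/process_email.lock', 'w')
try:
    # Non-blocking: fails immediately if another instance holds the lock
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit(1)  # another copy is already running
# ... do the work; the lock is released when the process exits ...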

Anyway, you can easily use the same idea to only allow a single job to run at a time, which really isn't the same as a queue (since any process waiting could potentially acquire the lock), but it achieves what you want:

import fcntl
import time

fd = open('lock_file', 'w')
# Blocks here until any previously running instance releases the lock
fcntl.lockf(fd, fcntl.LOCK_EX)
# optionally write pid to another file so you have an indicator
# of the currently running process
print 'Hello'
time.sleep(1)
# The lock is released automatically when the process exits

You could also just use http://docs.python.org/dev/library/multiprocessing.html#exchanging-objects-between-processes, which does exactly what you want.
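As a minimal sketch of that approach: a single long-running consumer process drains a multiprocessing.Queue, so messages are handled one at a time. Note that multiprocessing was added in Python 2.6; on 2.5 the same API is available as the third-party processing package. The do_work_for_message stub below stands in for your existing processing code:

from multiprocessing import Process, Queue

def do_work_for_message(raw_message):
    # Stand-in for the existing parsing + MySQL work
    print 'processing %d bytes' % len(raw_message)

def worker(q):
    # Handle queued messages one at a time, serializing the work
    while True:
        raw_message = q.get()      # blocks until something is queued
        if raw_message is None:    # sentinel: shut the worker down
            break
        do_work_for_message(raw_message)

if __name__ == '__main__':
    q = Queue()
    consumer = Process(target=worker, args=(q,))
    consumer.start()
    # Producers just enqueue the raw message text and return immediately
    q.put('raw message text goes here')
    q.put(None)
    consumer.join()

Bear in mind this assumes one long-running parent process owns the queue, which is a different shape from the one-process-per-email trigger described in the question.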

While Celery is a very fine piece of software, using it in this scenario is akin to driving a nail with a sledgehammer. At a conceptual level, you are looking for a job queue (which is what Celery provides), but the e-mail inbox you are using to trigger the script is itself a capable job queue.

The more direct solution is to have the Python worker script poll the mail server itself (using the built-in poplib, for example), retrieve all new mail every few seconds, and then process any new e-mails one at a time. This serializes the work your script does, preventing two copies from running at once.

For example, you would wrap your existing script in a loop like this (adapted from the poplib documentation):

import getpass, poplib
from time import sleep

user = getpass.getuser()
password = getpass.getpass()

while True:
    # Reconnect on every pass: a POP3 session sees a snapshot of the
    # mailbox, and deletions only take effect after quit().
    M = poplib.POP3('localhost')
    M.user(user)
    M.pass_(password)
    numMessages = len(M.list()[1])
    for i in range(numMessages):
        msg = '\n'.join(M.retr(i + 1)[1])
        # This is what your script normally does:
        do_work_for_message(msg)
        # Mark the message for deletion so it is not processed again
        M.dele(i + 1)
    M.quit()
    sleep(5)
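Because a single process handles one message at a time, the work is serialized without any explicit locking; the mailbox itself acts as the queue, and deleting each message after it is processed keeps anything from being handled twice.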

