
Procmail filtering by Date: field

I need to move away mails older than a given age, say 24 h = 86400 s. I already use good old procmail for several other purposes on that machine, so I wanted to use it for this as well. It also behaves well under load (~1,000,000 small automated messages per day).

It took me a while to arrive at this ugly solution (an excerpt from a bigger procmailrc file):

  1. Grab the Date: field using formail
  2. Grab the current date in UNIX format (seconds)
  3. Convert the mail date to UNIX format in bash
  4. Compare the values in bash
  5. Return the result to procmail via the exit code

Together:
MAILDATE_RFC=`formail -zxDate:`
DATE_UNIX=`date "+%s"`

:0
* ? MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"` ; if ( (( ($DATE_UNIX-$MAILDATE_UNIX) > 86400)) ) then exit 0; else exit 1; fi
! account_for_outdated_mails

In this case I need to use the "Date:" field, as it contains the local time at which the mail was generated (a message can take multiple days to reach my machine). We are 100% sure that the "Date:" field exists and contains an RFC-style date (these are automated messages on a separate mail network).

My solution looks pretty ugly:

  1. Getting the comparison result from bash via exit codes looks pretty bad. It might be inefficient as well.
  2. I would like to calculate MAILDATE_UNIX in procmail itself, but it seems I cannot use a variable as an argument when generating another variable:
MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"`

does not work.

The only optimization I am aware of would be to move the whole process of computing MAILDATE_RFC, MAILDATE_UNIX and DATE_UNIX into a single bash script, so that it runs in one bash session instead of three.

My question: Is there a better way to do it? Maybe more efficient?

What you say doesn't work actually does. Here's a quick demo.

testing.rc :

DEFAULT=/dev/null
SHELL=/bin/sh
VERBOSE=yes

MAILDATE_RFC=`formail -zxDate:`
MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"`
NOW=`date +%s`

:0
*   86400^0 ^ 
* $ -$NOW^0 ^ 
* $ $MAILDATE_UNIX^0 ^
{ LOG="score: $=
" }

Test run, in a fresh Ubuntu 20.04 Docker image:

tripleee@bash$ procmail -m testing.rc <<\:
> Subject: demo message
> Date: Fri, 10 Jun 2022 06:20:36 +0000
> 
> Try me
> :
procmail: [263] Fri Jun 10 06:21:23 2022
procmail: Executing "formail,-zxDate:"
procmail: [263] Fri Jun 10 06:21:23 2022
procmail: Assigning "MAILDATE_RFC=Fri, 10 Jun 2022 06:20:36 +0000"
procmail: Executing "date,-d,Fri, 10 Jun 2022 06:20:36 +0000,+%s"
procmail: Assigning "MAILDATE_UNIX=1654842036"
procmail: Executing "date,+%s"
procmail: Assigning "NOW=1654842083"
procmail: Score:   86400   86400 "^"
procmail: Score: -1654842083 -1654755683 "^"
procmail: Score: 1654842036   86353 "^"
procmail: Assigning "LOG=score: 86353
"
score: 86353
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
 
  Folder: /dev/null                              68

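This also demonstrates how to use scoring to do the calculation. It's perhaps somewhat intimidating, but it saves an external process, and so should be more efficient than doing the calculation in Bash. The final score is 86400 - (NOW - MAILDATE_UNIX), and a recipe only matches when its score is positive, so as written this recipe matches messages less than 24 hours old; flip the three signs to match older messages instead.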

In some more detail, 123^0 regex says to add 123 to the score just once if the message matches regex (in the recipe above, we use the regex ^, which of course always matches; every message contains a beginning). You could change the 0 to, e.g., 1 to add the score for every match; see the procmailsc man page for proper documentation. The $ modifier says to expand any variables in the recipe itself.
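For comparison, the sum that the three scoring conditions accumulate can be reproduced in plain shell. This is only an illustration of the arithmetic, using the values from the log above; it is not something the recipe needs.

```shell
#!/bin/sh
# Values taken from the VERBOSE log above
NOW=1654842083
MAILDATE_UNIX=1654842036

# Same sum as the three scoring conditions: 86400 - NOW + MAILDATE_UNIX
score=$((86400 - NOW + MAILDATE_UNIX))
echo "score: $score"

# procmail treats the recipe as matching only when the final score is positive
if [ "$score" -gt 0 ]; then
    echo "newer than 24h"
else
    echo "24h old or older"
fi
```

With these values the message is 47 seconds old, so the score is 86353 and the recipe matches.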

If you are not using GNU date, you don't have date -d; in that case, refer to your platform's man page for how to calculate a time stamp for an arbitrary date. The question "How to convert date string to epoch timestamp with the OS X BSD `date` command?" has a discussion for macOS, which should also work on other *BSD platforms.
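A rough sketch of a wrapper that tries GNU date first and falls back to the BSD syntax might look like this. The function name rfc_to_epoch is made up for illustration, and the BSD format string assumes the Date: header always has the shape "Fri, 10 Jun 2022 06:20:36 +0000"; check your platform's date and strptime man pages before relying on it.

```shell
#!/bin/sh
# rfc_to_epoch "RFC-style date": print seconds since 1970-01-01 UTC.
# Hypothetical helper; assumes either GNU date or BSD/macOS date is installed.
rfc_to_epoch() {
    # GNU date parses free-form dates with -d
    LC_ALL=C date -d "$1" +%s 2>/dev/null && return
    # BSD/macOS date: -j means "don't set the clock", -f gives the input format
    LC_ALL=C date -j -f '%a, %d %b %Y %T %z' "$1" +%s
}

rfc_to_epoch "Fri, 10 Jun 2022 06:20:36 +0000"
```

LC_ALL=C is set so that the day and month names are parsed as English regardless of the system locale.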

If you really wanted to make this more efficient, and can be sure that the Date: header really always uses the RFC-mandated format, you could even parse the date in Procmail itself. Something like:

:0
* ^Date: [A-Z][a-z][a-z], \/[ 0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9][0-9][0-9]
{
   date=$MATCH
   :0
   * date ?? ^\/[ 0-9][0-9]
   { dd=$MATCH }
   :0
   * date ?? ^[ 0-9][0-9] \/[A-Z][a-z][a-z]
   { mon=$MATCH }
   :0
   * date ?? [A-Z][a-z][a-z] \/[0-9][0-9][0-9][0-9]
   { yyyy=$MATCH }
   :0
   * mon ??  1^0 ^Jan
   * mon ??  2^0 ^Feb
   * mon ??  3^0 ^Mar
   * mon ??  4^0 ^Apr
   * mon ??  5^0 ^May
   * mon ??  6^0 ^Jun
   * mon ??  7^0 ^Jul
   * mon ??  8^0 ^Aug
   * mon ??  9^0 ^Sep
   * mon ?? 10^0 ^Oct
   * mon ?? 11^0 ^Nov
   * mon ?? 12^0 ^Dec
   { mm=$= }
}

The \/ token in a regex says to save the matched text after it into the special variable MATCH. We then copy that variable to date and perform additional matching to extract its parts.
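Once the day, month and year are available as numbers, turning them into a day count since January 1, 1970 is plain integer arithmetic. Here is a sketch in POSIX shell, shown in shell only for readability; the function name days_from_civil is my own, the formula is the well-known "days from civil" calculation for the Gregorian calendar, and it assumes years from 1970 onwards and ignores the time of day and the time zone.

```shell
#!/bin/sh
# days_from_civil YYYY MM DD: print the number of days since 1970-01-01.
# Hypothetical helper; assumes a Gregorian date with year >= 1970.
days_from_civil() {
    y=$1
    m=${2#0}    # strip a leading zero so 08 and 09 are not read as octal
    d=${3#0}
    # Count the year from March, so that a leap day falls at the end of the year
    if [ "$m" -le 2 ]; then
        y=$((y - 1))
        mp=$((m + 9))
    else
        mp=$((m - 3))
    fi
    era=$((y / 400))                        # 400-year Gregorian cycle
    yoe=$((y - era * 400))                  # year within the cycle
    doy=$(((153 * mp + 2) / 5 + d - 1))     # day within the March-based year
    doe=$((yoe * 365 + yoe / 4 - yoe / 100 + doy))
    echo $((era * 146097 + doe - 719468))
}

days=$(days_from_civil 2022 06 10)
echo "days: $days"                      # 19153
echo "midnight UTC: $((days * 86400))"  # 1654819200
```

Adding the 06:20:36 from the demo message (22836 seconds) to 1654819200 gives 1654842036, the same value date -d produced in the log above.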

Performing the necessary arithmetic to convert this into seconds since January 1, 1970 should be doable at this point, I hope. If you need complete per-day accuracy, you would also need to extract the time and the time zone and adjust to the correct day if it's not in your preferred time zone, or perhaps UTC (that would be +0000 at the very end); but this is just a sketch, anyway, because I think I have a better idea altogether.

Namely, save the messages to the correct folder as they arrive, then just forward or discard or archive older folders when you no longer need them.

MAILDATE_RFC=`formail -czxDate:`
MAILDATE=`date -d "$MAILDATE_RFC" +%F`
:0:
inbox-$MAILDATE

This will save to an mbox file named like inbox-2022-06-10 based on the extracted Date: header. (Again, you could avoid the external processes if you really wanted to squeeze out the last bit of performance, using the date parsing sketch above. And again, if you can't have a message from a different time zone land in the previous or next day's folder, you need to recalculate the date for your time zone.)
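Cleaning up then becomes a matter of picking the folders whose date suffix is before some cutoff. A minimal sketch, assuming the inbox-YYYY-MM-DD naming from the recipe above; the function name list_old_folders is invented for illustration, and what you do with the listed files (forward, archive, delete) is up to you.

```shell
#!/bin/sh
# list_old_folders CUTOFF [DIR]: print inbox-YYYY-MM-DD files in DIR whose
# date is strictly before CUTOFF (also YYYY-MM-DD). Hypothetical helper.
list_old_folders() {
    cutoff=$1
    dir=${2:-.}
    for f in "$dir"/inbox-????-??-??; do
        [ -e "$f" ] || continue     # the glob matched nothing
        day=${f##*/inbox-}
        # ISO dates sort as strings; the x prefix forces a string comparison
        if expr "x$day" \< "x$cutoff" >/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}
```

With GNU date, something like list_old_folders "$(date -d '2 days ago' +%F)" /path/to/maildir would list everything older than yesterday's folder; the path here is a placeholder.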
