
Procmail filtering by Date: field

I need to move away mails older than a given time, say 24 h = 86400 s. I use good old procmail for multiple other purposes on that machine, so I wanted to use it for this purpose as well. It also behaves well under load (~1 000 000 small automated messages per day).

It took me a while to get to this ugly solution (excerpt from a bigger procmailrc file):

  1. Grab the Date: field using formail
  2. Grab the current date in UNIX format (seconds)
  3. Use bash to convert the mail date to UNIX format
  4. Compare the values using bash
  5. Return the result to procmail using the exit code.

Together:

MAILDATE_RFC=`formail -zxDate:`
DATE_UNIX=`date "+%s"`

:0
* ? MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"` ; if ( (( ($DATE_UNIX-$MAILDATE_UNIX) > 86400)) ) then exit 0; else exit 1; fi
! account_for_outdated_mails

In this case I need to use the "Date:" field, as it contains the local time at which the mail was generated (it can take multiple days to get to my machine). We are 100% sure that the "Date:" field exists and contains an RFC-style date (those are automated messages in a separate mail network).

My solution looks pretty ugly:

  1. Getting the comparison result from bash using exit codes looks pretty bad. It might be inefficient as well.
  2. I would like to calculate MAILDATE_UNIX still in procmail, but it seems I cannot use a variable as an argument when generating another variable:
MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"`

does not work.

The only optimization I am aware of would be to move the whole process of computing MAILDATE_RFC, MAILDATE_UNIX and DATE_UNIX into a bash script, so that it runs in one bash session instead of three.

My question: Is there a better way to do it? Maybe a more efficient one?

What you say doesn't work actually does. Here's a quick demo.

testing.rc:

DEFAULT=/dev/null
SHELL=/bin/sh
VERBOSE=yes

MAILDATE_RFC=`formail -zxDate:`
MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"`
NOW=`date +%s`

:0
*   86400^0 ^ 
* $ -$NOW^0 ^ 
* $ $MAILDATE_UNIX^0 ^
{ LOG="score: $=
" }

Test run, in a fresh Ubuntu 20.04 Docker image:

tripleee@bash$ procmail -m testing.rc <<\:
> Subject: demo message
> Date: Fri, 10 Jun 2022 06:20:36 +0000
> 
> Try me
> :
procmail: [263] Fri Jun 10 06:21:23 2022
procmail: Executing "formail,-zxDate:"
procmail: [263] Fri Jun 10 06:21:23 2022
procmail: Assigning "MAILDATE_RFC=Fri, 10 Jun 2022 06:20:36 +0000"
procmail: Executing "date,-d,Fri, 10 Jun 2022 06:20:36 +0000,+%s"
procmail: Assigning "MAILDATE_UNIX=1654842036"
procmail: Executing "date,+%s"
procmail: Assigning "NOW=1654842083"
procmail: Score:   86400   86400 "^"
procmail: Score: -1654842083 -1654755683 "^"
procmail: Score: 1654842036   86353 "^"
procmail: Assigning "LOG=score: 86353
"
score: 86353
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
 
  Folder: /dev/null                              68

This also demonstrates how to use scoring to do the calculation. It's perhaps somewhat intimidating, but it saves an external process, and so should be more efficient than doing the calculation in Bash.

In some more detail, 123^0 regex says to add 123 to the score just once if the message matches the regular expression regex. In the recipe above, we use the regex ^, which of course always matches; every message contains a beginning. You could change the 0 to e.g. 1 to say to add the weight for every match; see the procmailsc man page for proper documentation. The $ modifier says to expand any variables in the recipe itself.
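The three score lines in the log are just a running sum. As a rough illustration, here is the same arithmetic in plain shell, using the values from the test run above. The variable names are made up for this sketch; only the numbers come from the log.

```shell
#!/bin/sh
# Values taken from the test run above.
max_age=86400
now=1654842083
maildate_unix=1654842036

# Procmail adds each weight in turn: 86400 - NOW + MAILDATE_UNIX.
score=$(( max_age - now + maildate_unix ))
echo "score: $score"

# A recipe with scoring conditions matches when the final score is positive,
# so the demo recipe matches messages that are younger than max_age.
# Flipping all three signs gives a sum that is positive only for messages
# older than max_age.
outdated_score=$(( now - maildate_unix - max_age ))
echo "outdated score: $outdated_score"
```

With these values the message is only 47 seconds old, so the first sum is positive and the flipped one is negative. Assuming the same sign flip carries over to the weights in the recipe, it should then match outdated messages instead of fresh ones.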

If you are not using GNU date, you don't have date -d; in that case, refer to your platform's man page for how to calculate a timestamp for an arbitrary date. The question "How to convert date string to epoch timestamp with the OS X BSD `date` command?" has a discussion for macOS, which should also work on any other *BSD platform.
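A minimal sketch of a wrapper that tries the GNU form first and falls back to the BSD form. The function name rfc_to_epoch is hypothetical, and the BSD branch assumes that date -j -f and the %z conversion behave as described in the macOS man page; check your own platform before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print seconds since the epoch for an RFC-style date
# such as "Fri, 10 Jun 2022 06:20:36 +0000".
rfc_to_epoch() {
    # GNU date: -d parses a free-form date string.
    if date -d "$1" +%s 2>/dev/null; then
        return 0
    fi
    # BSD/macOS date (assumption): -j means "do not set the clock",
    # -f gives the input format. LC_ALL=C keeps day and month names English.
    LC_ALL=C date -j -f '%a, %d %b %Y %H:%M:%S %z' "$1" +%s
}
```

In the procmailrc, the backtick assignment for MAILDATE_UNIX could then call a small script built around this function instead of calling date -d directly.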

If you really wanted to make this more efficient, and can be sure that the Date: header really always uses the RFC-mandated format, you could even parse the date in Procmail. Something like

:0
* ^Date: [A-Z][a-z][a-z], \/[ 0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9][0-9][0-9]
{
   date=$MATCH
   :0
   * date ?? ^\/[ 0-9][0-9]
   { dd=$MATCH }
   :0
   * date ?? ^[ 0-9][0-9] \/[A-Z][a-z][a-z]
   { mon=$MATCH }
   :0
   * date ?? [A-Z][a-z][a-z] \/[0-9][0-9][0-9][0-9]
   { yyyy=$MATCH }
   :0
   * mon ??  1^0 ^Jan
   * mon ??  2^0 ^Feb
   * mon ??  3^0 ^Mar
   * mon ??  4^0 ^Apr
   * mon ??  5^0 ^May
   * mon ??  6^0 ^Jun
   * mon ??  7^0 ^Jul
   * mon ??  8^0 ^Aug
   * mon ??  9^0 ^Sep
   * mon ?? 10^0 ^Oct
   * mon ?? 11^0 ^Nov
   * mon ?? 12^0 ^Dec
   { mm=$= }
}

The \/ token in a regex says to save the text matched after it into the special variable MATCH. We then copy that variable to date and perform additional matching to extract its parts.
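Once the day, month number and year are available as plain numbers, turning them into a day count since January 1, 1970 is integer arithmetic only. The sketch below uses the well-known "days from civil" calculation, written in shell for readability rather than as Procmail scoring. The function names are hypothetical, the inputs must be plain decimal numbers (no leading zeros or spaces), and the time is assumed to be UTC already.

```shell
#!/bin/sh
# Days from 1970-01-01 to the given Gregorian date.
# Arguments: year month day, as plain decimal numbers.
days_since_epoch() (
    y=$1 m=$2 d=$3
    # Count January and February as the last two months of the previous
    # year, so that a leap day falls at the very end of the counted year.
    if [ "$m" -le 2 ]; then y=$(( y - 1 )); fi
    era=$(( y / 400 ))                      # 400-year cycles (valid for y >= 0)
    yoe=$(( y - era * 400 ))                # year within the cycle, 0..399
    mp=$(( (m + 9) % 12 ))                  # March = 0 ... February = 11
    doy=$(( (153 * mp + 2) / 5 + d - 1 ))   # day within the March-based year
    doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
    echo $(( era * 146097 + doe - 719468 ))
)

# Seconds since the epoch for a UTC date and time.
# Arguments: year month day hour minute second.
epoch_utc() (
    days=$(days_since_epoch "$1" "$2" "$3")
    echo $(( days * 86400 + $4 * 3600 + $5 * 60 + $6 ))
)

# The Date: header from the demo message above.
epoch_utc 2022 6 10 6 20 36
```

For the demo message this prints 1654842036, the same value that date -d produced in the test run.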

Performing the necessary arithmetic to convert this into seconds since January 1, 1970 should be doable at this point, I hope. If you need complete per-day accuracy, you would also need to extract the time and the time zone, and adjust to the correct day if the date is not in your preferred time zone, or perhaps UTC (that would be +0000 at the very end). But this is just a sketch anyway, because I think I have a better idea altogether.

Namely, save the messages to the correct folder as they arrive, then just forward, discard or archive older folders when you no longer need them.

MAILDATE_RFC=`formail -czxDate:`
MAILDATE=`date -d "$MAILDATE_RFC" +%F`
:0:
inbox-$MAILDATE

This will save to an mbox file named like inbox-2022-06-10 based on the extracted Date: header. (Again, you could avoid the external processes if you really wanted to squeeze out the last bit of performance, using the date parsing sketch above. And again, if you can't have a message from a different time zone land in the previous or next day's folder, you need to recalculate the date for your time zone.)
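Cleaning up old folders can then happen outside Procmail, for example from a cron job. A minimal sketch of the age check, assuming the inbox-YYYY-MM-DD naming from the recipe above; the function name folder_is_older and the commented-out loop are hypothetical, and what to do with an old folder is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a folder named inbox-YYYY-MM-DD is
# strictly older than the cutoff date (also given as YYYY-MM-DD).
folder_is_older() {
    folder_date=${1##*inbox-}
    cutoff=$2
    # ISO dates compare correctly as plain numbers once the dashes are removed.
    [ "$(printf %s "$folder_date" | tr -d -)" -lt "$(printf %s "$cutoff" | tr -d -)" ]
}

# Example loop (assumes GNU date for "-d yesterday"):
# cutoff=$(date -d yesterday +%F)
# for f in inbox-????-??-??; do
#     folder_is_older "$f" "$cutoff" && echo "would archive $f"
# done
```

Because the folder name already carries the date, no message needs to be opened or parsed again at cleanup time.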
