
How can I log URLs that come in my incoming mailspool?

Asked 2020-06-05 19:04:16 · Tags: email, text, procmail

I have a mailspool on a UNIX system, at /var/mail/username, and it is in mbox format.

Once the email is stored in mbox format, the URLs that come in emails are chopped into 40-character lines with '=' or '=3D' separators, etc., and are just impossible to copy/paste or work with in any way.

So I would like to log all URLs to a file before they hit the mailspool; then, if I want to use a URL, I can just check that plain-text file.

I think the way to do this is to extract all URLs from all incoming mail with procmail, but is that correct? Not only do I need to extract each URL before it gets mbox'ed, but I also want to keep appending them to the end of a single file.

I am aware that there is a "golden regex" ("one regex to rule them all") for extracting URLs from text, and I assume I will use that, but I don't know how to invoke a regex in procmail so that the matches are simply appended to an existing text file.

Thank you.

1 Answer

Your diagnosis is incorrect. The mbox format is not what chops up the URLs. The messages are MIME messages which use quoted-printable encoding; this is how those URLs are represented in that encoding, and they have probably looked like this ever since the author originally composed and sent the message. (But not all messages are quoted-printable; MIME permits unencoded plain text as long as the message meets some simple requirements, and at the other end of the spectrum, message parts can be base64 encoded just as well.)
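
To illustrate what quoted-printable does to a URL, here is a minimal sketch using Python's standard `quopri` module. The sample text and the `decode_qp` helper are made up for illustration; the point is only that a trailing `=` is a soft line break and `=3D` is an encoded `=`, so decoding restores the URL in one piece.

```python
import quopri


def decode_qp(raw: bytes) -> str:
    """Decode a quoted-printable byte string (hypothetical helper)."""
    # quopri.decodestring removes "=<newline>" soft line breaks
    # and turns "=XX" hex escapes back into the original bytes.
    return quopri.decodestring(raw).decode("utf-8")


# Invented sample, shaped like what one might see in the mailspool.
sample = (
    b"See https://example.com/reset?user=3Dalice&token=3Dabc123def456ghi7=\n"
    b"89jkl for details.\n"
)

print(decode_qp(sample))
```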

Procmail is not particularly equipped to traverse and decode MIME structures. If your goal is to extract all URLs from all MIME parts, perhaps you could run something like ripmime on each incoming message and extract URLs from the files containing the decoded and extracted message parts, or perhaps write a simple URL extraction script in e.g. Python.
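
As a rough sketch of the Python route, something like the following could work. It relies on the standard `email` package to walk the MIME parts and undo quoted-printable or base64 encoding. The function names, the log-file argument, and the deliberately simple URL pattern are all assumptions made for illustration; the pattern is not the "golden regex" from the question and will miss or mangle some URLs.

```python
#!/usr/bin/env python3
"""Append URLs from a message on stdin to a log file (illustrative sketch)."""
import email
import re
import sys
from email import policy

# Simplistic pattern (an assumption): http(s) URLs up to whitespace or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(raw_message: bytes) -> list:
    """Return the unique URLs found in all text parts, in order of appearance."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images, attachments
        try:
            # get_content() undoes the transfer encoding and the charset.
            text = part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown charset or undecodable part
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls


def append_urls(raw_message: bytes, log_path: str) -> None:
    """Append each extracted URL to log_path, one per line."""
    urls = extract_urls(raw_message)
    if urls:
        with open(log_path, "a", encoding="utf-8") as log:
            for url in urls:
                log.write(url + "\n")


# Usage (hypothetical file names): extract_urls.py urls.log < message
if __name__ == "__main__" and len(sys.argv) == 2:
    append_urls(sys.stdin.buffer.read(), sys.argv[1])
```

To hook this into procmail, a recipe with the `c` flag (`:0 c`) whose action line pipes the message to the script should leave normal delivery to the mailspool unaffected, since that flag makes the recipe work on a copy of the message. The script path and log location are up to you.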