
PHP parsing Gmail mail reply

I am trying to parse Gmail emails, but I have one problem: how do I know which message a reply corresponds to?

I tried sorting email by subject. For example, if a message has the subject "hi Jack", then all messages with the subject "Re: hi Jack" are replies to that message.

But what do I do if I have many emails with the same subject? How do I know which email they are replies to?

Do emails perhaps carry a unique code identifying what the reply goes to? Maybe there is an ID or something like that which tells you what the children of a message are?

Threading by subject is not a good idea because, as you noticed, there may be several different threads with identical subjects.

You need to examine three headers in the message to make threading (or another kind of grouping) possible:

Message-ID: contains a unique message identifier (what you call a "unique code") as a string surrounded by < and > characters, e.g. <123456@User1PC>. Most MUAs (mail clients) create identifiers in this form or something similar. This header should be generated when a new message is sent.

In-Reply-To: contains the identifier of the message this particular reply responds to, e.g. <789abcd@User2PC>. Its value should be copied from the Message-ID of the message being replied to.

References: contains a list of recent references to messages in this "thread". The format is the same as above, except that it holds several identifiers separated by whitespace, e.g. <123456@User1PC> <789abcd@User2PC>. It is there so that you can use it to locate the message in the thread.
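As a rough illustration of reading these three headers, here is a minimal sketch that works on a raw header block. The function name extractThreadingHeaders and the shape of the returned array are made up for this example, and the regular expression is a simplification rather than a full header parser. If you fetch mail through PHP's imap extension instead, imap_headerinfo() should already expose the same values as the message_id, in_reply_to and references properties.

```php
<?php
// Hypothetical helper (not part of any library): extract the three
// threading headers from a raw header block.
function extractThreadingHeaders(string $rawHeaders): array
{
    // Unfold continuation lines (a line break followed by a space or tab).
    $unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $rawHeaders);

    $result = ['message_id' => null, 'in_reply_to' => null, 'references' => []];

    foreach (preg_split('/\r?\n/', $unfolded) as $line) {
        if ($line === '') {
            break; // a blank line ends the header block
        }
        if (strpos($line, ':') === false) {
            continue; // not a "Name: value" line
        }
        [$name, $value] = explode(':', $line, 2);

        // Collect every <...> identifier found in the header value.
        preg_match_all('/<[^<>\s]+>/', $value, $matches);
        $ids = $matches[0];

        switch (strtolower(trim($name))) {
            case 'message-id':
                $result['message_id'] = $ids[0] ?? null;
                break;
            case 'in-reply-to':
                $result['in_reply_to'] = $ids[0] ?? null;
                break;
            case 'references':
                $result['references'] = $ids;
                break;
        }
    }

    return $result;
}
```

A header that is missing, or whose value does not contain a <...> identifier, simply stays null (or an empty list for References), which makes it easy to treat such messages as unthreaded later.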

If a message is replied to or posted a few days later, it might be hard to locate its thread without the list of references. Mail clients usually trim the list of references to a reasonable size, meaning enough entries to locate the message in a thread while keeping the header short. For example, it may contain 5-10 references, which is usually more than enough to connect it to other messages. References: is also useful if the original (first) message has been deleted: even without it, you can still use the References: list to build threaded (grouped) messages.
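One way to use the References: list as a fallback is sketched below. It assumes each message is an array with 'in_reply_to' and 'references' keys, and that the list is ordered oldest first (as in the example above); the function name findKnownParent and the $knownIds lookup table are hypothetical.

```php
<?php
// Hypothetical helper: find the closest ancestor of $message that is still
// present in the mailbox. $knownIds is an assumed lookup table of the form
// ['<id@host>' => true, ...] built from the Message-IDs you have.
function findKnownParent(array $message, array $knownIds): ?string
{
    $candidates = [];

    // The direct parent is the best match, so try it first.
    if (!empty($message['in_reply_to'])) {
        $candidates[] = $message['in_reply_to'];
    }

    // Assumption: References is ordered oldest first, so walk it backwards
    // to reach the nearest surviving ancestor.
    foreach (array_reverse($message['references'] ?? []) as $ref) {
        $candidates[] = $ref;
    }

    foreach ($candidates as $id) {
        if (isset($knownIds[$id])) {
            return $id;
        }
    }

    return null; // nothing to attach to: treat the message as standalone
}
```

If the direct parent was deleted, the function falls back to an older ancestor, so the reply can still be attached to its thread.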

So, in order to thread messages, you need to read all of them and then sort them into threads based on the information you can extract from the headers above.

If references or message IDs are not in a form you can recognize (e.g. <example@something>), you can bail out by not threading those messages and displaying them as unthreaded. So a generic algorithm for threading/locating might look something like this:

  1. Take the first message ID.
  2. Examine nearby (by date) messages to see whether one of them contains that message ID in its References list or In-Reply-To header. If none does, you can't group it, so keep it as a standalone message.
  3. Group the messages somehow, perhaps based on the Date: or Received: header.
  4. Place this message into a "Done" list so you don't need to examine it (or its related references) again.
  5. Continue until you can't find any more references, then move to the next message that is not already in the "Done" list and repeat the steps until you have processed the entire message list.
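The steps above could be sketched roughly as follows. This is a simplified variant rather than a definitive implementation: it makes a single pass over the messages in date order instead of keeping an explicit "Done" list, and it does not merge two threads that turn out to be connected later. The function name groupIntoThreads and the array keys 'message_id', 'in_reply_to', 'references' and 'date' (a Unix timestamp) are assumptions made for this example; it needs PHP 7.4 or newer.

```php
<?php
// Hypothetical sketch: group messages into threads by their ID headers.
function groupIntoThreads(array $messages): array
{
    // Oldest first, so a parent is normally seen before its replies.
    usort($messages, fn(array $a, array $b): int => $a['date'] <=> $b['date']);

    $threadOf = []; // message ID => thread key
    $threads  = []; // thread key => messages in date order

    foreach ($messages as $i => $msg) {
        $id = $msg['message_id'] ?? null;

        // Every ID that ties this message to a thread:
        // its own, its parent's, and its ancestors'.
        $linked = array_values(array_filter(array_merge(
            [$id, $msg['in_reply_to'] ?? null],
            $msg['references'] ?? []
        )));

        // Reuse the thread of the first linked ID seen so far.
        $key = null;
        foreach ($linked as $ref) {
            if (isset($threadOf[$ref])) {
                $key = $threadOf[$ref];
                break;
            }
        }

        // Nothing matched: start a new thread, or a standalone entry
        // when the message has no usable ID at all.
        if ($key === null) {
            $key = $id ?? 'standalone-' . $i;
        }

        // Register all linked IDs so later replies find this thread even
        // if the message they point at is missing from the mailbox.
        foreach ($linked as $ref) {
            $threadOf[$ref] ??= $key;
        }

        $threads[$key][] = $msg;
    }

    return $threads;
}
```

The result maps a thread key (the Message-ID of the first message seen in that thread) to its messages in date order. Two unrelated messages with the same subject end up in separate threads because the subject is never consulted.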

It will probably take you a while to get this right, but now at least you have a starting point.
