Regex to match WhatsApp chat log

I've been trying to create a regex for a WhatsApp chat log.

So far, this is what I've been able to achieve.

Click here for the test link.

I did it with the following regex:

(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)

The problem with this regex is that it cannot match long messages that span multiple lines. You can see the issue in the link provided above.
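To reproduce the problem outside regex101, here is a minimal Python sketch that runs the question's regex unchanged over a short, made-up log. The LOG sample and the parse helper are hypothetical, written only for illustration. Because . does not match a line break, the multi-line message comes back truncated to its first line.

```python
import re

# The regex from the question, unchanged (raw string so the backslashes survive).
PATTERN = re.compile(
    r"(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)"
)

# Hypothetical sample log in the format the regex expects.
LOG = (
    "12/03/2019, 9:15 a.m. - Alice: Hi there\n"
    "12/03/2019, 9:16 a.m. - Bob: This is a long message\n"
    "that continues on a second line\n"
    "and a third one\n"
    "12/03/2019, 10:02 p.m. - Alice: Bye\n"
)


def parse(log):
    # Return (datetime, name, message) tuples; strip() removes the space
    # the pattern leaves between the colon and the message text.
    return [
        (m["datetime"], m["name"], m["message"].strip())
        for m in PATTERN.finditer(log)
    ]


if __name__ == "__main__":
    for row in parse(LOG):
        print(row)
```

Bob's message is reported as "This is a long message"; the two continuation lines are silently dropped.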

Help would be appreciated.

Thank you.

There you go:

(?mx)
^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)

See your modified demo on regex101.com.

Essentially, I added anchors, simplified your datetime part and inserted [\s\S]+?, which lazily matches anything (including newlines) up to the following condition, a lookahead. The lookahead makes sure there are either two more digits right after a newline (this could be tightened!) or the very end of the string. The pattern relies on the m (multiline) flag, so that ^ matches at the start of every line rather than only at the start of the string, and on the x (verbose) flag, so that it can be split across several lines.
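A minimal Python sketch of this answer's pattern, assuming Python's re module, with re.MULTILINE and re.VERBOSE standing in for the m and x flags. LOG and parse are hypothetical names and the sample log is made up. The multi-line message is now captured whole.

```python
import re

# Hypothetical sample log in the format used in the question.
LOG = (
    "12/03/2019, 9:15 a.m. - Alice: Hi there\n"
    "12/03/2019, 9:16 a.m. - Bob: This is a long message\n"
    "that continues on a second line\n"
    "and a third one\n"
    "12/03/2019, 10:02 p.m. - Alice: Bye\n"
)

# The answer's pattern. re.VERBOSE (the x flag) lets it span several lines;
# re.MULTILINE (the m flag) makes ^ match at the start of every line.
PATTERN = re.compile(
    r"""
    ^
    (?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{2}|\Z)
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse(log):
    # strip() removes the newline that the lazy message group keeps
    # just before the next entry.
    return [
        (m["datetime"], m["name"], m["message"].strip())
        for m in PATTERN.finditer(log)
    ]


if __name__ == "__main__":
    for row in parse(LOG):
        print(row)
```

One possible way to tighten the lookahead, as the answer hints, is to require a full date, for example (?=^\d{2}/\d{2}/\d{4}|\Z), so that a continuation line that happens to start with two digits is not mistaken for a new message. That variant is only a suggestion and is not part of the answer's demo.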

By default, the dot does not match newline characters, which is why only the first line of a message is matched. The matching behaviour of a regular expression engine can usually be changed with flags.

On the regex101 page, you can click Set Regex Options (the flags right next to the regular expression input field) and activate Single line (the s flag); the dot will then also match \n.

But then you have to modify your expression so that it detects the start of the next message; otherwise the greedy .* runs on to the end of the log and everything is interpreted as one message.
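The single-line approach can be sketched in Python, where re.DOTALL plays the role of regex101's Single line flag. This is an illustrative sketch, not the answerer's code: DATETIME is a lightly tidied copy of the question's datetime part (escaped dots, \d{1,2} for the hour), it assumes every new message starts on a new line, and LOG, GREEDY, LAZY and parse are hypothetical names. The greedy version shows the one-big-message problem; the lazy version, with a lookahead for the next datetime, splits the log into individual messages.

```python
import re

# Hypothetical sample log in the format used in the question.
LOG = (
    "12/03/2019, 9:15 a.m. - Alice: Hi there\n"
    "12/03/2019, 9:16 a.m. - Bob: This is a long message\n"
    "that continues on a second line\n"
    "and a third one\n"
    "12/03/2019, 10:02 p.m. - Alice: Bye\n"
)

# The question's datetime part, tidied: \d{1,2} for the hour, literal dots escaped.
DATETIME = r"\d{2}/\d{2}/\d{4},\s\d{1,2}:\d{2} [pa]\.m\."

# DOTALL alone: the greedy .* now crosses newlines and runs to the end of the log.
GREEDY = re.compile(
    rf"(?P<datetime>{DATETIME})\s-\s(?P<name>[^:]*):(?P<message>.*)",
    re.DOTALL,
)

# DOTALL plus a lazy .*? that stops just before a newline followed by the next
# datetime, or at the very end of the string.
LAZY = re.compile(
    rf"(?P<datetime>{DATETIME})\s-\s(?P<name>[^:]*):(?P<message>.*?)(?=\n{DATETIME}|\Z)",
    re.DOTALL,
)


def parse(pattern, log):
    """Return (datetime, name, message) tuples, trimming whitespace around each message."""
    return [
        (m["datetime"], m["name"], m["message"].strip())
        for m in pattern.finditer(log)
    ]


if __name__ == "__main__":
    print(len(parse(GREEDY, LOG)))  # prints 1: a single giant match
    for row in parse(LAZY, LOG):
        print(row)
```

Repeating the full datetime inside the lookahead means the match only stops at a line that really starts a new entry, at the cost of a somewhat longer pattern.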
