Regex to match WhatsApp chat log
I've been trying to create a regex for a WhatsApp chat log.
So far I've been able to achieve this (click here for the test link) by creating the following regex:
(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)
The problem with this regex is that it cannot match long messages that span multiple lines with line breaks. You can see the issue in the link provided above.
Help would be appreciated.
Thank you.
There you go:
^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)
See your modified demo on regex101.com.
The key part is [\s\S]+? which means: match anything lazily (including newlines) up to the following condition, which is a lookahead.
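For readers who want to try this outside regex101, here is a minimal Python sketch of the pattern above. It assumes the verbose flag (the pattern is split across lines) and the multiline flag (so that ^ matches at the start of every line); the sample log and the names SAMPLE_LOG and MESSAGE_RE are made up for illustration.

```python
import re

# Hypothetical sample log; the second message spans three lines.
SAMPLE_LOG = (
    "12/05/2017, 9:15 p.m. - Alice: Hello there\n"
    "12/05/2017, 9:16 p.m. - Bob: This is a long message\n"
    "that continues on a second line\n"
    "and a third one\n"
    "12/05/2017, 10:02 p.m. - Alice: Got it\n"
)

# VERBOSE allows the pattern to be laid out over several lines;
# MULTILINE makes ^ match at the start of each line, not only at the start of the text.
MESSAGE_RE = re.compile(
    r"""
    ^
    (?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{2}|\Z)
    """,
    re.VERBOSE | re.MULTILINE,
)

# Collect (datetime, name, message) tuples, trimming surrounding whitespace.
messages = [
    (m["datetime"].strip(), m["name"], m["message"].strip())
    for m in MESSAGE_RE.finditer(SAMPLE_LOG)
]

for stamp, name, text in messages:
    print(f"{stamp} | {name} | {text!r}")
```

Note that the lookahead only checks for two digits at the start of a line, so a continuation line that itself begins with two digits would be treated as the start of a new message.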
The dot does not match newline characters, which is why you only get the first line matched. The matching behaviour of a regular expression engine can usually be modified with flags.
On the regex101 page, you can click on Set Regex Options (the flag right next to the regular expression input field) and activate Single line; then the dot will also match \n.
But then you have to modify your expression so that it detects the start of the next message; otherwise everything will be interpreted as one message.
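A rough Python sketch of both points, assuming re.DOTALL as the equivalent of the Single line option. The sample log and the names DATETIME, greedy and lazy are illustrative only, and the dots in "p.m." are escaped here as a small tightening of the original pattern. One possible way to detect the next message is a lazy quantifier combined with a lookahead for the next timestamp:

```python
import re

# Hypothetical sample log; Bob's message spans two lines.
SAMPLE_LOG = (
    "12/05/2017, 9:15 p.m. - Alice: Hello there\n"
    "12/05/2017, 9:16 p.m. - Bob: line one\n"
    "line two\n"
)

# Timestamp part of the question's regex (dots in "p.m." escaped).
DATETIME = r"\d{2}/\d{2}/\d{4},\s\d\d?:\d{2} [pa]\.m\."

# With DOTALL alone, the greedy .* runs to the end of the log,
# so everything ends up in a single message.
greedy = re.compile(
    r"(?P<datetime>" + DATETIME + r")\s-\s(?P<name>[^:]*):(?P<message>.*)",
    re.DOTALL,
)
greedy_matches = list(greedy.finditer(SAMPLE_LOG))

# A lazy .*? plus a lookahead stops each message right before
# the next timestamp line, or at the end of the text.
lazy = re.compile(
    r"(?P<datetime>" + DATETIME + r")\s-\s(?P<name>[^:]*):(?P<message>.*?)"
    r"(?=\n" + DATETIME + r"\s-\s|\Z)",
    re.DOTALL,
)
lazy_matches = [(m["name"], m["message"].strip()) for m in lazy.finditer(SAMPLE_LOG)]

print(len(greedy_matches))
print(lazy_matches)
```

Checking for a full timestamp in the lookahead is stricter than checking for two leading digits, at the cost of a longer pattern.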