Regex JavaScript split

Question

I'm trying to create a regex-based JavaScript split, but I'm totally stuck. Here's my input:

9:30 pm
The user did action A.

10:30 pm
Welcome, user John Doe.

***This is a comment

11:30 am
This is some more input.

I want the output array after the split() to be (I've removed the \n for readability):

["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];

My current regular expression is:

var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

This works, but there is one problem: the timestamps get repeated in extra elements. So I get:

["9:30", "9:30 pm The user did action A.", "10:30",  "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30", "11:30 am This is some more input." ];

I can't split on the newlines (\n) because they aren't consistent, and sometimes there may be no newlines at all.

Could you help me out with a regex for this?

Thanks so much!!

EDIT: in reply to phleet

It could look like this:

9:30 pm
The user did action A.

He also did action B

10:30 pm Welcome, user John Doe.

Basically, there may or may not be a newline after the timestamp, and the event description may span multiple newlines.

Answer 1

I believe the issue is how JavaScript's split treats capturing groups: when the separator regex contains a capturing group, the captured text is added to the result array as extra elements. The solution may simply be to use a non-capturing group in your pattern. That is, instead of:

/\s*(?=(\b\d+:\d+|\*\*\*))/

Use:

/\s*(?=(?:\b\d+:\d+|\*\*\*))/
        ^^

The (?:___) construct is called a non-capturing group.
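To see the difference concretely, here is a minimal sketch using a shortened, made-up one-line input (the `sample` string and the variable names are illustrative and not from the question). It relies on the standard behavior of `String.prototype.split`, which adds text captured by groups in the separator to the result:

```javascript
// Hypothetical one-line input; the question's real input spans several lines.
const sample = "9:30 pm A. 10:30 pm B.";

// Capturing group inside the lookahead: the captured timestamp
// is added to the result as an extra element.
const withCapture = sample.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

// Non-capturing group: same split points, no extra elements.
const withoutCapture = sample.split(/\s*(?=(?:\b\d+:\d+|\*\*\*))/);

console.log(withCapture);    // [ '9:30 pm A.', '10:30', '10:30 pm B.' ]
console.log(withoutCapture); // [ '9:30 pm A.', '10:30 pm B.' ]
```

The lookahead itself consumes no characters, so only the leading whitespace is removed at each split point; the timestamp stays at the start of the next element.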

Looking at the overall pattern, however, the grouping is not actually needed. You should be able to just use:

/\s*(?=\b\d+:\d+|\*\*\*)/
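As a rough check, the simplified pattern can be run against the sample input from the question. This is only a sketch: the names `text`, `edited`, `pattern`, `parts`, and `normalized` are made up for illustration, and the whitespace-collapsing step is an assumption added to mirror the "newlines removed for readability" output. Note that a description which itself contains something like `10:30` would also trigger a split.

```javascript
// Sample input reconstructed from the question; blank lines separate entries.
const text = [
  "9:30 pm",
  "The user did action A.",
  "",
  "10:30 pm",
  "Welcome, user John Doe.",
  "",
  "***This is a comment",
  "",
  "11:30 am",
  "This is some more input.",
].join("\n");

// Split before each timestamp or "***" marker, dropping the whitespace in front of it.
const pattern = /\s*(?=\b\d+:\d+|\*\*\*)/;
const parts = text.split(pattern);

// Optional (assumption): collapse inner whitespace so each entry reads as one line.
const normalized = parts.map((entry) => entry.replace(/\s+/g, " "));

console.log(normalized);
// [
//   '9:30 pm The user did action A.',
//   '10:30 pm Welcome, user John Doe.',
//   '***This is a comment',
//   '11:30 am This is some more input.'
// ]

// The variant from the EDIT: several newlines inside one description,
// and no newline after the second timestamp.
const edited =
  "9:30 pm\nThe user did action A.\n\nHe also did action B\n\n10:30 pm Welcome, user John Doe.";

console.log(edited.split(pattern).length); // 2
```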

References

regular-expressions.info/Grouping

Minor point

Instead of \*\*\*, you could use [*]{3}. This may be more readable. The * is not a metacharacter inside a character class, so it doesn't have to be escaped. The {3} quantifier means "exactly 3 repetitions of" the preceding element.
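A small sketch of that equivalence, using a made-up sample string (`sample`, `escaped`, and `charClass` are illustrative names, not from the question). Both spellings of the "three asterisks" alternative should split the same way:

```javascript
// Hypothetical input mixing timestamps and a "***" comment marker.
const sample = "9:30 pm A. ***note 10:30 pm B.";

// Escaped asterisks versus a character class with a {3} quantifier.
const escaped = /\s*(?=\b\d+:\d+|\*\*\*)/;
const charClass = /\s*(?=\b\d+:\d+|[*]{3})/;

console.log(sample.split(escaped));   // [ '9:30 pm A.', '***note', '10:30 pm B.' ]
console.log(sample.split(charClass)); // [ '9:30 pm A.', '***note', '10:30 pm B.' ]
```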

References

regular-expressions.info/Character Class and Repetition

Solution 1
3 Accepted 2010-06-18 07:44:37
