Regular expression JavaScript split
I'm trying to write a regex for a JavaScript split, but I'm totally stuck. Here's my input:
9:30 pm
The user did action A.
10:30 pm
Welcome, user John Doe.
***This is a comment
11:30 am
This is some more input.
I want the output array after the split() to be (I've removed the \n for readability):
["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];
My current regular expression is:
var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);
This works, but there is one problem: the timestamps get repeated in extra elements. So I get:
["9:30", "9:30 pm The user did action A.", "10:30", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30", "11:30 am This is some more input." ];
I can't split on the newlines (\n) because they aren't consistent, and sometimes there may be no newlines at all.
Could you help me out with a regex for this?
Thanks so much!!
EDIT: in reply to phleet
It could look like this:
9:30 pm
The user did action A.
He also did action B
10:30 pm Welcome, user John Doe.
Basically, there may or may not be a newline after the timestamp, and the event description may span multiple lines.
I believe the issue is how JavaScript's split treats capturing groups: any text matched by a capturing group in the separator pattern is added to the result array. The solution may just be to use a non-capturing group in your pattern. That is, instead of:
/\s*(?=(\b\d+:\d+|\*\*\*))/
Use:
/\s*(?=(?:\b\d+:\d+|\*\*\*))/
        ^^
The (?:___) construct is what is called a non-capturing group.
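To make the difference concrete, here is a minimal sketch that runs both patterns on a shortened sample. The `text` string is an assumption modelled on the input in the question; the exact whitespace in the real data may differ.

```javascript
// Shortened sample modelled on the question's input (whitespace is assumed).
const text = "9:30 pm\nThe user did action A.\n10:30 pm\nWelcome, user John Doe.";

// Capturing group inside the lookahead: split() also adds the captured
// text ("10:30") to the result array as an extra element.
const withCapture = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

// Non-capturing group: only the pieces between separators are returned.
const withoutCapture = text.split(/\s*(?=(?:\b\d+:\d+|\*\*\*))/);

console.log(withCapture.length);    // 3
console.log(withoutCapture.length); // 2
```

The extra element in `withCapture` is the repeated timestamp described in the question.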
Looking at the overall pattern, however, the grouping is not actually needed. You should be able to just use:
/\s*(?=\b\d+:\d+|\*\*\*)/
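As a rough sketch of how this pattern might be used on the full input, including the multi-line case from the EDIT, consider the following. The helper name `splitLog` and the whitespace-collapsing step are illustrative assumptions, not part of the original question.

```javascript
// Hypothetical helper: split a log into entries, then collapse the
// internal newlines/spaces of each entry into single spaces.
function splitLog(text) {
  return text
    .split(/\s*(?=\b\d+:\d+|\*\*\*)/)
    .map((entry) => entry.replace(/\s+/g, " ").trim())
    .filter((entry) => entry.length > 0); // drops an empty first piece, if any
}

// Sample assembled from the inputs shown in the question.
const sample = [
  "9:30 pm",
  "The user did action A.",
  "He also did action B",
  "10:30 pm Welcome, user John Doe.",
  "***This is a comment",
  "11:30 am",
  "This is some more input.",
].join("\n");

console.log(splitLog(sample));
```

Note that any description text that itself contains something shaped like a timestamp (digits, a colon, digits) would also trigger a split, so this only holds if descriptions never contain such text.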
Instead of \*\*\*, you could use [*]{3}. This may be more readable. The * is not a meta-character inside a character class definition, so it doesn't have to be escaped. The {3} is how you denote "exactly 3 repetitions of".
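A quick check, on an assumed sample string, that the two spellings split the same way:

```javascript
// Two equivalent ways of writing "three literal asterisks" in the lookahead.
const escaped = /\s*(?=\b\d+:\d+|\*\*\*)/;
const charClass = /\s*(?=\b\d+:\d+|[*]{3})/;

// Made-up sample in the same shape as the question's input.
const text = "9:30 pm Start\n***This is a comment\n10:30 pm End";

const a = text.split(escaped);
const b = text.split(charClass);

console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```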