
Regex: match anything (including newlines) up to a certain sequence, multiple substrings (JS)

The file I am trying to process looks like this:

...
...
15 Apr 2014 22:05 - id: content
15 Apr 2014 22:09 - id: content
15 Apr 2014 22:09 - id: content
with new line
16 Apr 2014 06:56 - id: content
with new line
with new line
16 Apr 2014 06:57 - id: content

16 Apr 2014 06:58 - id: content
...
...

The regex I have come up with is this: \d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*

which results in:

[image]

This is almost right; I just need to include newline characters. But if I use [\s\S]* instead of .*, only one match is returned.

[image]

What I would like to extract is a set of substrings, where each substring starts at a date sequence and ends just before the next date sequence, like so:

...
...
15 Apr 2014 22:05 - id: content //substring 1
15 Apr 2014 22:09 - id: content //substring 2
15 Apr 2014 22:09 - id: content //substring 3
with new line                   //substring 3
16 Apr 2014 06:56 - id: content //substring 4
with new line                   //substring 4
with new line                   //substring 4
16 Apr 2014 06:57 - id: content //substring 5

16 Apr 2014 06:58 - id: content //substring 6
...
...

Any help with what I'm missing?

You need to use a positive lookahead assertion.

\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*

DEMO

> var str = '...\n...\n15 Apr 2014 22:05 - id: content\n15 Apr 2014 22:09 - id: content\n15 Apr 2014 22:09 - id: content\nwith new line\n16 Apr 2014 06:56 - id: content\nwith new line\nwith new line\n16 Apr 2014 06:57 - id: content\n\n16 Apr 2014 06:58 - id: content\n...\n...';
undefined
> var re = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*/gm;
undefined
> str.match(re)
[ '15 Apr 2014 22:05 - id: content',
  '15 Apr 2014 22:09 - id: content',
  '15 Apr 2014 22:09 - id: content\nwith new line',
  '16 Apr 2014 06:56 - id: content\nwith new line\nwith new line',
  '16 Apr 2014 06:57 - id: content\n',
  '16 Apr 2014 06:58 - id: content' ]
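
A shorter variant of the same lookahead idea, as a minimal sketch rather than a drop-in replacement: it assumes every entry header has the full "day month year HH:MM" shape (the year-less alternative from the pattern above is left out), and the names sampleLog, DATE and entryRe are made up for illustration. The lazy [\s\S]*? grows one character at a time until the positive lookahead sees either a newline followed by the next date header or the end of the string.

```javascript
// Hypothetical sample mirroring the log in the question (without the elided lines).
const sampleLog = [
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
].join("\n");

// Assumed header shape: day, three-letter month, four-digit year, HH:MM.
const DATE = String.raw`\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}`;

// Date header, then as little as possible of anything (newlines included),
// stopping right before "\n<next date header>" or at the end of the string.
// No "m" flag here, so $ only matches at the very end of the input.
const entryRe = new RegExp(String.raw`${DATE}[\s\S]*?(?=\n${DATE}|$)`, "g");

const entries = sampleLog.match(entryRe);
console.log(entries);
```

On this sample the call returns four entries; the second one keeps its continuation line, and the third keeps the newline of the blank line that follows it, as in the output above.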

See the second answer here: How to use JavaScript regex over multiple lines?

Try using the non-greedy quantifier [\s\S]*? instead and see what it returns. Alternatively, take the single match and split the whole string on newlines afterwards...
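
The split-on-newlines route could look roughly like the sketch below. It is one possible reading of that suggestion, not the answerer's code: groupEntries and DATE_LINE are hypothetical names, and the header pattern again assumes the full "day month year HH:MM" form. Each line that starts with a date opens a new entry, and any other line is appended to the entry before it.

```javascript
// Assumed header shape at the start of a line: "15 Apr 2014 22:05".
// No "g" flag, so .test() keeps no state between calls.
const DATE_LINE = /^\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}/;

// Hypothetical helper: group the lines of a log into date-headed entries.
function groupEntries(text) {
  const grouped = [];
  for (const line of text.split("\n")) {
    if (DATE_LINE.test(line)) {
      // A date header starts a new entry.
      grouped.push(line);
    } else if (grouped.length > 0) {
      // A continuation line is appended to the current entry.
      grouped[grouped.length - 1] += "\n" + line;
    }
    // Lines before the first date header are ignored.
  }
  return grouped;
}

const grouped = groupEntries(
  "...\n15 Apr 2014 22:09 - id: content\nwith new line\n16 Apr 2014 06:56 - id: content"
);
console.log(grouped);
```

Unlike the lookahead pattern, this version attaches every trailing non-date line to the last entry, so any footer lines in the file would need to be trimmed separately.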
