Split a string on new lines while keeping delimiter in JavaScript

I have a string like the one below:

text = "\n first \n second \n third"

I want to split this string on newline characters and keep the delimiters (\n and \r\n). So far I have tried text.split(/(?=\r?\n)/g), and the result is:

["↵ first ", "↵ second ", "↵ third"]

But I want this:

["↵", " first ↵", " second ↵", " third"]

What is the correct regex for that?

Your JavaScript version might not support lookbehinds, but here is a trick that avoids them:

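var text = "\n first \n second \n third";
// Turn every newline into two newlines
text = text.replace(/\n/g, "\n\n");
// Split on a newline that is not followed by another newline
var terms = text.split(/\n(?!\n)/);
console.log(terms);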

This works by replacing every newline \n with two of them, \n\n, and then splitting on \n(?!\n), that is, on a \n that is not followed by another newline character. The split consumes the second newline of each pair, while the first one is retained in the output.
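A minimal sketch of the trick above as a reusable function (the name splitKeepingNewlinesNoLookbehind is hypothetical, not from the answer). The third call shows a limitation: when newlines are consecutive, only the last LF of the doubled run is consumed, so the piece ends up with one LF too many and the pieces no longer join back into the original string.

```javascript
// Hypothetical helper name; it wraps the replace-then-split trick above.
function splitKeepingNewlinesNoLookbehind(text) {
  // Turn every LF into two LFs.
  const doubled = text.replace(/\n/g, "\n\n");
  // Split on an LF that is not followed by another LF: the split consumes
  // that LF and leaves the LF before it at the end of the previous piece.
  return doubled.split(/\n(?!\n)/);
}

console.log(JSON.stringify(splitKeepingNewlinesNoLookbehind("\n first \n second \n third")));
// ["\n"," first \n"," second \n"," third"]
console.log(JSON.stringify(splitKeepingNewlinesNoLookbehind("a\r\nb")));
// ["a\r\n","b"]
console.log(JSON.stringify(splitKeepingNewlinesNoLookbehind("\n\n first")));
// ["\n\n\n"," first"]  (consecutive LFs: one extra LF survives)
```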

You could match on [^\n]*\n? (with the g flag enabled):

var text = "\n\n first \n\n sth \r with \r\n second \r\n third \n fourth \r";
console.log(text.match(/[^\n]*\n?/g));

You may need to .pop() the returned array, because its last element is always an empty string:

var matches = text.match(/[^\n]*\n?/g);
matches.pop();
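Put together, the match-based approach might look like the sketch below (the function name is hypothetical). Because the pattern only looks for \n, a CRLF ending stays attached to its line, a lone \r is treated as ordinary text, and consecutive newlines come back as separate elements.

```javascript
// Hypothetical helper name; it applies the [^\n]*\n? match shown above.
function splitKeepingNewlinesByMatch(text) {
  // Each match is a run of non-LF characters plus at most one LF.
  // The pattern can match the empty string, so match() never returns null.
  const matches = text.match(/[^\n]*\n?/g);
  // The last match is always the empty match at the end of the string.
  matches.pop();
  return matches;
}

console.log(JSON.stringify(splitKeepingNewlinesByMatch("\n first \n second \n third")));
// ["\n"," first \n"," second \n"," third"]
console.log(JSON.stringify(splitKeepingNewlinesByMatch("\n\n a \r\n b \r c")));
// ["\n","\n"," a \r\n"," b \r c"]
```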

You may match any text up to a CRLF, an LF, or the end of the string:

text.match(/.*(?:$|\r?\n)/g).filter(Boolean)
// -> (4) ["↵", " first ↵", " second ↵", " third"]

The .*(?:$|\r?\n) pattern matches:

  • .* - any 0 or more characters other than line terminators (in JavaScript, . matches neither \n nor \r)
  • (?:$|\r?\n) - either the end of the string, or an optional carriage return followed by a newline.

JS demo:

console.log("\r\n first \r\n second \r\n third".match(/.*(?:$|\r?\n)/g));
console.log("\n first \r\n second \r third".match(/.*(?:$|\r?\n)/g));
console.log("\n\n\n first \r\n second \r third".match(/.*(?:$|\r?\n)/g));
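The sketch below wraps the same pattern in a function (the name is hypothetical) and adds the .filter(Boolean) step from the first snippet of this answer. The second call shows a caveat: since . also refuses to match \r, a lone CR stops .* without being accepted as a delimiter, so the text before it on that line (" second \r" here) never appears in the result.

```javascript
// Hypothetical helper name; it uses the .*(?:$|\r?\n) pattern from the answer above.
function splitKeepingLineBreaks(text) {
  // `.` matches neither \n nor \r, so `.*` stops at the first line break;
  // the group then accepts CRLF, LF, or the end of the string.
  // filter(Boolean) removes the empty match produced at the end of the string.
  return text.match(/.*(?:$|\r?\n)/g).filter(Boolean);
}

console.log(JSON.stringify(splitKeepingLineBreaks("\r\n first \r\n second \r\n third")));
// ["\r\n"," first \r\n"," second \r\n"," third"]
console.log(JSON.stringify(splitKeepingLineBreaks("\n first \r\n second \r third")));
// ["\n"," first \r\n"," third"]  (" second \r" is never matched)
```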

In JS environments that support the ECMAScript 2018 standard, it is as simple as using a lookbehind pattern like

text.split(/(?<=\r?\n)/)

It splits at every position immediately after an LF (optionally preceded by a CR), so each line break stays at the end of the piece it terminates.
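A small sketch of the lookbehind split (the function name is hypothetical; it needs an engine with ES2018 lookbehind support, such as current Node.js). Because split does not test the position at the very end of the string, a trailing newline does not produce an extra empty element, and consecutive line breaks come back as separate pieces.

```javascript
// Hypothetical helper name; requires ES2018 lookbehind support.
function splitAfterLineBreaks(text) {
  // Zero-width split point: any position immediately preceded by an LF
  // (optionally with a CR before it). Nothing is consumed, so the CRLF or
  // LF stays at the end of the piece it terminates.
  return text.split(/(?<=\r?\n)/);
}

console.log(JSON.stringify(splitAfterLineBreaks("\n first \n second \n third")));
// ["\n"," first \n"," second \n"," third"]
console.log(JSON.stringify(splitAfterLineBreaks("a\r\n\r\nb\n")));
// ["a\r\n","\r\n","b\n"]
```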

Another splitting regex is /^(?!$)/m:

console.log("\r\n first \r\n second \r\n third".split(/^(?!$)/m));
console.log("\n first \r\n second \r third".split(/^(?!$)/m));
console.log("\n\n\n first \r\n second \r third".split(/^(?!$)/m));

Here, the strings are split at every position that directly follows a CR or LF and is not itself the end of a line, that is, at the start of every non-empty line.
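A sketch of the same split (the function name is hypothetical), with the m-flag behavior spelled out in comments. Unlike the lookbehind version, a run of consecutive line breaks stays together in one piece, and a lone CR also starts a new piece.

```javascript
// Hypothetical helper name; it uses the /^(?!$)/m split from the answer above.
function splitAtNonEmptyLineStarts(text) {
  // With the m flag, ^ matches right after any line terminator (\n, \r, ...)
  // and $ matches right before one. (?!$) rejects line starts that are also
  // line ends (empty lines), so runs of line breaks are not split apart.
  return text.split(/^(?!$)/m);
}

console.log(JSON.stringify(splitAtNonEmptyLineStarts("\n first \n second \n third")));
// ["\n"," first \n"," second \n"," third"]
console.log(JSON.stringify(splitAtNonEmptyLineStarts("\n\n\n first \r\n second \r third")));
// ["\n\n\n"," first \r\n"," second \r"," third"]
```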

Note that you do not need the global modifier with String#split, since it splits at all matching positions by default.

You can use this simple regex:

/.*?(\n|$)/g

It lazily matches any characters other than line terminators, up to and including a newline \n, or up to the end of the string.

You can access the matches as an array; this works like splitting, but keeps each separator in its match.
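A minimal sketch of this approach, assuming LF-only input (the function name is hypothetical). As with the other match-based answers, the global search ends with an empty match, which .filter(Boolean) removes here. Be careful with CRLF text: . cannot match \r, and the pattern only accepts \n or the end of the string after it, so the characters before a CRLF are never matched and are silently dropped.

```javascript
// Hypothetical helper name; assumes LF-only line endings.
function splitKeepingLf(text) {
  // Lazily take characters up to the first LF (kept in the match) or the
  // end of the string; with the g flag, match() returns whole matches only.
  // filter(Boolean) drops the empty match produced at the end of the string.
  return text.match(/.*?(\n|$)/g).filter(Boolean);
}

console.log(JSON.stringify(splitKeepingLf("\n first \n second \n third")));
// ["\n"," first \n"," second \n"," third"]
console.log(JSON.stringify(splitKeepingLf("a\r\nb")));
// ["\n","b"]  ("a\r" is lost because . does not match \r)
```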

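Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM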