
Removing blank lines between only certain other matching lines

I am trying to remove blank lines between other lines that match a particular pattern. In my case, that pattern is just that the line begins with a - character.

const orig = `
- line1

- line2

- line3

- line4

- line5
`.trim();

const actual =
  orig.replace(/((?:^|\n)-.*\n)\n(-)/g, '$1$2');

In the code above, I'm using a regex to match:

  • a newline (or string start), followed by...
  • a - prefixed line, followed by...
  • an empty line, followed by...
  • another -

I'm globally replacing the entire expression with the two capture groups, which omits the empty line between them. This sort of works as I expected, but it only removes every other empty line, and I don't know why.

Where I would have expected the code above to give me this:

- line1
- line2
- line3
- line4
- line5

...it actually gives me this:

- line1
- line2

- line3
- line4

- line5

Here is a fiddle that demonstrates the problem.

Question: What about the regex is causing this behavior?

Bonus: Is there a better way to do this? (e.g. via split / reduce, although I would still like to know why it doesn't work)

The last - is part of the consuming pattern. Once (-) matches, the regex index is set after that -, so the next match attempt cannot start at it: the - in (?:^|\n)- cannot match a character that has already been consumed. You need to put it into a positive lookahead. Then, you need to use the m modifier to let ^ match at start-of-line positions, not just at the start of the string.

Use

/((?:^|\n)-.*\n)\n(?=-)/gm

See the regex demo. The replacement string is reduced to $1 since there is only one capturing group left.
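As a quick check that both changes matter, the sketch below runs the lookahead pattern with and without the m flag on the question's input. Building the input with join is only a convenience of this example, not something the original code does.

```javascript
// Same text as the question's `orig`: five "-" lines separated by blank lines.
const orig = ['- line1', '- line2', '- line3', '- line4', '- line5'].join('\n\n');

// Lookahead only, no m flag: ^ matches only at the very start of the string,
// so after the first match the pattern needs a leading \n that is already consumed.
const withoutM = orig.replace(/((?:^|\n)-.*\n)\n(?=-)/g, '$1');

// Lookahead plus m flag: ^ also matches right after each newline.
const withM = orig.replace(/((?:^|\n)-.*\n)\n(?=-)/gm, '$1');

console.log(JSON.stringify(withoutM));
// "- line1\n- line2\n\n- line3\n- line4\n\n- line5"
console.log(JSON.stringify(withM));
// "- line1\n- line2\n- line3\n- line4\n- line5"
```

Without the m flag the output is the same "every other blank line" result as in the question, which suggests the lookahead alone is not enough for this particular pattern.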

Here is the fixed expression demo:

const orig = `
- line1

- line2

- line3

- line4

- line5
`.trim();

const actual = orig.replace(/((?:^|\n)-.*\n)\n(?=-)/gm, '$1');

document.getElementById('orig').innerText = orig;
document.getElementById('actual').innerText = actual;

ul { font-family: sans-serif; list-style: none; padding: 0; }
li { display: inline-block; padding: 1em; vertical-align: top; }

<ul>
  <li><h3>Original</h3><pre id="orig"></pre></li>
  <li><h3>Expected</h3><pre>- line1<br />- line2<br />- line3<br />- line4<br />- line5</pre></li>
  <li><h3>Actual</h3><pre id="actual"></pre></li>
</ul>

The reason for this behavior is that regex matches do not overlap. The first match consumes:

- line 1

- 

It is replaced with:

- line 1
- 

The regex then continues traversing the string from the end of its previous match.

For this reason it does not match the next blank line, because

  line 2

- line 3

does not contain a match for your pattern. The next match for your pattern will be

<newline>
- line 3

-

Replaced by:

<newline>
- line 3
-
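The walk-through above can be reproduced by listing what the question's original pattern actually matches. This is a minimal sketch; the `matches` array and its field names are made up for this illustration.

```javascript
// Same text as the question's `orig`: five "-" lines separated by blank lines.
const orig = ['- line1', '- line2', '- line3', '- line4', '- line5'].join('\n\n');

// The pattern from the question: the trailing "-" is consumed by group 2.
const consuming = /((?:^|\n)-.*\n)\n(-)/g;

// Record where each match starts and ends, plus the matched text.
const matches = [...orig.matchAll(consuming)].map((m) => ({
  start: m.index,
  end: m.index + m[0].length,
  text: m[0],
}));

for (const m of matches) {
  console.log(m.start, m.end, JSON.stringify(m.text));
}
// 0 10 "- line1\n\n-"
// 17 28 "\n- line3\n\n-"
```

Only two matches are found. The - of line2 and line4 sits inside a previous match, so no new match can begin there.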

A way to solve this is to use lookaheads or lookbehinds, which allow conditional matching based on surrounding patterns without consuming those patterns.

We can modify your pattern slightly to use a lookahead to make sure the next line adheres to the pattern:

const actual = orig.replace(/^(-.*\n)\n(?=-)/gm, '$1');

https://regex101.com/r/fPUkYh/4

I also changed ((?:^|\n)-.*\n)\n to ^(-.*\n)\n and added the m flag, because the start-of-line assertion ^ does not need to be in the capturing group, and with the m flag the \n alternative, which consumes the preceding newline, is no longer needed.

This pattern could also be modified to match an arbitrary number of blank lines in between lines matching the pattern:

/^(-.*\n)\n+(?=-)/gm

https://regex101.com/r/X7B7pi/2
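To illustrate the difference between the \n and \n+ variants, here is a small sketch. The sample input with two blank lines in a row is made up for this example.

```javascript
// Hypothetical input: two blank lines after "- a", one blank line after "- b".
const input = '- a\n\n\n- b\n\n- c';

// Single \n: only removes a blank line when exactly one separates two "-" lines.
const single = input.replace(/^(-.*\n)\n(?=-)/gm, '$1');

// \n+: removes any run of blank lines between two "-" lines.
const many = input.replace(/^(-.*\n)\n+(?=-)/gm, '$1');

console.log(JSON.stringify(single)); // "- a\n\n\n- b\n- c"
console.log(JSON.stringify(many));   // "- a\n- b\n- c"
```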

Easy enough when using the multi-line modifier //m

 (                             # (1 start), Stuff to write back
      ^                             # BOL
      - .* 
      \r? \n      
 )                             # (1 end)
 \s*                           # Blank lines to remove
 \r? \n 

var orig_str = "- line1\n\n\n- line2\n\n- line3\n\n- line4\n\n- line5\n- line6";
var new_str = orig_str.replace(/(^-.*\r?\n)\s*\r?\n/mg, '$1');

console.log("Original\n--------\n" + orig_str + "\n");
console.log("New\n--------\n" + new_str);

Output

Original
--------
- line1


- line2

- line3

- line4

- line5
- line6


New
--------
- line1
- line2
- line3
- line4
- line5
- line6

If removing blank lines only between - lines is what you need, just add an assertion at the end: (^-.*\r?\n)\s*\r?\n(?=-)
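The effect of the trailing assertion is easiest to see on input where a - line is followed by a blank line and then a line that does not start with -. The sketch below uses a made-up CRLF sample for that purpose.

```javascript
// Hypothetical Windows-style input: "note" is not a "-" line.
const input = '- a\r\n\r\n- b\r\n\r\nnote\r\n\r\n- c';

// Without the assertion: the blank line after any "-" line is removed,
// including the one between "- b" and "note".
const withoutAssertion = input.replace(/(^-.*\r?\n)\s*\r?\n/mg, '$1');

// With (?=-): a blank line is removed only when another "-" line follows.
const withAssertion = input.replace(/(^-.*\r?\n)\s*\r?\n(?=-)/mg, '$1');

console.log(JSON.stringify(withoutAssertion));
// "- a\r\n- b\r\nnote\r\n\r\n- c"
console.log(JSON.stringify(withAssertion));
// "- a\r\n- b\r\n\r\nnote\r\n\r\n- c"
```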

You can do it in the following way:

const orig = `
- line1

- line2

- line3

- line4

- line5
`.trim();

const actual = orig.replace(/(\-[^\n]*)([^-]*)(?=-)/g, '$1\n');

document.getElementById('orig').innerText = orig;
document.getElementById('actual').innerText = actual;

<ul>
  <li><h3>Original</h3><pre id="orig"></pre></li>
  <li><h3>Expected</h3><pre>- line1<br />- line2<br />- line3<br />- line4<br />- line5</pre></li>
  <li><h3>Actual</h3><pre id="actual"></pre></li>
</ul>

See the regex demo.

Here is a shorter regex that includes the pattern you want to process:

const actual = orig.replace(/(-.*\n)\n/g, '$1');

This will give you what you need:

const actual = orig.replace(/\n\n|\r\r/g, "\n");
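Note that a blanket replacement like the one above removes blank lines everywhere, not only between - lines. For the question's bonus, here is a hedged sketch of a line-based alternative using split and filter, which keeps the "only between - lines" condition explicit. The function name and sample input are made up for this example, and it assumes \n line endings and single blank lines.

```javascript
// Drop an empty line only when both of its neighbours start with "-".
function removeBlankLinesBetweenDashLines(text) {
  const lines = text.split('\n');
  return lines
    .filter((line, i) => {
      if (line !== '') return true; // keep every non-empty line
      const prev = lines[i - 1];
      const next = lines[i + 1];
      const betweenDashLines =
        prev !== undefined && prev.startsWith('-') &&
        next !== undefined && next.startsWith('-');
      return !betweenDashLines;
    })
    .join('\n');
}

// Hypothetical input with a non-list paragraph in the middle.
const input = '- a\n\n- b\n\nnote\n\n- c\n\n- d';

const blanket = input.replace(/\n\n|\r\r/g, '\n');
const lineBased = removeBlankLinesBetweenDashLines(input);

console.log(JSON.stringify(blanket));   // "- a\n- b\nnote\n- c\n- d"
console.log(JSON.stringify(lineBased)); // "- a\n- b\n\nnote\n\n- c\n- d"
```

Whether the blanket version is acceptable depends on whether the text can contain anything other than - lines. For the question's sample input both give the same result.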
