sed's "N" command works intermittently

Question

Here is an example block of text I want to format:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

Using these two sed commands in a script:

sed -ri '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) above first line with 'class="tdci"'
sed -ri '/^<tr><td><\/td><td class="tdci">/N;s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) after last line with 'class="tdci"'

Here is the result:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td>&nbsp;</td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

So the first sed command works, inserting a blank table row above the first line with class="tdci", but the almost identical second sed command, meant to insert a blank table row after the last line with class="tdci", does not.

I usually save these kinds of multi-line edits for vim, since I never have problems with its similar command, but for some reason sed's N;s/ has always been hit and miss for me, as in this example, where one instance works fine yet a second does not. The script removes all leading/trailing whitespace and any Winblowz carriage returns (\r) before these commands get run.

Since I have a large number of files to edit, I would of course prefer to get this working in a script, if anyone can see anything obvious I am doing wrong.

Additional details:

Sorry, I forgot to mention that I am running sed on Linux (Debian stable).

Answer 1

Start small! Here's a simpler test case for what you're doing:

a1
b1
b2
a2

Here is your code translated for this test case, trying to insert c1 before the first "b" and c2 after the last:

sed -ri '/a/N; s/(\nb)/\nc1\1/' file
sed -ri '/b/N; s/(\na)/\nc2\1/' file

The first command, like you say, appears to work:

a1
c1
b1
b2
a2

The second does not; it just gives you the same result as above rather than inserting c2.

Here's what you probably thought would happen, with the incorrect parts flagged:

a1 is read and printed.
c1 is read and printed.
b1 is read.
- It matches /b/, and b2 is read with N.
- It doesn't match \na.
- b1 is printed. (incorrect)
b2 is read a second time. (incorrect)
- It matches /b/, and a2 is read with N.
- It matches \na. c2 is inserted.
- b2\nc2\na2 is printed.

Here is what actually happens:

a1 is read and printed.
c1 is read and printed.
b1 is read.
- It matches /b/, and b2 is read with N.
- It doesn't match \na.
- b1\nb2 is printed.
a2 is read and printed, because b2 has already been read above. N consumes the line it appends, so b2 never starts a cycle of its own and is never paired with a2.
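
To see this for yourself, you can ask sed to show the pattern space at the end of each cycle. This is only a small sketch, assuming GNU sed (as on Debian) and using the test input as it looks after c1 has been inserted; the variable names are made up for the illustration.

```shell
# Test input as it looks after the first command has already inserted c1.
input='a1
c1
b1
b2
a2'

# -n turns off automatic printing; `l` prints the pattern space unambiguously,
# showing an embedded newline as \n and marking the end with $.
trace=$(printf '%s\n' "$input" | sed -n '/b/N; l')
printf '%s\n' "$trace"
# a1$
# c1$
# b1\nb2$
# a2$
```

b1 and b2 share one cycle, and a2 arrives alone in the next one, so a pattern space of the form b2\na2 never exists for the substitution to match.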

Here's a working command:

sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file

In pseudocode, with the roughly corresponding sed part in comments, this is:

if (input.matches("b")) {                               // /b/ {
  while(true) {                                         // :b
    input += "\n" + readline();                         // N
    if(input.matches("\na")) {                          // s/\na/ ..
      input = input.replace("(\na)", "\nc2\1");         // .. \nc2&/
      goto exit;                                        // te
    }
    print(input.substring(0, input.indexOf('\n')));     // P
    input = input.substring(input.indexOf('\n') + 1);   // D
  }                                                     // bb
}                                                       // }
:exit                                                   // :e
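
A quick way to check the working command against the small test case is to run it in a pipeline instead of editing a file in place. This sketch assumes GNU sed and drops -i so nothing on disk is touched; `result` is just an illustrative variable name.

```shell
# Feed the test case (with c1 already inserted) through the working command.
result=$(printf 'a1\nc1\nb1\nb2\na2\n' |
  sed -r '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;')
printf '%s\n' "$result"
# a1
# c1
# b1
# b2
# c2
# a2
```

P prints only the first line of the pattern space and D deletes only that line, so b2 stays behind and is paired with a2 on the next pass.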

Translated back to your data:

sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f"
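
Before running this over many files with -i, it may be worth trying it on the sample block from the question in a pipeline. The sketch below assumes GNU sed; `sample` and `fixed` are names invented for the illustration.

```shell
# The four sample rows from the question.
sample='<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>'

# Same command as above, minus -i, reading from the pipe instead of "$f".
fixed=$(printf '%s\n' "$sample" |
  sed -r '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e')
printf '%s\n' "$fixed"
```

The output should have five rows, with the &nbsp; row sitting between "The foolish Fates." and "This was lofty!".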

Answer 2

@that other guy's excellent answer shows how to do it with sed.

However, sed can be a brain bender when it comes to problems like these that are somewhat procedural in nature, so here's an awk solution that is probably easier to understand:

awk -v blockRegex='^<tr><td></td><td class="tdci">' \
    -v lineToInsert='<tr><td>&nbsp;</td></tr>' \
  '
    # Print a line BEFORE the FIRST line matching `blockRegex`.
  $0 ~ blockRegex { if (!afterFirst) {print lineToInsert; afterFirst=inBlock=1} }
    # Print a line AFTER the LAST (contiguous) line matching `blockRegex`.
  inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst=inBlock=0 }
    # Print the input line.
  { print }
  ' \
  file

Note that this could be optimized further, but I wanted to keep it simple to clarify the logic.

blockRegex is passed in as a variable (with option -v) to identify blocks of contiguous lines before and after which a line is to be inserted, with the line to be inserted passed in as variable lineToInsert.
$0 ~ blockRegex matches each line in a block of lines of interest and prints the line to insert if it's the first line in the block, as indicated by status variable afterFirst; status variable inBlock indicates that the line at hand is inside a block of interest.
inBlock && $0 !~ blockRegex matches the first line after the block of interest and prints the line to insert, then resets the status variables.
print simply prints the input line as is.

Note that the use of the status variables relies on uninitialized variables in awk defaulting to 0 (which is treated as false in a Boolean context; similarly, a non-zero value evaluates as true).
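
The same logic can be tried on the small a/b test case from the first answer. This is only a sketch: the regex '^b' and the inserted line 'c' are stand-ins chosen for the demonstration, and `out` is an illustrative variable name.

```shell
# Lines starting with "b" form the block; "c" is inserted before and after it.
out=$(printf 'a1\nb1\nb2\na2\n' |
  awk -v blockRegex='^b' -v lineToInsert='c' '
    # First line of a block: afterFirst is still unset, i.e. false.
    $0 ~ blockRegex { if (!afterFirst) { print lineToInsert; afterFirst = inBlock = 1 } }
    # First line after the block: insert again and reset both flags.
    inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst = inBlock = 0 }
    { print }
  ')
printf '%s\n' "$out"
# a1
# c
# b1
# b2
# c
# a2
```

Because both flags are reset when a block ends, a second block further down the input would get its own pair of inserted lines. A block that runs to the very end of the input gets no trailing line, since no following line triggers the second rule.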
