
sed's 'N' command working intermittently

Here is an example block of text I want to format:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

I am using these two sed commands in a script:

sed -ri '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) above first line with 'class="tdci"'
sed -ri '/^<tr><td><\/td><td class="tdci">/N;s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) after last line with 'class="tdci"'

Here is the result:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td>&nbsp;</td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

So the first sed command works, inserting a blank table row above the first line with class="tdci", but the almost identical second sed command, meant to insert a blank table row after the last line with class="tdci", does not.

I usually save these kinds of multi-line edits for vim, since I never have problems with its similar command, but for some reason sed's N;s/ has always been hit and miss for me, as in this example, where one instance works fine yet a second does not. The script removes all leading/trailing whitespace and any Windows carriage returns (\r) before these commands run.

Since I have a large number of files to edit, I would of course prefer to get this working in a script, if anyone can see anything obvious I am doing wrong.

Additional details:

Sorry, I forgot to mention that I am running sed on Linux (Debian stable).

Start small! Here's a simpler test case for what you're doing:

a1
b1
b2
a2

Here is your code translated for this test case, trying to insert c1 before the first "b" line and c2 after the last:

sed -ri '/a/N; s/(\nb)/\nc1\1/' file
sed -ri '/b/N; s/(\na)/\nc2\1/' file

The first command, as you say, appears to work:

a1
c1
b1
b2
a2

The second does not; it gives you the same result as above rather than inserting c2.

Here's what you probably thought would happen, with the incorrect parts marked:

  1. a1 is read and printed.
  2. c1 is read and printed.
  3. b1 is read.
    • It matches /b/, and b2 is read with N.
    • It doesn't match \na.
    • b1 is printed. (incorrect)
  4. b2 is read a second time. (incorrect)
    • It matches /b/, and a2 is read with N.
    • It matches \na, so c2 is appended.
    • b2\nc2\na2 is printed.

Here is what actually happens:

  1. a1 is read and printed.
  2. c1 is read and printed.
  3. b1 is read.
    • It matches /b/, and b2 is read with N.
    • The pattern space, b1\nb2, doesn't match \na.
    • b1\nb2 is printed.
  4. a2 is read and printed, because b2 has already been consumed by N above and never starts a cycle of its own.
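One way to check this trace for yourself is to dump the pattern space right after N runs. The following is a minimal sketch, assuming GNU sed; the temporary file and the variable name `seen` are made up for illustration. With -n turning off the automatic print, the l command shows the pattern space unambiguously, writing embedded newlines as \n and marking the end with $:

```shell
# Recreate the test file as it looks after the first command has run.
demo=$(mktemp)
printf '%s\n' a1 c1 b1 b2 a2 > "$demo"

# For every line matching /b/, append the next line with N, then dump
# the pattern space with l. -n suppresses the automatic print.
seen=$(sed -n '/b/{N;l;}' "$demo")
printf '%s\n' "$seen"
# → b1\nb2$

rm -f "$demo"
```

Only one pair is reported: b2 never starts a cycle of its own, so b2 and a2 are never in the pattern space together and \na has nothing to match.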

Here's a working command:

sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file

In pseudocode, with the roughly corresponding sed part in comments, this is:

if (input.matches("b")) {                               // /b/ {
  while(true) {                                         // :b
    input += "\n" + readline();                         // N
    if(input.matches("\na")) {                          // s/\na/ ..
      input = input.replace("(\na)", "\nc2\1");         // .. \nc2&/
      goto exit;                                        // te
    }
    print(input.substring(0, input.indexOf('\n')));     // P
    input = input.substring(input.indexOf('\n') + 1);   // D
  }                                                     // bb
}                                                       // }
:exit                                                   // :e
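For comparison, here is the same script written one command per line and run against the test case. This is only a sketch, assuming GNU sed; `demo` and `fixed` are illustrative names. The bb from the one-liner is left out because D always ends the current cycle, so nothing placed after it is ever reached:

```shell
# Test file as it looks after the first (working) command has run.
demo=$(mktemp)
printf '%s\n' a1 c1 b1 b2 a2 > "$demo"

fixed=$(sed '
/b/ {
  :b
  # Append the next input line to the pattern space.
  N
  # If the appended line starts with a, insert c2 in front of it ...
  s/\na/\nc2&/
  # ... and jump to the end so the whole pattern space gets printed.
  te
  # Otherwise print the first line only, delete it, and restart the
  # cycle with the remaining line instead of reading a new one.
  P
  D
}
:e
' "$demo")

printf '%s\n' "$fixed"
# → a1, c1, b1, b2, c2, a2 (one per line)

rm -f "$demo"
```

The key difference from the original N;s/ pair is P followed by D: only the first line of the pair is flushed, so b2 stays in the pattern space and gets its own chance to be paired with a2.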

Translated back to your data:

sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f"
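As a quick sanity check, the two commands can be run back to back on the sample rows from the question. This sketch assumes GNU sed and drops -i so that nothing is edited in place; the temporary file `rows` and the variable `out` are illustrative names only:

```shell
rows=$(mktemp)
cat > "$rows" <<'EOF'
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
EOF

# First the original "insert above" command, then the fixed
# "insert below" command, connected by a pipe instead of using -i.
out=$(
  sed -r '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$rows" |
  sed -r '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e'
)
printf '%s\n' "$out"

rm -f "$rows"
```

The output should have six rows, with the <tr><td>&nbsp;</td></tr> row appearing as the second and the fifth line, that is, directly above and directly below the two class="tdci" rows.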

@that other guy's excellent answer shows how to do it with sed.

However, sed can be a brain bender when it comes to problems like these that are somewhat procedural in nature, so here's an awk solution that is probably easier to understand:

awk -v blockRegex='^<tr><td></td><td class="tdci">' \
    -v lineToInsert='<tr><td>&nbsp;</td></tr>' \
  '
    # Unlike in the sed commands, "/" and "&" need no backslash escapes in the two values above.
    # Print a line BEFORE the FIRST line matching `blockRegex`.
  $0 ~ blockRegex { if (!afterFirst) {print lineToInsert; afterFirst=inBlock=1} }
    # Print a line AFTER the LAST (contiguous) line matching `blockRegex`.
  inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst=inBlock=0 }
    # Print the input line.
  { print }
  ' \
  file

Note that this could be optimized further, but I wanted to keep it simple to clarify the logic.

  • blockRegex is passed in as a variable (with option -v) to identify blocks of contiguous lines before and after which a line is to be inserted; the line to insert is passed in as variable lineToInsert.
  • $0 ~ blockRegex matches each line in a block of interest and prints the line to insert if it's the first line in the block, as indicated by status variable afterFirst; status variable inBlock indicates that the line at hand is inside a block of interest.
  • inBlock && $0 !~ blockRegex matches the first line after the block of interest and prints the line to insert, then resets the status variables.
  • print simply prints the input line as is.

Note that the use of the status variables relies on uninitialized variables in awk defaulting to 0 (which is treated as false in a Boolean context; similarly, any non-zero value evaluates as true).
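To watch the status variables at work, the same three rules can be tried on the small a/b test case from the other answer. This is a minimal sketch: the input lines, the regex ^b, the inserted line c, and the variable `wrapped` are all made up for illustration. The input deliberately contains two separate blocks to show that resetting afterFirst and inBlock lets each block be wrapped, not just the first:

```shell
# Two blocks of "b" lines: b1/b2 and b3.
wrapped=$(printf '%s\n' a1 b1 b2 a2 b3 a3 |
  awk -v blockRegex='^b' -v lineToInsert='c' '
    # First line of a block: print the extra line and set both flags.
    $0 ~ blockRegex { if (!afterFirst) { print lineToInsert; afterFirst = inBlock = 1 } }
    # First line after a block: print the extra line and reset both flags.
    inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst = inBlock = 0 }
    # Always print the input line itself.
    { print }
  ')

# Join the output lines with spaces for a compact view.
printf '%s\n' "$wrapped" | paste -sd' ' -
# → a1 c b1 b2 c a2 c b3 c a3
```

Note that a block running to the very end of the input gets no trailing line, since the second rule only fires on a non-matching line that follows the block.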
