sed's 'N' command working intermittently
Here is an example block of text I want to format:
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
I am using these two 'sed' commands in a script:
sed -ri '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\ <\/td><\/tr>\1/' "$f" #insert table row with empty data fields (blank line) above first line with 'class="tdci"'
sed -ri '/^<tr><td><\/td><td class="tdci">/N;s/(\n<tr><td><\/td><td>)/\n<tr><td>\ <\/td><\/tr>\1/' "$f" #insert table row with empty data fields (blank line) after last line with 'class="tdci"'
Here is the result:
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td> </td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
So the first sed command works, inserting a blank table row above the first line with class="tdci", but the almost identical second sed command, meant to insert a blank table row after the last line with class="tdci", does not work.
I usually save these kinds of multi-line edits for vim, since I never have problems with its similar command, but for some reason sed's N;s/ has always been hit and miss for me, as in this example, where one instance works fine, yet a second does not. The script removes all leading/trailing whitespace and any Winblowz carriage returns (\r) before these commands get run.
Since I have a large number of files to edit, I would of course prefer to get this working in a script, if anyone can see anything obvious I am doing wrong.
Additional details:
Sorry, I forgot to mention that I am running sed in Linux (Debian stable).
Start small! Here's a simpler test case for what you're doing:
a1
b1
b2
a2
Here is your code translated for this test case, trying to insert c1 before the first "b" and c2 after the last:
sed -ri '/a/N; s/(\nb)/\nc1\1/' file
sed -ri '/b/N; s/(\na)/\nc2\1/' file
The first command, like you say, appears to work:
a1
c1
b1
b2
a2
The second does not, and just gives you the same result as above rather than inserting c2.
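One way to see why the second command misses is to look at what the pattern space holds on each cycle. The sketch below is an illustration rather than part of the original answer: it assumes GNU sed (as on Debian), feeds the test case as it looks after the first command through the same /b/N address, and uses sed's l command to print each pattern space unambiguously. The variable name pattern_spaces is made up for the example.

```shell
#!/bin/sh
# Test case as it looks after the first command has inserted c1.
# -n suppresses normal printing; l dumps the pattern space, showing an
# embedded newline as \n and marking the end of the pattern space with $.
pattern_spaces=$(printf '%s\n' a1 c1 b1 b2 a2 | sed -n '/b/N; l')
printf '%s\n' "$pattern_spaces"
```

Each output line is one pattern space. b1 and b2 show up joined in a single pattern space (b1\nb2$) and a2 arrives on its own, so a substitution looking for \na never sees a newline followed by "a".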
Here's what you probably thought would happen (the incorrect assumptions are steps 6 and 7):
1. a1 is read and printed.
2. c1 is read and printed.
3. b1 is read.
4. It matches /b/, and b2 is read with N.
5. It doesn't match \na.
6. b1 is printed.
7. b2 is read a second time.
8. It matches /b/, and a2 is read with N.
9. It matches \na. c2 is appended.
10. b2\nc2\na2 is printed.
Here is what actually happens:
1. a1 is read and printed.
2. c1 is read and printed.
3. b1 is read.
4. It matches /b/, and b2 is read with N.
5. It doesn't match \na.
6. b1\nb2 is printed.
7. a2 is read and printed, because b2 has already been read above.
Here's a working command:
sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file
In pseudocode -- with roughly corresponding sed part in comments -- this is:
if (input.matches("b")) { // /b/ {
  while(true) { // :b
    input += "\n" + readline(); // N
    if(input.matches("\na")) { // s/\na/ ..
      input = input.replace("(\na)", "\nc2\1"); // .. \nc2&/
      goto exit; // te
    }
    print(input.substring(0, input.indexOf('\n'))); // P
    input = input.substring(input.indexOf('\n') + 1); // D
  } // bb
} // }
:exit // :e
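As a quick sanity check, the working command can be run against the reduced test case. This is a sketch under assumptions: GNU sed, a throwaway file created with mktemp, and -i dropped so the result is printed instead of written back. The names tmp and fixed are made up for the example.

```shell
#!/bin/sh
# Scratch copy of the test case, as it looks after c1 has been inserted.
tmp=$(mktemp)
printf '%s\n' a1 c1 b1 b2 a2 > "$tmp"

# Same script as the working command, minus -i, so the file is untouched.
fixed=$(sed -r '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' "$tmp")
printf '%s\n' "$fixed"
rm -f "$tmp"
```

Here b1 is printed by P and dropped by D, which leaves b2 in the pattern space to be paired with a2 on the next pass. That is the pairing the original /b/N version never got to see.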
Translated back to your data:
sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\ <\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f"
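Before pointing this at a large batch of files, it may be worth trying it on a scratch copy. The sketch below assumes GNU sed and uses the question's sample rows as they look after the first command has run. The file created by mktemp and the variable name html_result are hypothetical, and -i is omitted so nothing is modified in place.

```shell
#!/bin/sh
# Hypothetical scratch file holding the question's sample rows
# (the blank row above the first tdci line is already present).
f=$(mktemp)
cat > "$f" <<'EOF'
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td> </td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
EOF

# Same sed script as above, without -i, so the result goes to stdout.
html_result=$(sed -r '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\ <\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f")
printf '%s\n' "$html_result"
rm -f "$f"
```

If the output looks right (one extra row between the last class="tdci" line and the line after it), the -i form can then be run on the real files.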
@that other guy's excellent answer shows how to do it with sed.
However, sed can be a brain bender when it comes to problems like these that are somewhat procedural in nature, so here's an awk solution that is probably easier to understand:
awk -v blockRegex='^<tr><td><\/td><td class="tdci">' \
-v lineToInsert='<tr><td>\ <\/td><\/tr>' \
'
# Print a line BEFORE the FIRST line matching `blockRegex`.
$0 ~ blockRegex { if (!afterFirst) {print lineToInsert; afterFirst=inBlock=1} }
# Print a line AFTER the LAST (contiguous) line matching `blockRegex`.
inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst=inBlock=0 }
# Print the input line.
{ print }
' \
file
Note that this could be optimized further, but I wanted to keep it simple to make the logic clear.
- blockRegex is passed in as a variable (with option -v) to identify blocks of contiguous lines before and after which a line is to be inserted, with the line to be inserted passed in as variable lineToInsert.
- $0 ~ blockRegex matches each line in a block of lines of interest and prints the line to insert if it's the first line in the block, as indicated by status variable afterFirst; status variable inBlock indicates that the line at hand is inside a block of interest.
- inBlock && $0 !~ blockRegex matches the first line after the block of interest and prints the line to insert, then resets the status variables.
- print simply prints the input line as is.
Note that the use of the status variables relies on uninitialized variables in awk defaulting to 0 (which is treated as false in a Boolean context; similarly, a non-zero value evaluates as true).
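To see the three rules in action without the HTML noise, here is a small sketch that runs the same awk logic on the reduced a/b test case from the sed answer. The stand-in values are assumptions made for illustration: blocks are lines starting with "b", the inserted line is "c", and the variable names input and result are made up.

```shell
#!/bin/sh
# Reduced test case: one block of "b" lines between two "a" lines.
input=$(printf '%s\n' a1 b1 b2 a2)

# Same three rules as the awk script above, with stand-in values
# for blockRegex and lineToInsert.
result=$(printf '%s\n' "$input" | awk -v blockRegex='^b' -v lineToInsert='c' '
  $0 ~ blockRegex { if (!afterFirst) { print lineToInsert; afterFirst = inBlock = 1 } }
  inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst = inBlock = 0 }
  { print }
')
printf '%s\n' "$result"
```

One thing this sketch makes easy to try: the closing line is only printed when a non-matching line follows the block, so a block that runs to the very end of the input gets no trailing line unless an END rule is added.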