
Substitute newlines with a string with awk

I need to parse stdin in the following way:

(1) all newline characters must be substituted with \n (a literal \ followed by n)

(2) nothing else should be modified

I chose awk to do it, and I would like an answer that uses awk if possible.

I came up with:

echo -ne "A\nB\nC" | awk '{a[NR]=$0;} END{for(i=1;i<NR;i++){printf "%s\\n",a[i];};printf "%s",a[NR];}'

But it looks cumbersome.

Is there a better / cleaner way?

  • Handling malformed files (i.e. files that don't end with the record separator) with awk is tricky.

  • sed -z is GNU-specific and has the side effect of slurping the whole (text) file into RAM, which might be an issue for huge files.

Thus, for a robust and reasonably portable solution I would use perl:

perl -pe 's/\n/\\n/'

With awk:

echo -ne "A\nB\nC" | awk 'BEGIN{FS="\n"; OFS="\\n"; RS=ORS=""} {$1=$1}1'

Output:

A\nB\nC

See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
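
One caveat worth illustrating: setting RS to the empty string puts awk into paragraph mode, where runs of blank lines act as record separators. The sketch below (the function name `join_paragraph_mode` is made up for the demonstration) shows that an input containing a blank line does not round-trip the way requirement (1) expects.

```shell
# Hypothetical helper wrapping the awk command above.
join_paragraph_mode() {
    awk 'BEGIN{FS="\n"; OFS="\\n"; RS=ORS=""} {$1=$1}1'
}

# No blank line in the input: every newline is rewritten.
printf 'A\nB\nC' | join_paragraph_mode; echo

# A blank line between B and C is consumed as a record separator,
# so it does not come back as a doubled \n in the output.
printf 'A\nB\n\nC' | join_paragraph_mode; echo
```

So this variant is fine for input without empty lines, but it is not a general solution.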

I would use GNU AWK for this task in the following way:

echo -ne "A\nB\nC" | awk '{printf "%s%s",$0,RT?"\\n":""}'

gives output

A\nB\nC

(without trailing newline)

Explanation: the string to print is built from the current line ( $0 ) followed by either a backslash and n or an empty string, depending on RT, which is the record terminator matched for the current line. RT is a newline for every line except the last, and an empty string for the last line when the input does not end with a newline, so in a boolean context it is true for all but the last line. I used the so-called ternary operator here: condition ? value_if_true : value_if_false.

(tested in GNU Awk 5.0.1)
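
RT is a GNU awk extension. As a sketch of a portable alternative (not taken from the answers here, and with the illustrative function name `join_with_backslash_n`), the separator can be printed before every record except the first, which needs only POSIX awk features. The assumption to keep in mind is that awk does not reveal whether the last record was newline-terminated, so a trailing newline in the input is dropped rather than escaped, just as in the script from the question.

```shell
# Hypothetical helper: join records with a literal backslash-n, POSIX awk only.
join_with_backslash_n() {
    awk 'NR > 1 { printf "%s", "\\n" }   # separator goes before every record but the first
         { printf "%s", $0 }'            # the record itself, with no trailing newline
}

# Blank lines are ordinary (empty) records here, so they are preserved.
printf 'A\nB\n\nC' | join_with_backslash_n; echo
```

This streams the input line by line, so it does not hold the whole file in memory.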

Using GNU awk for multi-char RS:

$ echo -ne "A\nB\n\nC" | awk -v RS='^$' -v ORS= -F'\n' -v OFS='\\n' '{$1=$1} 1'
A\nB\n\nC$

This should solve the problem of blank lines in between:

gecho -ne "A\nB\n\nC" |
 {m,g,n}awk 'BEGIN { RS = "^$"; FS = "\n"; ORS = ""; OFS = "\\n" } NF = NF' | gcat -b
     1  A\nB\n\nC%   

A gawk-specific way via RT:

 gawk 'BEGIN { _ = ""; ORS =__= "\\n" } (ORS = RT ? __ : _)^_'
