Merge lines of a txt file using shell script
I invoke a program from a shell script and it creates an output file with this format:
aaaaa\
bbbbb\
ccccc\
I would like to change this to:
aaaaabbbbbccccc
In the vi editor I can just do ggVGJ and then replace all \\ with "". But I want to get this done via a script.
Here's one way using GNU sed:
sed ':a; N; $!ba; s/\\\n//g; s/\\$//' file
Another way, using awk, may give you better performance:
awk '{ sub ("\\\\$", ""); printf "%s", $0 } END { print "" }' file
Results:
aaaaabbbbbccccc
Explanation:
The awk solution removes the trailing backslash from each line (via substitution) and uses printf to print the line without a newline character. The END block, which runs after all input has been read, then prints a single newline character.
This is arguably preferable to the sed solution, which creates a label called a and appends the next line of input to the pattern space. $!ba means 'if not at the last line of input, branch to label a'. Once the whole file is in the pattern space, the first substitution removes each backslash that is immediately followed by a newline, together with that newline. The second substitution removes the last, trailing backslash. This solution should be fast for small files, but it probably won't be any faster than the awk solution for the same file. Although ... it was faster to write.
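To check the two commands side by side, here is a minimal sketch that builds the three-line sample in a temporary file and captures each result. The mktemp file and the variable names sample, sed_out, and awk_out are assumptions made for illustration, not part of the original answer. The sed command is written with separate -e expressions, which should behave the same in GNU sed and tends to be more portable to other sed implementations.

```shell
# Build a throwaway sample file (hypothetical path from mktemp).
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# sed: collect every line in the pattern space, then delete each
# backslash-newline pair and finally the last trailing backslash.
sed_out=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n//g' -e 's/\\$//' "$sample")

# awk: strip the trailing backslash from each line and print it without
# a newline; END adds the single final newline.
awk_out=$(awk '{ sub("\\\\$", ""); printf "%s", $0 } END { print "" }' "$sample")

printf 'sed: %s\n' "$sed_out"
printf 'awk: %s\n' "$awk_out"
rm -f "$sample"
```

Both variables should hold the same merged line for this sample.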
Here's one way using sed and tr:
sed 's/\\$//' < sample.txt | tr -d '\n'
If you want to add a newline too, you can add an echo at the end:
sed 's/\\$//' < sample.txt | tr -d '\n'; echo
If you want the whole thing to be one unit, for example to use in a ... && ... || ... construct, then you can group the two steps like this:
{ sed 's/\\$//' < sample.txt | tr -d '\n'; echo; }
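As a rough illustration of why the grouping helps, the sketch below uses the braced group inside a && ... || ... chain and redirects its whole output to a file. The temporary files and the names sample, merged, and result are assumptions for the example only.

```shell
# Hypothetical temporary files for the demonstration.
sample=$(mktemp)
merged=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# The braces turn the pipeline plus the trailing echo into one command,
# so the && condition and the redirection apply to both steps together.
[ -s "$sample" ] && { sed 's/\\$//' < "$sample" | tr -d '\n'; echo; } > "$merged" || echo "nothing to merge" >&2

result=$(cat "$merged")
printf '%s\n' "$result"
rm -f "$sample" "$merged"
```

Without the braces, the echo after the semicolon would be a separate command and would run whether or not the test succeeded.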
Another way, using pure bash:
$ cat file.txt
aaaaa\
bbbbb\
ccccc\
$ { cat file.txt ; echo; } | while read line; do echo "$line"; done
aaaaabbbbbccccc
$
This works because the bash read command handles the backslash line continuation automatically (pass the -r switch to read to disable this behavior). The echo after the cat is necessary for this example because the last line of your sample text ends in a backslash, so without it the read command never sees the end of a line and the loop doesn't output anything. The echo just inserts an empty line at the end of the stream to clean this up.
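The difference between read and read -r can be seen in a small sketch like the following. The temporary file and the variable names sample, joined, and raw are assumptions for illustration; printf is used instead of echo only so that the backslashes are printed the same way in every shell.

```shell
# Hypothetical sample file with the same three lines.
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# Without -r, a trailing backslash continues the line; the extra echo
# supplies the newline that finally terminates the joined line.
joined=$( { cat "$sample"; echo; } | while read line; do printf '%s\n' "$line"; done )

# With -r, backslashes are kept literally and each line is read on its own.
raw=$( while IFS= read -r line; do printf '%s\n' "$line"; done < "$sample" )

printf 'read:    %s\n' "$joined"
printf 'read -r:\n%s\n' "$raw"
rm -f "$sample"
```

Here joined should contain the single merged line, while raw still holds the three original lines with their backslashes.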
I guess this solution is the smallest:
$ cat tmp.txt
aaaaa\
bbbbb\
ccccc\
$ cat tmp.txt | tr -d '\\\r\n'
aaaaabbbbbccccc
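One caveat worth checking before relying on tr alone: it deletes every backslash in the input, not only the ones at the end of a line. The sketch below compares it with the sed and tr pipeline on input that has a backslash in the middle of a line. The sample content and the names sample, tr_out, and sed_out are made up for this example.

```shell
# Hypothetical input: the first line has a backslash in the middle as well.
sample=$(mktemp)
printf '%s\n' 'a\b\' 'c\' > "$sample"

# Single quotes pass the escapes to tr unchanged, so tr deletes
# backslashes, carriage returns, and newlines wherever they occur.
tr_out=$(tr -d '\\\r\n' < "$sample")

# sed removes only a backslash at the end of a line; tr then joins the lines.
sed_out=$(sed 's/\\$//' < "$sample" | tr -d '\n')

printf 'tr only:  %s\n' "$tr_out"
printf 'sed + tr: %s\n' "$sed_out"
rm -f "$sample"
```

For the sample in the question the two approaches agree, since its only backslashes are the trailing ones.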
Try this line:
awk -F'\\\\$' '{printf "%s", $1}END{print ""}' file
This is a really ugly hack, but you could use the gcc preprocessor:
$ cat file.txt
aaaaa\
bbbbb\
ccccc\
$ cat file.txt | gcc -xc -E -P -w - | grep .
aaaaabbbbbccccc
$
Why is this risky? If your input text happened to contain preprocessor directives, they would get interpreted, resulting in a mess.
One with awk and sed:
sed 's/\\$//' file | awk '{printf "%s", $0}'
The sed command removes the backslash at the end of each line. $ anchors the match to the end of the line, right after the backslash. Since the backslash is a metacharacter in sed, you need an extra \ to escape it. Piping the output of sed to awk's printf prints the multiple lines as one, because printf does not add a newline. $0 represents the entire line.
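The choice between $0 and $1 in the awk step matters once a line contains whitespace, because $1 is only the first field. A small sketch, using made-up sample lines and the hypothetical names sample, whole, and first:

```shell
# Hypothetical input whose lines contain spaces.
sample=$(mktemp)
printf '%s\n' 'hello world\' 'foo bar\' > "$sample"

# $0 is the whole line, so nothing after the first space is lost.
whole=$(sed 's/\\$//' "$sample" | awk '{ printf "%s", $0 }')

# $1 is only the first whitespace-separated field of each line.
first=$(sed 's/\\$//' "$sample" | awk '{ printf "%s", $1 }')

printf 'with $0: %s\n' "$whole"
printf 'with $1: %s\n' "$first"
rm -f "$sample"
```

For the sample in the question the two give the same result, since its lines have no spaces.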