
Merge lines of a txt file using shell script

I invoke a program from a shell script, and it creates an output file in this format:

aaaaa\
bbbbb\
ccccc\

I would like to change this to:

aaaaabbbbbccccc

In the vi editor I can just do ggVGJ and then replace every \ with an empty string. But I want to get this done via a script.

Here's one way using GNU sed:

sed ':a; N; $!ba; s/\\\n//g; s/\\$//' file

Another way, using awk, which may give you better performance:

awk '{ sub ("\\\\$", ""); printf "%s", $0 } END { print "" }' file

Results:

aaaaabbbbbccccc

Explanation:

The awk solution removes the trailing backslash from each line (via substitution) and prints each line with printf, without a newline character. The END block, which runs after all input has been read, then prints a single newline character.

The sed solution creates a label called a and appends the next line of input to the pattern space with N. $!ba means 'if not at the last line of input, branch to label a', so the whole file ends up in the pattern space. The first substitution then removes every backslash that is immediately followed by a newline, together with that newline. The second substitution removes the last, trailing backslash. This solution should be fast for small files, but because it holds the entire file in memory it probably won't be any faster than the awk one for the same file. Although ... it was faster to write.
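As a rough sketch of how the awk approach could be wrapped for reuse inside a script, the function below reads standard input and applies the same awk program. The name join_lines is an assumption made for illustration; it is not part of the original answer.

```shell
#!/bin/sh
# join_lines: hypothetical helper name, chosen only for this example.
# Strips one trailing backslash from each input line and joins all
# lines into a single output line terminated by one newline.
join_lines() {
    awk '{ sub(/\\$/, ""); printf "%s", $0 } END { print "" }'
}

# Feed the sample input from the question through the helper.
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' | join_lines   # prints aaaaabbbbbccccc
```

Because the regular expression is anchored with $, a backslash in the middle of a line is left alone; only the one at the end of each line is removed.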

Here's one way using sed and tr:

sed 's/\\$//' < sample.txt | tr -d '\n'

If you want to add a newline too, you can add an echo at the end:

sed 's/\\$//' < sample.txt | tr -d '\n'; echo

If you want the whole thing to be one unit, for example to use in a ... && ... || ... construct, then you can group the two steps like this:

{ sed 's/\\$//' < sample.txt | tr -d '\n'; echo; }
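A minimal sketch of how the grouped form might be used follows. The temporary file from mktemp and the variable names sample and joined are assumptions made for the example, not part of the original answer.

```shell
#!/bin/sh
# Create a throwaway sample file (path chosen by mktemp).
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# The braces make the pipeline plus the trailing echo act as one command,
# so the && and || branches apply to the group as a whole.
[ -r "$sample" ] && { sed 's/\\$//' < "$sample" | tr -d '\n'; echo; } || echo "cannot read $sample" >&2

# Command substitution captures the joined text. It strips trailing
# newlines itself, so the extra echo is not needed here.
joined=$(sed 's/\\$//' < "$sample" | tr -d '\n')
printf 'joined=%s\n' "$joined"   # prints joined=aaaaabbbbbccccc

rm -f "$sample"
```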

Another way, using pure bash:

$ cat file.txt 
aaaaa\
bbbbb\
ccccc\
$ { cat file.txt ; echo; } | while read line; do echo "$line"; done
aaaaabbbbbccccc
$

This works because the bash read command handles the \ line continuation automatically (pass the -r switch to read to disable this behavior). The echo after the cat is necessary for this example because the last line of your sample text ends in \, so the read command doesn't think it has reached the end of a line and doesn't output anything. The echo just inserts an empty line at the end of the stream to clean this up.
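To make the effect of -r concrete, here is a small sketch that runs the same loop with and without it. The function names join_continued and show_raw are hypothetical, picked only for this example.

```shell
#!/bin/sh
# join_continued: hypothetical helper name.
# Without -r, read treats a backslash followed by a newline as a line
# continuation, so the three physical lines arrive as one logical line.
# The extra echo supplies the newline that ends the last continuation.
join_continued() {
    { cat; echo; } | while read line; do
        printf '%s\n' "$line"
    done
}

# show_raw: hypothetical helper name.
# With -r, backslashes stay literal and every physical line is kept.
show_raw() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done
}

printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' | join_continued   # one line: aaaaabbbbbccccc
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' | show_raw         # three lines, backslashes intact
```

Note that read without -r also drops backslashes that appear in the middle of a line, so this approach suits input where the only backslashes are the trailing ones.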

I guess this solution is the shortest:

$ cat tmp.txt
aaaaa\
bbbbb\
ccccc\

$ cat tmp.txt | tr -d '\\\r\n'
aaaaabbbbbccccc
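
One caveat that may be worth checking before relying on tr alone: it deletes every occurrence of the listed characters, not only a backslash at the end of a line. The sketch below uses made-up input (a Windows-style path, purely an assumption for illustration) to contrast it with the sed and tr pipeline shown earlier. The set is single-quoted so that tr itself receives \\, \r and \n.

```shell
#!/bin/sh
# Two input lines, each ending in a backslash; the first one also
# contains a backslash in the middle.
input='C:\temp\
file\'

# tr removes all backslashes, carriage returns and newlines.
with_tr=$(printf '%s\n' "$input" | tr -d '\\\r\n')

# sed removes only the backslash at the end of each line.
with_sed=$(printf '%s\n' "$input" | sed 's/\\$//' | tr -d '\n')

printf '%s\n' "$with_tr"    # prints C:tempfile
printf '%s\n' "$with_sed"   # prints C:\tempfile
```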

Try this line:

awk -F'\\\\$' '{printf "%s", $1}END{print ""}' file

This is a really ugly hack, but you could use the gcc preprocessor:

 $ cat file.txt 
 aaaaa\
 bbbbb\
 ccccc\
 $ cat file.txt | gcc -xc -E -P -w - | grep .
 aaaaabbbbbccccc
 $ 

Why is this risky? If your input text happened to contain preprocessor directives, they would be interpreted, resulting in a mess.

One with awk and sed:

sed 's/\\$//' file | awk '{printf "%s", $0}'

The sed command removes the backslash at the end of each line. $ anchors the match to the end of the line, so only a trailing backslash is matched. Since the backslash is a metacharacter in sed, you need an extra \ to escape it. Piping the output of sed to awk's printf prints the multiple lines as one. $0 represents the entire line.
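
The choice between $0 and $1 in the awk step matters as soon as a line contains whitespace, because $1 is only the first whitespace-separated field. A small sketch with made-up input (the sample text and variable names are assumptions for illustration):

```shell
#!/bin/sh
# Two input lines that contain spaces and end in a backslash.
input='hello world\
foo bar\'

# $1 keeps only the first field of each line.
first_field=$(printf '%s\n' "$input" | sed 's/\\$//' | awk '{printf "%s", $1}')

# $0 keeps the whole line.
whole_line=$(printf '%s\n' "$input" | sed 's/\\$//' | awk '{printf "%s", $0}')

printf '%s\n' "$first_field"   # prints hellofoo
printf '%s\n' "$whole_line"    # prints hello worldfoo bar
```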

