
Perl, sed, or awk one-liner to change the format of the file

I need advice on how to change a file that is formatted the following way (file1):

A       504688
B       jobnameA
A       504690
B       jobnameB
A       504691
B       jobnameC
...

into file2:

A       B
504688  jobnameA
504690  jobnameB
504691  jobnameC
...

One solution I could think of is:

cat file1 | perl -0777 -p -e 's/\s+B/\t/g' | awk '{print $2"\t"$3}'

But I am wondering if there is a more efficient way, or an already known practice, that does this job.

 perl -nawe 'print "@F[1 .. $#F]", $F[0] eq "A" ? "\t" : "\n"' < /tmp/ab

Look up the options in perlrun.

Another useful one to add is -l (append a newline to each print), but not in this case.

Assuming your input file is tab-separated:

echo $'A\tB'
cut -f2 filename | paste - -

This should be pretty quick, because it is exactly what cut and paste were written to do.
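If the file turns out to be separated by runs of spaces rather than tabs, cut -f2 will not split the fields. A minimal sketch of the same idea for that case, with awk standing in for cut; the function name `pairs_to_columns` and the sample data are assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: reads "A value" / "B value" line pairs on stdin
# and writes a tab-separated two-column table with a header row.
pairs_to_columns() {
    printf 'A\tB\n'                 # header row
    awk '{print $2}' | paste - -    # keep field 2, then join every two lines with a tab
}

# Sample input modelled on file1 from the question (space-separated).
printf 'A       504688\nB       jobnameA\nA       504690\nB       jobnameB\n' |
    pairs_to_columns
```

paste - - reads two lines from standard input per output row, so the pairing depends only on line order, not on the A and B labels.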

awk '/^A/{num=$2}/^B/{print num,$2}' file

Or, alternatively:

awk '{num=$2;getline;print num,$2}' file
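Both awk commands print the pair separated by a space (awk's default output separator), and the getline form prints a stale value if the file ends on an unpaired line. A possible variant, sketched under the assumption that lines strictly alternate, uses line-number parity instead of getline and sets a tab separator; `join_pairs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: join each odd/even line pair without getline.
join_pairs() {
    awk -v OFS='\t' '
        NR % 2 { first = $2; next }   # odd line: remember its value
        { print first, $2 }           # even line: emit the pair, tab-separated
    '
}

# The last line has no partner and is silently dropped.
printf 'A 504688\nB jobnameA\nA 504690\nB jobnameB\nA 504691\n' | join_pairs
```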

Here is a sed solution:

sed -e 'N' -e 's/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file

This version will also print the header at the top:

sed '1{h;s/.*/A\tB/p;g};N;s/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file

Or an alternative:

sed -n '/^A\s*/{s///;h};/^B\s*/{s///;H;g;s/\n/\t/p}' file

If your sed does not support semicolons as a command separator, the alternative can be written as:

sed -n '
/^A\s*/{       # if the line starts with "A"
s///             # remove the "A" and the whitespace
h                # copy the remainder into the hold space
}              # end if
/^B\s*/{       # if the line starts with "B"
s///             # remove the "B" and the whitespace 
H                # append pattern space to hold space
g                # copy hold space to pattern space
s/\n/\t/p        # replace newline with tab and print
}' file

This version will also print the header at the top:

sed -n '/^A\s*/{s///;h;1s/.*/A\tB/p};/^B\s*/{s///;H;g;s/\n/\t/p}' file

This will work with any header text, not just the fixed A and B:

awk '{a=$1;b=$2;getline;if(c!=1){print a,$1;c=1};print b,$2}' file1 >file2

It will also print the header row.

If you need a \t separator, then use:

awk '{a=$1;b=$2;getline;if(c!=1){print a"\t"$1;c=1};print b"\t"$2}' file1 >file2

This might work for you:

 sed -e '1i\A\tB' -e 'N;s/A\s*\(\S*\).*\nB\s*\(\S*\).*/\1\t\2/' file
