grep, cut, sed, awk a file for 3rd column, n lines at a time, then paste into repeated columns of n rows?

I have a file of the form:

#some header text
a    1       1234
b    2       3333
c    2       1357

#some header text 
a    4       8765
b    1       1212
c    7       9999
...

with repeated data in n-row chunks separated by a blank line (possibly with some other header text). I'm only interested in the third column, and would like to do some grep, cut, awk, sed, or paste magic to turn it into this:

a   1234    8765   ...
b   3333    1212
c   1357    9999

where the third column of each subsequent n-row chunk is tacked on as a new column. I guess you could call it a transpose, just n lines at a time and only for a specific column. The leading label column (a, b, c) isn't essential... I'd be happy if I could just grab the data in the third column.

Is this even possible? It must be. I can get things chopped down to only the interesting columns with grep and cut:

cat myfile | grep -A2 ^a\  | cut -c13-15

but I can't figure out how to take these n-row chunks and sed/paste/whatever them into repeated n-row columns.

Any ideas?
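
Since the question asks specifically about a grep/cut/paste pipeline, here is a minimal sketch along those lines. It assumes every chunk has exactly n data rows, that fields are separated by runs of spaces or tabs with no leading whitespace, and that there are at most 676 chunks (the default two-letter suffixes of split). The function name paste_third_column is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: paste_third_column FILE N
# Prints field 3 of every data line, turning each N-row chunk into one column.
# Assumes every chunk has exactly N data rows.
paste_third_column() {
    local file=$1 n=$2 tmp
    tmp=$(mktemp -d) || return 1
    # Drop '#' header lines and blank lines, squeeze spaces/tabs, keep field 3,
    # then write each run of N values to its own file: chunk_aa, chunk_ab, ...
    grep -v -e '^#' -e '^[[:space:]]*$' "$file" |
        tr '\t' ' ' |
        tr -s ' ' |
        cut -d' ' -f3 |
        split -l "$n" - "$tmp/chunk_"
    # The glob expands in sorted order, so the columns keep the file order
    paste -d' ' "$tmp"/chunk_*
    rm -rf "$tmp"
}
```

For example, paste_third_column myfile 3 prints one row per label with one column per chunk (the labels themselves are dropped, which the question says is acceptable).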

This awk does the job:

awk 'NF<3 || /^(#|[[:blank:]]*$)/{next} !($1 in a){b[++k]=$1; a[$1]=$3; next}
        {a[$1] = a[$1] OFS $3} END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file
a 1234 8765
b 3333 1212
c 1357 9999
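
The same approach generalizes to "only a specific column" by passing the column number in as an awk variable. In this sketch the variable name col and the wrapper name collect_column are hypothetical; col=3 reproduces the output above. Labels are printed in the order they first appear, because the keys are remembered in an indexed array rather than read back with for (i in a).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: collect_column FILE [COL]  (COL defaults to 3)
collect_column() {
    local file=$1 col=${2:-3}
    awk -v col="$col" '
        NF < col || /^#/ { next }                                  # skip headers, blank and short lines
        !($1 in vals) { keys[++k] = $1; vals[$1] = $col; next }    # first sighting of a label
        { vals[$1] = vals[$1] OFS $col }                           # later chunks: append
        END { for (i = 1; i <= k; i++) print keys[i], vals[keys[i]] }
    ' "$file"
}
```

With the sample file, collect_column file 3 gives the same three rows as above, and collect_column file 2 collects the second column instead.
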

awk 'NF<3 || /^#/{next} {a[$1] = a[$1] $3 "\t"} END{for(i in a) print i, a[i]}' file

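Would produce the following (awk leaves the iteration order of for (i in a) unspecified, so the rows may come out in a different order):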

a 1234  8765
b 3333  1212
c 1357  9999

You can change "\t" to a different output separator, such as " ", if you like.

sub(/\t$/, "", a[i]); may be inserted before print if you don't like having a trailing tab. Another solution is to check whether a[$1] already has a value and, based on that, decide whether to add a separator before appending the new value. It complicates the code a bit, though.
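
A sketch of the tab-separated answer with the suggested sub() call placed before print. The wrapper name tab_columns is hypothetical; it also skips blank lines and "#" headers so that no empty label is collected.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: tab_columns FILE
tab_columns() {
    awk '
        NF < 3 || /^#/ { next }           # ignore blank lines and "#" headers
        { a[$1] = a[$1] $3 "\t" }         # append field 3 followed by a tab
        END {
            for (i in a) {                # iteration order is unspecified
                sub(/\t$/, "", a[i])      # drop the trailing tab
                print i, a[i]
            }
        }
    ' "$1"
}
```

Pipe the result through sort if a stable row order matters, e.g. tab_columns file | sort.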

Using bash 4.0 or later (associative arrays require it):

declare -A array
# read splits each line on whitespace: c = 1st field, value = 3rd field
while read -r c _ value _
do
   # skip blank lines and header lines starting with '#'
   if [[ $c && $c != \#* ]]; then
       array[$c]="${array[$c]} $value"
   fi
done < myFile.txt

for k in "${!array[@]}"
do
    echo "$k ${array[$k]}"
done

Will produce:

a  1234 8765
b  3333 1212
c  1357 9999

It stores the letter as the key of the associative array and, on each iteration, appends the corresponding value to it. Bash does not guarantee any particular order when iterating over "${!array[@]}", so the rows may be printed in a different order.
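
If the rows should come out in the order the labels first appear, a separate indexed array can record that order. A minimal sketch, assuming bash 4.0+ and whitespace-separated fields; the names ordered_columns and order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ordered_columns FILE
ordered_columns() {
    local -A array=()
    local -a order=()
    local c value
    while read -r c _ value _; do
        [[ -z $c || $c == \#* ]] && continue          # skip blank and header lines
        [[ -n ${array[$c]+set} ]] || order+=("$c")    # first time this label is seen
        array[$c]+=" $value"                          # append the 3rd field
    done < "$1"
    for c in "${order[@]}"; do
        printf '%s%s\n' "$c" "${array[$c]}"
    done
}
```

The ${array[$c]+set} test distinguishes "never seen" from "seen", which keeps each label in order exactly once.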

$ awk -v RS= -F'\n' '{ for (i=2;i<=NF;i++) {split($i,f,/[[:space:]]+/); map[f[1]] = map[f[1]] " " f[3]} } END{ for (key in map) print key map[key]}' file
a 1234 8765
b 3333 1212
c 1357 9999
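
With RS= (an empty record separator) awk reads each blank-line-separated chunk as one record, and -F'\n' makes every line of the chunk a field. Starting the loop at i=2 assumes each chunk has exactly one header line. A sketch that instead skips any line starting with "#" and prints labels in first-seen order; the wrapper name paragraph_columns is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: paragraph_columns FILE
paragraph_columns() {
    awk -v RS= -F'\n' '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^#/) continue                  # any number of header lines
                n = split($i, f, /[[:space:]]+/)
                if (n < 3) continue                      # not a data line
                if (!(f[1] in map)) keys[++k] = f[1]     # remember first-seen order
                map[f[1]] = map[f[1]] " " f[3]
            }
        }
        END { for (j = 1; j <= k; j++) print keys[j] map[keys[j]] }
    ' "$1"
}
```
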
