
grep, cut, sed, awk a file for 3rd column, n lines at a time, then paste into repeated columns of n rows?

I have a file of the form:

#some header text
a    1       1234
b    2       3333
c    2       1357

#some header text 
a    4       8765
b    1       1212
c    7       9999
...

with repeated data in n-row chunks separated by a blank line (possibly with some other header text). I'm only interested in the third column, and would like to do some grep, cut, awk, sed, paste magic to turn it into this:

a   1234    8765   ...
b   3333    1212
c   1357    9999

where the third column of each subsequent n-row chunk is tacked on as a new column. I guess you could call it a transpose, just n lines at a time, and only of a specific column. The leading (abc) column label isn't essential; I'd be happy if I could just grab the data in the third column.

Is this even possible? It must be. I can get things chopped down to only the interesting columns with grep and cut:

cat myfile | grep -A2 ^a\  | cut -c13-15

but I can't figure out how to take these n-row chunks and sed/paste/whatever them into repeated n-row columns.

Any ideas?

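This awk does the job:

awk 'NF<3 || /^(#|[[:blank:]]*$)/{next} !($1 in a){b[++k]=$1; a[$1]=$3; next}
        {a[$1] = a[$1] OFS $3} END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file

Output:

a 1234 8765
b 3333 1212
c 1357 9999

The ($1 in a) test checks whether a label has been seen before, so a third-column value of 0 does not confuse it, and the b array keeps the labels in first-seen order.
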
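Another awk approach:

awk '/^#/ || !NF {next} {a[$1] = a[$1] $3 "\t"} END{for (i in a) print i, a[i]}' file

Would produce (the order of the lines is not guaranteed, because for (i in a) visits keys in an unspecified order)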

a 1234  8765
b 3333  1212
c 1357  9999

You can change "\t" to a different output separator like " " if you like.

sub(/\t$/, "", a[i]); may be inserted before the print if you don't like having a trailing tab. Another solution is to check whether a[$1] already has a value, and use that to decide whether to append to the previous value or start a new one. It complicates the code a bit though.
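
As a hedged sketch of the trailing-tab cleanup described above, the sub() call can be placed inside the END loop just before the print. The function name third_col_by_label is hypothetical, and the final sort is an assumption added here only to make the line order predictable, since awk's for-in order is unspecified.

```shell
# Hypothetical wrapper: third_col_by_label FILE
# Prints label<TAB>value1<TAB>value2... with no trailing tab.
third_col_by_label() {
    awk '
        /^#/ || !NF { next }                 # skip header and blank lines
        { a[$1] = a[$1] $3 "\t" }            # append value plus a trailing tab
        END {
            for (i in a) {
                sub(/\t$/, "", a[i])         # drop the final trailing tab
                print i "\t" a[i]
            }
        }
    ' "$1" | sort                            # for-in order is unspecified, so sort
}
```

Called as third_col_by_label file, this should give one tab-separated line per label. Drop the sort if the order of the labels does not matter.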

Using bash 4.0 or later (needed for associative arrays):

declare -A array                      # associative array: label -> values
while read -r key _ value _           # split on whitespace; keep columns 1 and 3
do
   # skip blank lines and header lines starting with '#'
   if [[ $key && $key != \#* ]]; then
       array[$key]+=" $value"
   fi
done < myFile.txt

# note: the keys come back in no particular order
for k in "${!array[@]}"
do
    echo "$k ${array[$k]}"
done

Will produce:

a  1234 8765
b  3333 1212
c  1357 9999

It stores the letter as the key of the associative array and, in each iteration, appends the corresponding value to it.
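
Bash does not promise any particular order when iterating over the keys of an associative array, so the rows may not come out as a, b, c. A minimal sketch of one way around that, assuming bash 4.0 or later, is to record each label in an ordinary indexed array the first time it appears. The function name collect_third_column and its variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect_third_column FILE
# Requires bash 4.0 or later for associative arrays.
collect_third_column() {
    local -A values=()     # label -> space-separated third-column values
    local -a order=()      # labels in the order they were first seen
    local key second value rest

    # The "|| [[ -n $key ]]" part handles a last line with no trailing newline.
    while read -r key second value rest || [[ -n $key ]]; do
        # Skip blank lines, header lines, and lines without a third column.
        [[ -z $key || $key == \#* || -z $value ]] && continue
        if [[ -z ${values[$key]+set} ]]; then
            order+=("$key")              # first sighting: remember position
            values[$key]=$value
        else
            values[$key]+=" $value"      # later chunks: append
        fi
    done < "$1"

    for key in "${order[@]}"; do
        printf '%s %s\n' "$key" "${values[$key]}"
    done
}
```

Usage would be collect_third_column myFile.txt. The rows then follow the order of the first chunk in the file rather than the hash order of the associative array.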

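Using awk's paragraph mode, where an empty RS makes each blank-line-separated chunk one record and -F'\n' makes each line of the chunk a field (the loop starts at 2 to skip the header line; the order of the output lines is not guaranteed):

$ awk -v RS= -F'\n' '{ for (i=2;i<=NF;i++) {split($i,f,/[[:space:]]+/); map[f[1]] = map[f[1]] " " f[3]} } END{ for (key in map) print key map[key]}' file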
a 1234 8765
b 3333 1212
c 1357 9999
