Create a matrix out of a table using awk

I want to use this table:

a   16  moe max us
b   11  tom mic us
d   14  roe fox au
t   29  ann teo au
n   28  joe joe ca

and make this matrix using awk (or any other simple option in bash):

    a_16;   b_11;   d_14;   t_29;   n_28
us; moe_max;    tom_mic;    ;   ;       
au; ;   ;   roe_fox;    ann_teo;    
ca; ;   ;   ;   ;   joe_joe

I tried this, but it didn't work:

awk '{a[$5]=a[$5]?a[$5] FS $1"_"$2:$1"_"$2; b[$5]=b[$5]?b[$5] FS $3"_"$4:$3"_"$4;} END{for (i in a){print i"\t" a[i] "\t" b[i];}}' fis.txt

Using any awk:

$ cat tst.awk
{
    row           = $NF
    col           = $1 "_" $2
    vals[row,col] = $3 "_" $4
}

!seenRow[row]++ { rows[++numRows] = row }
!seenCol[col]++ { cols[++numCols] = col }

END {
    OFS = ";  "

    printf "     "
    for ( colNr=1; colNr<=numCols; colNr++ ) {
        col = cols[colNr]
        printf "%s%s", col, (colNr<numCols ? OFS : ORS)
    }

    for ( rowNr=1; rowNr<=numRows; rowNr++ ) {
        row = rows[rowNr]
        printf "%s%s", row, OFS
        for ( colNr=1; colNr<=numCols; colNr++ ) {
            col = cols[colNr]
            #val = ((row,col) in vals ? vals[row,col] : "  ")
            val = vals[row,col]
            printf "%s%s", val, (colNr<numCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
     a_16;  b_11;  d_14;  t_29;  n_28
us;  moe_max;  tom_mic;  ;  ;
au;  ;  ;  roe_fox;  ann_teo;
ca;  ;  ;  ;  ;  joe_joe

I can't see the pattern in the expected output in your question for when there should be 1, 2, 3, or 4 spaces after each ";", so I just used a consistent 2 in the above. Massage it to suit.
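If the goal of the varying spaces was simply to make the columns line up, one way to "massage" the script is to pad every cell to the width of the widest cell. The following is a minimal sketch under that assumption, not part of the original answer: the function name matrix_aligned and the variables cellW, rowW, rowFmt and cellFmt are made up for illustration. It uses the same pivot logic as tst.awk and only POSIX awk features.

```shell
# Hypothetical helper (not from the original answer): matrix_aligned FILE
# Same pivot as tst.awk, but each cell is padded to the widest cell so that
# the ";" separators line up. Trailing blanks are stripped from every line.
matrix_aligned() {
    awk '
    {
        row = $NF
        col = $1 "_" $2
        val = $3 "_" $4
        vals[row,col] = val
        if (!seenRow[row]++) rows[++numRows] = row
        if (!seenCol[col]++) cols[++numCols] = col
        # remember the widest cell (header or value) and the widest row label
        if (length(col) > cellW) cellW = length(col)
        if (length(val) > cellW) cellW = length(val)
        if (length(row) > rowW)  rowW  = length(row)
    }
    END {
        rowFmt  = "%-" rowW  "s"    # left-justified row label
        cellFmt = "%-" cellW "s"    # left-justified cell

        # header: blank label, then the column names
        line = sprintf(rowFmt, "")
        for (c = 1; c <= numCols; c++)
            line = line sprintf("%s" cellFmt, (c == 1 ? "  " : "; "), cols[c])
        sub(/ +$/, "", line)
        print line

        # body: a missing (row,col) pair yields an empty, padded cell
        for (r = 1; r <= numRows; r++) {
            line = sprintf(rowFmt, rows[r])
            for (c = 1; c <= numCols; c++)
                line = line sprintf("; " cellFmt, vals[rows[r],cols[c]])
            sub(/ +$/, "", line)
            print line
        }
    }' "$1"
}
```

With the sample table from the question saved as file, this should print:

$ matrix_aligned file
    a_16   ; b_11   ; d_14   ; t_29   ; n_28
us; moe_max; tom_mic;        ;        ;
au;        ;        ; roe_fox; ann_teo;
ca;        ;        ;        ;        ; joe_joe

Building the printf format strings at run time ("%-" width "s") avoids relying on the "*" width specifier, which not every awk implements.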

Using gawk's true multidimensional arrays (arrays of arrays, available since gawk 4.0) to collect header columns and row indices:

awk '{
    head[NR] = $1"_"$2;
    idx[$5][NR] = $3"_"$4
}
END {
    h = ""; col_size = length(head);
    for (i = 1; i <= col_size; i++) {
        h = sprintf("%s  %s", h, head[i])
    }
    print h;
    for (lab in idx) {
        printf("%s", lab);
        for (i = 1; i <= col_size; i++) {
            v = sprintf("%s;  %s", v, idx[lab][i])
        }
        print v;
        v = "";
    }
}' test.txt

  a_16  b_11  d_14  t_29  n_28
ca;  ;  ;  ;  ;  joe_joe
au;  ;  ;  roe_fox;  ann_teo;  
us;  moe_max;  tom_mic;  ;  ;  
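
Note that the rows above come out as ca, au, us rather than in input order, because awk does not define the iteration order of "for (lab in idx)". A rough sketch of one way to keep first-seen order while still using arrays of arrays follows; it assumes gawk 4.0 or later, and the function name matrix_gawk_ordered and the order array are illustrative names rather than anything from the original answer. It also joins the header with ";" to match the requested format, and it relies on NR, so it assumes the input has no blank lines.

```shell
# Hypothetical helper (requires gawk 4.0+): matrix_gawk_ordered FILE
# Records each row label the first time it is seen, then walks that list
# instead of relying on the unspecified order of "for (lab in idx)".
matrix_gawk_ordered() {
    gawk '{
        head[NR] = $1 "_" $2
        if (!($5 in idx)) order[++n] = $5   # remember first-seen row order
        idx[$5][NR] = $3 "_" $4
    }
    END {
        # header line: five blanks, then the column names joined by ";  "
        h = ""
        for (i = 1; i <= NR; i++)
            h = h (i > 1 ? ";  " : "     ") head[i]
        print h

        # one line per row label, in the order the labels first appeared
        for (r = 1; r <= n; r++) {
            lab = order[r]
            v = lab
            for (i = 1; i <= NR; i++)
                v = v ";  " ((i in idx[lab]) ? idx[lab][i] : "")
            print v
        }
    }' "$1"
}
```

Testing membership with "(i in idx[lab])" instead of reading idx[lab][i] directly avoids creating empty elements as a side effect of the lookup.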

Here is a Ruby script to do that:

ruby -e 'd=$<.read.
    split(/\R/).
    map(&:split).
    map{|sa| sa.each_slice(2).map{|ss| ss.join("_") } }.
    group_by{|sa| sa[-1] }

# {"us"=>[["a_16", "moe_max", "us"], ["b_11", "tom_mic", "us"]], "au"=>[["d_14", "roe_fox", "au"], ["t_29", "ann_teo", "au"]], "ca"=>[["n_28", "joe_joe", "ca"]]}

heads=d.values.flatten(1).map{|sa| sa[0]}
# ["a_16", "b_11", "d_14", "t_29", "n_28"]

hsh=Hash.new {|h,k| h[k] = ["\t"]*heads.length}
d.each{|k,v| 
    v.each{|sa| 
        hsh[k][heads.index(sa[0])]="\t#{sa[1]}"
    }
}
puts heads.map{|e| "\t#{e}" }.join(";")
hsh.each{|k,v| puts "#{k};\t#{v.join(";")}"}
' file

Prints:

    a_16;   b_11;   d_14;   t_29;   n_28
us;     moe_max;    tom_mic;    ;   ;   
au;     ;   ;   roe_fox;    ann_teo;    
ca;     ;   ;   ;   ;   joe_joe
