Subtract single largest number from multiple specific columns in awk

I have a comma-delimited file that looks like this:

R,F,TE,K,G,R
1,0,12,f,1,18
2,1,17,t, ,17
3,1,  , ,1,
4,0,15, ,0,16

Some items are missing, and the first row is a header that I want to ignore. I want to find the second smallest number in specific columns and subtract it from every element in that column, unless the element is the column's minimum. In this example, I want to do that for columns 3 and 6. So my final values would be:

R,F,TE,K,G,R
1,0,12,f,1,1
2,1, 2,t, ,0
3,1, , ,1,
4,0, 0, ,0,16

I tried handling one column at a time, with a hand-coded threshold so that the minimum found is the second smallest value:

awk 'BEGIN {FS=OFS=","; 
};
{ min=1000000; 
 if($3<min && $3 != "" && $3>12) min = $3; 
 if($3>0) $3 = $3-min+1;
 print}
 END{print min}
 ' try1.txt

It finds the minimum all right, but the output is not as expected. There should be an easier way in awk.

I'd loop over the file twice: once to find the minima, once to adjust the values. It's a trade-off of time versus memory.

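awk -F, -v OFS=, '
    FNR == 1   {if (NR != FNR) print; next}    # header: skip in pass 1, print unchanged in pass 2
    NR == FNR  {                               # pass 1: find the minimum of columns 3 and 6, ignoring blanks
        if ($3 ~ /[0-9]/ && (min3 == "" || $3+0 < min3)) min3 = $3+0
        if ($6 ~ /[0-9]/ && (min6 == "" || $6+0 < min6)) min6 = $6+0
        next
    }
    $3 ~ /[0-9]/ && $3+0 != min3 {$3 -= min3}  # pass 2: adjust every non-blank value except the minimum
    $6 ~ /[0-9]/ && $6+0 != min6 {$6 -= min6}
    {print}
' try1.txt try1.txt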

For prettier output (adjusted values right-aligned to the width of the minimum):

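awk -F, -v OFS=, '
    FNR == 1 && NR == FNR {next}               # pass 1: skip the header
    FNR == 1   {len3 = length("" min3); len6 = length("" min6); print; next}  # pass 2: set widths, print header
    NR == FNR  {                               # pass 1: find the minimum of columns 3 and 6, ignoring blanks
        if ($3 ~ /[0-9]/ && (min3 == "" || $3+0 < min3)) min3 = $3+0
        if ($6 ~ /[0-9]/ && (min6 == "" || $6+0 < min6)) min6 = $6+0
        next
    }
    $3 ~ /[0-9]/ && $3+0 != min3 {$3 = sprintf("%*d", len3, $3-min3)}
    $6 ~ /[0-9]/ && $6+0 != min6 {$6 = sprintf("%*d", len6, $6-min6)}
    {print}
' try1.txt try1.txt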
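
The other side of the time-versus-memory trade-off, reading the file once and holding its rows in memory, might look like the sketch below. The function name `sub_min_buffered` is made up for illustration, and the script assumes that a field counts as a value only if it contains a digit, so blank fields are left alone.

```shell
# Hypothetical helper: subtract the column minimum from columns 3 and 6
# while reading the file only once (rows are buffered in memory).
sub_min_buffered() {
    awk -F, -v OFS=, '
        NR == 1 {header = $0; next}     # remember the header, do not modify it
        {
            row[NR] = $0                # buffer every data row
            # assumption: a field is a value only if it contains a digit
            if ($3 ~ /[0-9]/ && (min3 == "" || $3+0 < min3)) min3 = $3+0
            if ($6 ~ /[0-9]/ && (min6 == "" || $6+0 < min6)) min6 = $6+0
        }
        END {
            print header
            for (i = 2; i <= NR; i++) {
                $0 = row[i]             # re-split the buffered row into fields
                if ($3 ~ /[0-9]/ && $3+0 != min3) $3 -= min3
                if ($6 ~ /[0-9]/ && $6+0 != min6) $6 -= min6
                print
            }
        }
    ' "$1"
}
```

It would be called as `sub_min_buffered try1.txt`. Memory use grows with the size of the file, which is the cost of avoiding the second read.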

Given the new requirements, compute the second smallest value of each column first, then subtract it:

min2_3=$(cut -d, -f3 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
min2_6=$(cut -d, -f6 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')

awk -F, -v OFS=, -v min2_3="$min2_3" -v min2_6="$min2_6" '
    NR==1 {print; next}
    $3 !~ /^ *$/ && $3 >= min2_3 {$3 -= min2_3}
    $6 !~ /^ *$/ && $6 >= min2_6 {$6 -= min2_6}
    {print}
' try1.txt

Output:

R,F,TE,K,G,R
1,0,12,f,1,1
2,1,2,t, ,0
3,1,  , ,1,
4,0,0, ,0,16

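
As a sketch of an alternative, the second smallest values can also be found inside awk instead of with the `cut | sort | sed` pipeline, by reading the file twice as in the earlier two-pass scripts. The names `sub_second_min`, `track`, `lo` and `hi` are made up for illustration. Like the pipeline, it counts duplicates separately (for 12, 12, 15 the second smallest is 12) and assumes a field is a value only if it contains a digit.

```shell
# Hypothetical helper: subtract the second smallest value of columns 3 and 6
# from every value in that column that is at least that large.
sub_second_min() {
    awk -F, -v OFS=, '
        # Track the smallest (lo) and second smallest (hi) value seen in column c.
        function track(c, v) {
            if (!(c in lo) || v < lo[c]) {
                if (c in lo) hi[c] = lo[c]   # the old minimum becomes second smallest
                lo[c] = v
            } else if (!(c in hi) || v < hi[c]) {
                hi[c] = v
            }
        }
        FNR == 1  {if (NR != FNR) print; next}   # header: skip in pass 1, print in pass 2
        NR == FNR {                              # pass 1: collect the two smallest values
            if ($3 ~ /[0-9]/) track(3, $3 + 0)
            if ($6 ~ /[0-9]/) track(6, $6 + 0)
            next
        }
        # pass 2: adjust non-blank values that are not below the second smallest
        (3 in hi) && $3 ~ /[0-9]/ && $3 + 0 >= hi[3] {$3 -= hi[3]}
        (6 in hi) && $6 ~ /[0-9]/ && $6 + 0 >= hi[6] {$6 -= hi[6]}
        {print}
    ' "$1" "$1"
}
```

Called as `sub_second_min try1.txt`, this should give the same five lines as the pipeline version for the sample file. A column with fewer than two values is left unchanged.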
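# GNU awk only: asort() is a gawk extension. Buffers the whole file in memory.
BEGIN{
    FS=OFS=","
}
{
    if(NR==1){print;next}       # print the header unchanged
    if(+$3)a[NR]=$3             # collect the non-blank, non-zero values of column 3
    if(+$6)b[NR]=$6             # and of column 6
    s[NR]=$0                    # keep every row for the END block
}
END{
    asort(a,c)                  # c[1] is the smallest value, c[2] the second smallest
    asort(b,d)
    for(i=2;i<=NR;i++){
        split(s[i],t)
        # subtract the second smallest unless the value is blank, zero, or the minimum
        if(t[3]!=c[1]&&+t[3]!=0)t[3]=t[3]-c[2]
        if(t[6]!=d[1]&&+t[6]!=0)t[6]=t[6]-d[2]
        print t[1],t[2],t[3],t[4],t[5],t[6]
    }
}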
