I have a comma-delimited file that looks like this:
R,F,TE,K,G,R
1,0,12,f,1,18
2,1,17,t, ,17
3,1, , ,1,
4,0,15, ,0,16
Some items are missing, and the first row is a header that I want to ignore. I want to find the second-smallest number in specific columns and subtract it from every element in that column, unless the element is the column's minimum. In this example, I want to subtract the second-smallest values from columns 3 and 6, so my final values would be:
R,F,TE,K,G,R
1,0,12,f,1,1
2,1, 2,t, ,0
3,1, , ,1,
4,0, 0, ,0,16
I tried handling one column at a time, using a hand-coded threshold to pick out the second-smallest value:
awk 'BEGIN {FS=OFS=",";
};
{ min=1000000;
if($3<min && $3 != "" && $3>12) min = $3;
if($3>0) $3 = $3-min+1;
print}
END{print min}
' try1.txt
It finds the min alright but the output is not as expected. There should be an easier way in awk.
I'd loop over the file twice: once to find the minima, and once to adjust the values. This trades time for memory, since the file is read twice but no rows have to be stored.
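awk -F, -v OFS=, '
# Pass 1: find the smallest non-blank value in columns 3 and 6, skipping the header.
NR == FNR {
    if (FNR > 1 && $3 ~ /[0-9]/ && (min3 == "" || $3+0 < min3)) min3 = $3+0
    if (FNR > 1 && $6 ~ /[0-9]/ && (min6 == "" || $6+0 < min6)) min6 = $6+0
    next
}
# Pass 2: subtract the minimum from every numeric value that is not the minimum.
FNR == 1 {print; next}
$3 ~ /[0-9]/ && $3+0 != min3 {$3 -= min3}
$6 ~ /[0-9]/ && $6+0 != min6 {$6 -= min6}
{print}
' try1.txt try1.txt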
For prettier output:
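awk -F, -v OFS=, '
# Pass 1: find the smallest non-blank value in columns 3 and 6, skipping the header.
NR == FNR {
    if (FNR > 1 && $3 ~ /[0-9]/ && (min3 == "" || $3+0 < min3)) min3 = $3+0
    if (FNR > 1 && $6 ~ /[0-9]/ && (min6 == "" || $6+0 < min6)) min6 = $6+0
    next
}
# Pass 2: print the header as-is and pad adjusted values to the width of the minimum.
FNR == 1 {len3 = length("" min3); len6 = length("" min6); print; next}
$3 ~ /[0-9]/ && $3+0 != min3 {$3 = sprintf("%*d", len3, $3-min3)}
$6 ~ /[0-9]/ && $6+0 != min6 {$6 = sprintf("%*d", len6, $6-min6)}
{print}
' try1.txt try1.txt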
Given the new requirements (skip the header, ignore missing values, and subtract the second-smallest value):
min2_3=$(cut -d, -f3 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
min2_6=$(cut -d, -f6 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
awk -F, -v OFS=, -v min2_3="$min2_3" -v min2_6="$min2_6" '
NR==1 {print; next}
$3 !~ /^ *$/ && $3 >= min2_3 {$3 -= min2_3}
$6 !~ /^ *$/ && $6 >= min2_6 {$6 -= min2_6}
{print}
' try1.txt
Output:
R,F,TE,K,G,R
1,0,12,f,1,1
2,1,2,t, ,0
3,1, , ,1,
4,0,0, ,0,16
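The same result can probably be reached in awk alone, without the `cut | sort | sed` pipeline, by tracking the two smallest values during a first pass. The following is only a sketch under some assumptions: the function name `subtract_second_min` and its argument layout are made up for illustration, "second smallest" is taken to mean the second-smallest distinct value, and any field that contains no digit is treated as missing.

```shell
# subtract_second_min FILE COLS  (hypothetical helper, not from the answers above)
#   FILE: comma-separated file whose first line is a header
#   COLS: comma-separated list of column numbers to adjust, e.g. 3,6
subtract_second_min() {
    awk -F, -v OFS=, -v cols="$2" '
    BEGIN { n = split(cols, col, ",") }
    # Header line: skipped in pass 1, printed unchanged in pass 2.
    FNR == 1 { if (NR != FNR) print; next }
    # Pass 1: remember the smallest and second-smallest distinct value per column.
    NR == FNR {
        for (i = 1; i <= n; i++) {
            c = col[i]
            if ($c !~ /[0-9]/) continue          # blank or missing field
            v = $c + 0
            if (!(c in min1) || v < min1[c]) {
                if (c in min1) min2[c] = min1[c] # old minimum becomes second smallest
                min1[c] = v
            } else if (v > min1[c] && (!(c in min2) || v < min2[c])) {
                min2[c] = v
            }
        }
        next
    }
    # Pass 2: subtract the second minimum from every value above the minimum.
    {
        for (i = 1; i <= n; i++) {
            c = col[i]
            if ($c ~ /[0-9]/ && (c in min2) && $c + 0 > min1[c]) $c = $c - min2[c]
        }
        print
    }' "$1" "$1"
}
```

Called as `subtract_second_min try1.txt 3,6`, this should print the same five lines as the output shown above. Like the first answer, it reads the file twice rather than holding rows in memory, and columns with fewer than two distinct values are left untouched.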
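# Requires GNU awk: asort() is a gawk extension.
BEGIN {
    FS = OFS = ","
}
NR == 1 { print; next }            # pass the header through unchanged
{
    if ($3 ~ /[0-9]/) a[NR] = $3   # collect non-blank values of column 3
    if ($6 ~ /[0-9]/) b[NR] = $6   # collect non-blank values of column 6
    s[NR] = $0                     # keep every data row for the END block
}
END {
    asort(a, c)                    # c[1] = minimum, c[2] = second minimum
    asort(b, d)
    for (i = 2; i <= NR; i++) {
        split(s[i], t)
        # Subtract the second minimum from every numeric value except the minimum.
        if (t[3] ~ /[0-9]/ && t[3] != c[1]) t[3] = t[3] - c[2]
        if (t[6] ~ /[0-9]/ && t[6] != d[1]) t[6] = t[6] - d[2]
        print t[1], t[2], t[3], t[4], t[5], t[6]
    }
}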