How do I calculate the sum of each row in a space-delimited file?

Question

I have a space-delimited file that looks like this:

probeset_id submitted_id chr snp_pos alleleA alleleB 562_201 562_202 562_203 562_204 562_205 562_206 562_207 562_208 562_209 562_210 562_211 562_212 562_213 562_214 562_215 562_216 562_217 562_218 562_219 562_220 562_221 562_222 562_223 562_224 562_225 562_226 562_227 562_228 562_229 562_230 562_231 562_232 562_233 562_234 562_235 562_236 562_237 562_238 562_239 562_240 562_241 562_242 562_243 562_244 562_245 562_246 562_247 562_248 562_249 562_250 562_251 562_252 562_253 562_254 562_255 562_256 562_257 562_258 562_259 562_260 562_261 562_262 562_263 562_264 562_265 562_266 562_267 562_268 562_269 562_270 562_271 562_272 562_273 562_274 562_275 562_276 562_277 562_278 562_279 562_280 562_281 562_283 562_284 562_285 562_289 562_291 562_292 562_294 562_295 562_296 562_400 562_401 562_402 562_403 562_404 562_405 
AX-75448119 Chr1_41908741 1 41908741 T C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0 2 2 0 0 0 0 0 1 0 0 0 0 0 
AX-75448118 Chr1_41908545 1 41908545 T C 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 2 2 2 0 1 1 1 2 -1 1 2 0 0 2 1 1 0 1 0 1 2 1 0 0 1 2 2 1 2 2 0 1 2 2 2 2 2 2 0 1 0 0 0 1 2 2 2 2 0

What I would like is the sum of all the numbers in each row, ignoring any negative number (only -1 occurs). So I would like to get this result:

AX-75448119 Chr1_41908741 1 41908741 T C 13

(which is 1+1+1+1+1+1+1+1+2+2+1)

and

AX-75448118 Chr1_41908545 1 41908545 T C 98

where, in this case, the -1 is ignored.

I was thinking of using awk on Linux, which I usually use for space-delimited files, but I only know how to use it on columns, not on rows.

Answer 1

This may be what you're looking for, using pure awk.

awk 'NR >=2 {for (i=7;i<=NF;i++) if ($i !~ /^-/) sum += $i; print $1,$2,$3,$4,$5,$6,sum; sum = 0}' data.txt

Output:

AX-75448119 Chr1_41908741 1 41908741 T C 13
AX-75448118 Chr1_41908545 1 41908545 T C 98
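
Since the question asks how to work across a row rather than down a column, here is the same logic spread out with comments, as a rough sketch. The function name row_sums and the short sample rows are made up for illustration; the only change to the logic is that the sum is reset at the start of each row instead of after printing.

```shell
#!/bin/sh
# row_sums: hypothetical wrapper name; reads the table on standard input.
row_sums() {
    awk '
    NR >= 2 {                          # skip the header line
        sum = 0                        # start each row from zero
        for (i = 7; i <= NF; i++)      # fields 7..NF hold the numeric columns
            if ($i !~ /^-/)            # ignore values starting with "-" (i.e. -1)
                sum += $i
        print $1, $2, $3, $4, $5, $6, sum
    }'
}

# Made-up sample: one header line and two short rows.
printf '%s\n' \
    'id sub chr pos A B s1 s2 s3 s4' \
    'AX-1 Chr1_10 1 10 T C 0 1 2 -1' \
    'AX-2 Chr1_20 1 20 T C 2 2 -1 1' | row_sums
```

awk runs the block once per input line, so NF and $i always refer to the current row; the for loop is what walks across the fields of that row.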

Answer 2

I would like to suggest a Perl script:

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
    my ($line, $sum, $next);
    # Repeat while there are two (or more) integers after the "... T C" prefix.
    # Either integer may be negative, so both groups accept an optional minus sign.
    while (/^(AX-\d+\s+\S+\s+\d+\s+\d+\s+\w+\s+\w+\s+)(-?\d+)\s+(-?\d+)/) {
        $line = $1;
        $sum  = $2 > 0 ? $2 : 0;       # a negative first value counts as 0
        $next = $3;
        $sum += $next if $next > 0;    # do not add negative numbers
        # Replace the two integers by their sum.
        s/^\Q$line\E-?\d+\s+-?\d+/$line$sum/;
    }
    print;
}

which you can run like this: cat data | ./script.pl

I get:

probeset_id submitted_id chr snp_pos alleleA alleleB 562_201 562_202 562_203 562_204 562_205 562_206 562_207 562_208 562_209 562_210 562_211 562_212 562_213 562_214 562_215 562_216 562_217 562_218 562_219 562_220 562_221 562_222 562_223 562_224 562_225 562_226 562_227 562_228 562_229 562_230 562_231 562_232 562_233 562_234 562_235 562_236 562_237 562_238 562_239 562_240 562_241 562_242 562_243 562_244 562_245 562_246 562_247 562_248 562_249 562_250 562_251 562_252 562_253 562_254 562_255 562_256 562_257 562_258 562_259 562_260 562_261 562_262 562_263 562_264 562_265 562_266 562_267 562_268 562_269 562_270 562_271 562_272 562_273 562_274 562_275 562_276 562_277 562_278 562_279 562_280 562_281 562_283 562_284 562_285 562_289 562_291 562_292 562_294 562_295 562_296 562_400 562_401 562_402 562_403 562_404 562_405 
AX-75448119 Chr1_41908741 1 41908741 T C 13 
AX-75448118 Chr1_41908545 1 41908545 T C 98

Answer 3

In case you really want to avoid Perl (why?), you could do this hacky thing, which obviously doesn't perform too well:

# Skip the header line, then split each row into six labels and the rest.
tail -n +2 input | while read -r f1 f2 f3 f4 f5 f6 line
do
    echo "$f1 $f2 $f3 $f4 $f5 $f6 $(echo "$line" |
            xargs -n1 | grep -v '^-' | paste -sd+ - | bc)"
done

I get:

AX-75448119 Chr1_41908741 1 41908741 T C 13
AX-75448118 Chr1_41908545 1 41908545 T C 98
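
To see what each stage of that pipeline contributes, here is a small sketch that runs it on a single made-up row of values. The function name sum_nonneg is hypothetical, and the explicit - operand to paste is an addition so that it reads standard input on implementations that require an operand.

```shell
#!/bin/sh
# sum_nonneg: hypothetical helper; sums the space-separated integers on
# standard input, skipping any that start with "-".
sum_nonneg() {
    xargs -n1 |          # put one value per line
    grep -v '^-' |       # drop negative values such as -1
    paste -sd+ - |       # join the remaining lines with "+", e.g. 0+1+2+2
    bc                   # evaluate the resulting expression
}

echo "0 1 -1 2 2" | sum_nonneg    # prints 5
```

Each row starts four extra processes this way, which is why the approach is slow on large files compared with a single awk or Perl run.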

Answer 4

A slightly changed version of @steve's awk solution:

awk '
NR>1{
        s = 0;
        for (i = 7 ; i <= NF ; i++)
        {
            if ($i != -1)
            {
                s+=$i;
            }
        }
        for (j = 1 ; j < 7 ; j++)
        {
            printf("%s ", $j);
        }
        print s;
}' file

Test:

[jaypal:~/Temp] awk '
NR>1{
        s = 0;
        for (i = 7 ; i <= NF ; i++)
        {
            if ($i != -1)
            {
                s+=$i;
            }
        }
        for (j = 1 ; j < 7 ; j++)
        {
            printf("%s ", $j);
        }
        print s;
}' file
AX-75448119 Chr1_41908741 1 41908741 T C 13
AX-75448118 Chr1_41908545 1 41908545 T C 98
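
Both awk versions drop the header line. If the header is wanted in the output, one possible variation is sketched below. The function name sum_with_header, the column label total, and the sample row are assumptions made for illustration, not part of the original answers.

```shell
#!/bin/sh
# sum_with_header: hypothetical name; reads the table on standard input.
sum_with_header() {
    awk '
    NR == 1 {                                   # header: keep the six label columns
        print $1, $2, $3, $4, $5, $6, "total"   # "total" is an assumed label
        next
    }
    {
        s = 0
        for (i = 7; i <= NF; i++)
            if ($i != -1)                       # -1 is skipped, as in the answer above
                s += $i
        print $1, $2, $3, $4, $5, $6, s
    }'
}

# Made-up sample: the real header labels, shortened, and one invented row.
printf '%s\n' \
    'probeset_id submitted_id chr snp_pos alleleA alleleB 562_201 562_202 562_203' \
    'AX-1 Chr1_100 1 100 T C 2 -1 1' | sum_with_header
```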

4 answers

Answer 1: score 2, accepted, 2012-01-23 12:27:30
Answer 2: score 1, 2012-01-23 10:24:00
Answer 3: score 1, 2012-01-23 11:42:32
Answer 4: score 1, 2012-01-23 14:23:01
