
AWK / Linux script calculations from a text file

I wrote an AWK script that reads files, multiplies the number of rows by the number of columns, and sums the results. I want to pass any number of files to the awk script at once, e.g. A.txt, B.txt, C.txt, and have it report the column count and the rows times columns product for each file. The first 5 columns of every text file should always be skipped.

Each text file can have any number of columns, and a folder can contain several text files.

I want to run it as:

awk -f foo.awk A.txt B.txt C.txt

For example, given 3 different files A.txt, B.txt, and C.txt, multiply rows by columns for each file and sum the 3 products.

Output should be:

No of columns in A.txt: count of columns in A.txt with first 5 columns ignored
No of columns in B.txt: count of columns in B.txt with first 5 columns ignored
No of columns in C.txt: count of columns in C.txt with first 5 columns ignored
Sum of A.txt: rows in A.txt*columns in A.txt
Sum of B.txt: rows in B.txt*columns in B.txt
Sum of C.txt: rows in C.txt*columns in C.txt
Total Sum: A+B+C

Below is what I have so far for foo.awk (sort of pseudo-code; it does not work with multiple files):

#!/bin/gawk -f

BEGIN { rows=0; columns=0 }
{
    FS="\t";
    if(/^#COLS/) {
            column=NF-5; #skip first 5 columns
            columns+=column
    }
    if (!/^#/){
            rows++;
            files[FILENAME]++;
    }
}
END {
    for (fname in files) {
            printf ("%'24d rows in %s\n",files[fname],fname);
    }
            printf("No of columns in A.txt= %'d\n", columnsA);
            printf("No of columns in B.txt= %'d\n", columnsB);
            printf("No of columns in C.txt= %'d\n", columnsC);
            sum=columns*rows; # multiply no of rows by column in each file and add them up 
            printf( "Sum of A.txt %d\n", sumA);
            printf( "Sum of B.txt %d\n", sumB);
            printf( "Sum of C.txt %d\n", sumC);   
            printf( "Total sum is %d\n", sum_of_A+B+C);  
}

For example:

A.txt:
#ignore this line -- pattern does not match
#ignore this line -- pattern does not match
#COLS   A       B       C       D       E       F       G       H       I 
row1    1       2       3       4       5       6       7       8       9
row2    1       3       3       4       5       6       7       8       9
row3    1       3       3       4       5       6       7       8       9

B.txt:
#ignore this line -- pattern does not match
#ignore this line -- pattern does not match
#COLS   A       B       C       D       E       F       G       H        
row1    1       2       3       4       5       6       7       8       
row2    5       3       3       4       6       6       7       8       
row3    8       3       3       4       5       6       7       8       

C.txt:
#ignore this line -- pattern does not match
#ignore this line -- pattern does not match
#COLS   A       B       C       D       E       F       G       H       I       J
row1    1       2       3       3       5       6       7       8       9       2
row2    7       3       3       4       5       6       7       8       9       7
row3    9       3       3       4       5       6       7       8       9       6
row4    9       3       3       4       5       6       7       8       9       6

output:

No of columns in A.txt: 5
No of columns in B.txt: 4
No of columns in C.txt: 6
Sum of A.txt: 3*5=15
Sum of B.txt: 3*4=12
Sum of C.txt: 4*6=24
Total Sum: 12+9+20 = 51

Thank you.

With plain awk you can do this:

$ awk '!/^#/{cols[FILENAME]=NF-5; 
             rows[FILENAME]++} 
         END{for(f in cols) print "No of columns in " f, cols[f]; 
             for(f in cols) 
               {r=rows[f];
                c=cols[f];
                sum+=r*c; 
                sumstr=sumstr?sumstr"+"r*c:r*c; 
                print "Sum of "f ":",r "x" c "=" r*c} 
             print "Total Sum: ", sumstr, "=", sum}' {A,B,C}.txt

No of columns in C.txt 6
No of columns in B.txt 4
No of columns in A.txt 5
Sum of C.txt: 4x6=24
Sum of B.txt: 3x4=12
Sum of A.txt: 3x5=15
Total Sum:  24+12+15 = 51

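The expected output in the question is inconsistent about whether 5 or 6 columns are skipped: the per-file lines use NF-5, but the terms in its Total Sum line (12+9+20) correspond to NF-6. The commands here skip 5. Also note that for (f in cols) does not preserve the order of entries; this can be fixed with gawk's PROCINFO["sorted_in"], or with a little extra code that records the order in which the files are read, as in the order array below: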
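For the gawk route, a minimal sketch might look like the following. It assumes GNU awk is installed as gawk, since PROCINFO["sorted_in"] is a gawk extension, and the wrapper name summarize_sorted is made up for illustration. Note that "@ind_str_asc" sorts by file name, which only coincides with command-line order when the files are passed alphabetically.

```shell
# Sketch assuming GNU awk; summarize_sorted is a hypothetical wrapper name.
summarize_sorted() {
  gawk '
    # Data rows only: remember the column count (minus 5) and count the rows.
    !/^#/ { cols[FILENAME] = NF - 5; rows[FILENAME]++ }
    END {
      # gawk extension: make for-in loops visit indices in ascending string order.
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (f in cols) print "No of columns in " f, cols[f]
      for (f in cols) {
        p = rows[f] * cols[f]
        sum += p
        sumstr = (sumstr == "" ? "" : sumstr "+") p
        print "Sum of " f ":", rows[f] "x" cols[f] "=" p
      }
      print "Total Sum: ", sumstr, "=", sum
    }' "$@"
}
```

The portable approach, which records the file order explicitly, follows.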

$ awk 'FNR==1{order[++k]=FILENAME} 
        !/^#/{cols[FILENAME]=NF-5; rows[FILENAME]++} 
          END{for(i=1;i<=k;i++) print "No of columns in " order[i], cols[order[i]]; 
              for(i=1;i<=k;i++) {f=order[i];r=rows[f];c=cols[f];sum+=r*c; sumstr=sumstr?sumstr"+"r*c:r*c; print "Sum of "f ":",r "x" c "=" r*c} 
              print "Total Sum: ", sumstr, "=", sum}' {A,B,C}.txt

No of columns in A.txt 5
No of columns in B.txt 4
No of columns in C.txt 6
Sum of A.txt: 3x5=15
Sum of B.txt: 3x4=12
Sum of C.txt: 4x6=24
Total Sum:  15+12+24 = 51
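Since the question asks for something runnable as awk -f foo.awk A.txt B.txt C.txt, the same logic can also be sketched in script form. This is only one possible layout, under two assumptions that are not confirmed by the answer above: the column count is taken from the #COLS header line (as the question's draft does) rather than from the data rows, and the output labels follow the question's requested format. The shell wrapper name summarize is hypothetical.

```shell
# Hypothetical wrapper; the awk program between the quotes could be saved as foo.awk
# and run as: awk -f foo.awk A.txt B.txt C.txt
summarize() {
  awk '
    FNR == 1    { order[++k] = FILENAME }     # remember files in command-line order
    /^#COLS/    { cols[FILENAME] = NF - 5 }   # header line: skip the first 5 columns
    !/^#/ && NF { rows[FILENAME]++ }          # count non-comment, non-blank data rows
    END {
      for (i = 1; i <= k; i++)
        printf "No of columns in %s: %d\n", order[i], cols[order[i]]
      for (i = 1; i <= k; i++) {
        f = order[i]
        p = rows[f] * cols[f]                 # rows times columns for this file
        total += p
        terms = (i == 1 ? "" : terms "+") p   # build the "a+b+c" string
        printf "Sum of %s: %d*%d=%d\n", f, rows[f], cols[f], p
      }
      printf "Total Sum: %s = %d\n", terms, total
    }' "$@"
}
```

Called as summarize A.txt B.txt C.txt on the sample files from the question, this reports 5, 4, and 6 columns, products of 15, 12, and 24, and a total of 51, matching the output above.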
