Read a column value from the previous and next lines and insert them as additional fields in the current line using awk

I hope you can help me out with my problem.

I have an input file with 3 columns of data which looks like this:

Apl_No Act_No Sfx_No 
100    10     0
100    11     1
100    12     2
100    13     3
101    20     0
101    21     1

I need to create an output file that contains the input data plus 3 additional fields. It should look like this:

Apl_No Act_No Sfx_No Crt_Act_No Prs_Act_No Cd_Act_No
100    10     0       -         -          -
100    11     1       10        11         12
100    12     2       11        12         13
100    13     3       12        13         10
101    20     0       -         -          -
101    21     1       20        21         20

Every Apl_No has a set of Act_No values mapped to it. Three new fields need to be created: Crt_Act_No, Prs_Act_No and Cd_Act_No. On the first row of each Apl_No, columns 4, 5 and 6 (Crt_Act_No, Prs_Act_No, Cd_Act_No) are dashed out. On every following row with the same Apl_No, Crt_Act_No is the Act_No from the previous line, Prs_Act_No is the Act_No from the current line, and Cd_Act_No is the Act_No from the next line. This applies to every remaining row of that Apl_No except the last one. On the last row, Crt_Act_No and Prs_Act_No are filled in the same way, but Cd_Act_No wraps around to the Act_No of the first row of that Apl_No.

I would like to do this with awk. Can anyone help me with how to go about it?
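Read literally, the rule above only needs each row's Act_No neighbours within the same Apl_No, plus the first Act_No of the group for the wrap-around. A minimal sketch that applies it directly, assuming the whole file fits in memory and that rows with the same Apl_No are contiguous (the shell function name `add_act_cols` is hypothetical):

```shell
# Hypothetical helper: reads the 3-column table on stdin and appends
# Crt_Act_No, Prs_Act_No and Cd_Act_No. Assumes rows of one Apl_No are
# contiguous and that the whole input fits in memory.
add_act_cols() {
    awk '
        # Header: normalise spacing and append the three new column names.
        NR == 1 { $1 = $1; print $0, "Crt_Act_No", "Prs_Act_No", "Cd_Act_No"; next }

        # Data rows: keep Apl_No, Act_No and the whole row, indexed by position.
        { $1 = $1; n++; apl[n] = $1; act[n] = $2; row[n] = $0 }

        END {
            for (i = 1; i <= n; i++) {
                if (i == 1 || apl[i] != apl[i - 1]) {
                    # First row of a group: remember its Act_No, dash out new columns.
                    first = act[i]
                    print row[i], "-", "-", "-"
                } else {
                    # Next Act_No in the same group, or wrap to the first one on the last row.
                    nxt = (i < n && apl[i + 1] == apl[i]) ? act[i + 1] : first
                    print row[i], act[i - 1], act[i], nxt
                }
            }
        }
    '
}

# Usage with the sample data from the question.
printf '%s\n' 'Apl_No Act_No Sfx_No' '100 10 0' '100 11 1' '100 12 2' \
    '100 13 3' '101 20 0' '101 21 1' | add_act_cols
```

Holding every row makes the neighbour lookups (`act[i - 1]`, `act[i + 1]`) trivial, at the cost of memory proportional to the file size. Piping the result through `column -t` aligns the columns as in the expected output.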

One solution:

awk '
    ## Print header in first line.
    FNR == 1 {
        printf "%s %s %s %s\n", $0, "Crt_Act_No", "Prs_Act_No", "Cd_Act_No";
        next;
    }

    ## If the first field is not yet in the hash, this is the first row of a new
    ## "Apl_No": print it with dashes and save the data needed later.
    ## The "line" variable holds the pending row from the previous iteration
    ## (the last row of the previous group); complete and print it if it is set.
    ! apl[ $1 ] {
        if ( line ) {
            sub( /-$/, orig_act, line );
            print line;
            line = "";
        }
        printf "%s %s %s %s\n", $0, "-", "-", "-";
        orig_act = prev_act = $2;
        apl[ $1 ] = 1;
        next;
    }

    ## For all non-unique "Apl_No"...
    {
        ## If this is the first row after the line with dashes ("line" not set),
        ## save its content in "line" together with the previous and current
        ## "Act_No". A trailing dash is left as a placeholder for the last field,
        ## to be substituted in the following iteration.
        if ( ! line ) {
            line = sprintf( "%s %s %s %s", $0, prev_act, $2, "-" );
            prev_act = $2;
            next;
        }

        ## Now the next "Act_No" is known, so substitute the trailing dash with it,
        ## print the pending row and repeat the process with the current line.
        sub( /-$/, $2, line );
        print line;
        line = sprintf( "%s %s %s %s", $0, prev_act, $2, "-" );
        prev_act = $2;
    }

    ## The final row of the last group is still pending: wrap to its first "Act_No".
    END {
        if ( line ) {
            sub( /-$/, orig_act, line );
            print line;
        }
    }
' infile | column -t

That yields:

Apl_No  Act_No  Sfx_No  Crt_Act_No  Prs_Act_No  Cd_Act_No
100     10      0       -           -           -
100     11      1       10          11          12
100     12      2       11          12          13
100     13      3       12          13          10
101     20      0       -           -           -
101     21      1       20          21          20
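If the file is too large to hold in memory, a variant that buffers only the current Apl_No group also works, and it avoids patching a placeholder dash with `sub()`. This is a sketch under the same assumption that rows of one Apl_No are contiguous; the shell function `add_act_cols_stream` and the awk helper `flush()` are hypothetical names:

```shell
# Hypothetical helper: only the current Apl_No group is buffered.
# Assumes rows of one Apl_No are contiguous.
add_act_cols_stream() {
    awk '
        # Print the buffered group: dashes on its first row; on later rows the
        # previous, current and next Act_No, wrapping to act[1] on the last row.
        function flush(    i, nxt) {
            for (i = 1; i <= n; i++) {
                if (i == 1) { print row[i], "-", "-", "-"; continue }
                nxt = (i < n) ? act[i + 1] : act[1]
                print row[i], act[i - 1], act[i], nxt
            }
            n = 0
        }

        # Header: normalise spacing and append the three new column names.
        NR == 1 { $1 = $1; print $0, "Crt_Act_No", "Prs_Act_No", "Cd_Act_No"; next }

        # A new Apl_No starts: emit the previous group before buffering this row.
        $1 != cur { flush(); cur = $1 }

        # Buffer the current row and its Act_No.
        { $1 = $1; n++; act[n] = $2; row[n] = $0 }

        # Emit the last group.
        END { flush() }
    '
}

# Usage with the sample data from the question.
printf '%s\n' 'Apl_No Act_No Sfx_No' '100 10 0' '100 11 1' '100 12 2' \
    '100 13 3' '101 20 0' '101 21 1' | add_act_cols_stream
```

Memory use is bounded by the largest group rather than the whole file, and `column -t` can again be used to align the output.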
