Using awk to get unique values from column 1, and sum corresponding values in column 2?

Question

I have a CSV file in the following format. I was told at work this is a "map reduce problem".

Server1,33.23
Server2,43.46
Server3,64.34
Server4,56.89
Server2,33.24
Server1,21.40
Server2,33.46

It is several thousand lines long. Around 80 server names appear several times each in column 1, and column 2 is Mbs. For every occurrence of a server name in column 1, I want to add the corresponding value in column 2, so that I am left with a new table with no duplicates in column 1 and just the total Mbs for each server.

So in case I was unclear: for every occurrence of any unique value in column 1, add the corresponding values in column 2. In the end I'd have:

Server1,TotalMbs
Server2,TotalMbs
Server3,TotalMbs

I know this can be done with awk, but I can't figure out how. I think it means taking the value in column 1 and adding the value in column 2 to a running total for it, line by line, but it's quite tricky. My long and inelegant solution would be to create a temp file for each server in a loop, total column 2 for each file, then rm the files at the end, but I know it can be done in a one-liner with awk.
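Since the question mentions "map reduce", here is a minimal sketch of that framing, assuming one server,Mb record per line with no header row and no quoted fields: sort(1) plays the shuffle step by bringing identical server names next to each other, and awk does the reduce step by summing each run of equal keys. The function names sample_data and reduce_sorted are hypothetical. Unlike a hash of all servers, the awk side only keeps one running total at a time, at the cost of sorting the input first.

```shell
#!/bin/sh
# Sketch of the "map reduce" view. Assumes one "server,Mb" record per line,
# no header and no quoted fields. sample_data and reduce_sorted are
# hypothetical names used only for illustration.

sample_data() {
  printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
    Server2,33.24 Server1,21.40 Server2,33.46
}

# Shuffle: sort groups identical server names together.
# Reduce: awk sums each run of equal keys and prints one total per key.
reduce_sorted() {
  sort -t, -k1,1 |
  awk -F, '
    $1 != key {                     # a new key starts a new run
      if (NR > 1) printf "%s,%.2f\n", key, total
      key = $1
      total = 0
    }
    { total += $2 }                 # accumulate the current run
    END { if (NR > 0) printf "%s,%.2f\n", key, total }
  '
}

sample_data | reduce_sorted
```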

Answer 1

The following awk script might help you:

$ awk -F'[ ,]' '{for(i=1;i<=NF;i++)if($i ~ "Server")a[$i]+=$(i+1)}END{for(i in a)printf "%s,%s ",i,a[i];printf "\n"}' input_file
Server3,64.34 Server4,56.89 Server1,54.63 Server2,110.16

If ordered output is required and you are using GNU awk (gawk), set PROCINFO["sorted_in"]="@ind_str_asc" in a BEGIN block so that for (i in a) visits the keys in ascending string order:

$ awk -F'[ ,]' 'BEGIN{PROCINFO["sorted_in"]="@ind_str_asc"}{for(i=1;i<=NF;i++)if($i ~ "Server")a[$i]+=$(i+1)}END{for(i in a)printf "%s,%s ",i,a[i];printf "\n"}' input_file
Server1,54.63 Server2,110.16 Server3,64.34 Server4,56.89
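PROCINFO["sorted_in"] is a gawk feature; in other awk implementations PROCINFO is just an ordinary array, so setting it has no effect on the loop order. A minimal portable sketch, assuming the same input: print one pair per line and let sort(1) order them. The function name sum_sorted is hypothetical, and sort compares names as strings, so Server10 would come before Server2 (the same as @ind_str_asc).

```shell
#!/bin/sh
# Portable sorted output: one "server,total" pair per line, ordered by sort(1)
# instead of gawk's PROCINFO["sorted_in"]. sum_sorted is a hypothetical name.
sum_sorted() {
  awk -F'[ ,]' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ "Server")
        a[$i] += $(i+1)
  } END {
    for (s in a)
      printf "%s,%s\n", s, a[s]
  }' "$@" | sort -t, -k1,1
}

printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
  Server2,33.24 Server1,21.40 Server2,33.46 | sum_sorted
```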

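The one-liner could also be written like this:

awk -F'[ ,]' '{
  # check every field on the line
  for (i = 1; i <= NF; i++)
    # a field containing "Server" is a key; the next field is its value
    if ($i ~ "Server")
      a[$i] += $(i+1)
} END {
  # print each server and its total, all on one line
  for (i in a)
    printf "%s,%s ", i, a[i]
  printf "\n"
}' input_file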

Brief explanation:

Set " " and "," as the delimiters.
Scan every field of each line; whenever a field contains "Server", add the value in the next field to that server's entry in array a, i.e. a[$i]+=$(i+1).
In the END block, print every server with its total.
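To see what the field separator does, here is how -F'[ ,]' (a bracket expression matching a single space or a single comma) splits one flattened line of the sample data. Each server name lands in field i and its value in field i+1, which is exactly what a[$i]+=$(i+1) relies on; a line holding a single record such as Server1,33.23 simply splits into two fields.

```shell
#!/bin/sh
# Print every field produced by -F'[ ,]' with its index.
echo 'Server1,33.23 Server2,43.46' |
  awk -F'[ ,]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

This prints 1: Server1, 2: 33.23, 3: Server2 and 4: 43.46 on separate lines.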

Answer 2

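awk -F',' '{
             servers[$1] += $2;   # add column 2 to the total for the server in column 1
           }
           END {
             # for (... in ...) visits the servers in no particular order
             for (server in servers) {
               printf("%s %f\n", server, servers[server]);
             }
           }' input_file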
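As written, %f prints six decimal places and separates the columns with a space. A minimal sketch that produces the question's server,total format instead, assuming the same comma-separated input with no header; the function name sum_csv and the choice of two decimals are illustrative. The trailing sort is only there to make the order predictable.

```shell
#!/bin/sh
# Answer 2 with output in the "server,total" format from the question.
# Assumes comma-separated input: server in $1, Mb in $2, no header.
# sum_csv is a hypothetical name.
sum_csv() {
  awk -F',' '
    { servers[$1] += $2 }           # accumulate Mb per server
    END {
      for (server in servers)       # order is unspecified in awk
        printf("%s,%.2f\n", server, servers[server])
    }
  ' "$@"
}

printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
  Server2,33.24 Server1,21.40 Server2,33.46 | sum_csv | sort
```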

If you want to filter on specific servers, put a /regex/ pattern in front of the { servers[$1] += $2 } block, so that it only runs on lines that match the condition.
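A sketch of such a filter: here the pattern is applied to column 1 and anchored, so it totals only Server1 and Server2 and would not also pick up a name such as Server10, which an unanchored /Server1/ on the whole line would match. The regex and the function name sum_selected are illustrative only.

```shell
#!/bin/sh
# Total only the servers whose name in column 1 matches the pattern.
# The regex ^Server[12]$ and the name sum_selected are examples.
sum_selected() {
  awk -F',' '
    $1 ~ /^Server[12]$/ { servers[$1] += $2 }   # pattern guards the action
    END {
      for (server in servers)
        printf("%s,%.2f\n", server, servers[server])
    }
  '
}

printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
  Server2,33.24 Server1,21.40 Server2,33.46 | sum_selected | sort
```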


solution1
0 ACCEPTED 2018-07-18 06:51:35

solution2
0 2018-07-18 07:39:12
