I have a CSV file in the following format. I was told at work this is a "map reduce problem":
Server1,33.23
Server2,43.46
Server3,64.34
Server4,56.89
Server2,33.24
Server1,21.40
Server2,33.46
It is several thousand lines long, with around 80 server names that each appear several times in column 1; column 2 is Mbs. For every occurrence of a server name in column 1, I want to add up the corresponding values in column 2, so that I am left with a new table with no duplicates in column 1 and just the total Mbs from column 2.
In case that was unclear: for every occurrence of each unique value in column 1, add the corresponding values in column 2. In the end I'd have:
Server1,TotalMbs
Server2,TotalMbs
Server3,TotalMbs
I know this can be done with awk but I can't figure out how. I think it means using the value in column 1 as a key and adding column 2 to a running total for that key, line by line. It's quite tricky. My long and inelegant fallback would be to create a temp file for each server in a loop, total column 2 for each file, then rm the files at the end, but I know it can be done in a one-liner with awk.
The following awk script might help you:
$ awk -F'[ |,]' '{for(i=1;i<=NF;i++)if($i ~ "Server")a[$i]+=$(i+1)}END{for(i in a)printf "%s,%s ",i,a[i];printf "\n"}' input_file
Server3,64.34 Server4,56.89 Server1,54.63 Server2,110.16
If ordered output is required, add a BEGIN block containing PROCINFO["sorted_in"]="@ind_str_asc" (this setting is specific to GNU awk):
$ awk -F'[ |,]' 'BEGIN{PROCINFO["sorted_in"]="@ind_str_asc"}{for(i=1;i<=NF;i++)if($i ~ "Server")a[$i]+=$(i+1)}END{for(i in a)printf "%s,%s ",i,a[i];printf "\n"}' input_file
Server1,54.63 Server2,110.16 Server3,64.34 Server4,56.89
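If gawk is not available, one alternative is to print one total per line and let sort do the ordering. This is only a sketch: it reuses the program above with "\n" instead of a space in the printf, and it assumes the sample data from the question sits in a temporary file (the sorted_totals variable and the mktemp file are just for illustration).

```shell
# Sample data from the question, one record per line (assumed layout).
input_file=$(mktemp)
printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
    Server2,33.24 Server1,21.40 Server2,33.46 > "$input_file"

# Same accumulation as above, but one "name,total" per line so that
# sort(1) can order the result by server name.
sorted_totals=$(awk -F'[ |,]' '{for(i=1;i<=NF;i++)if($i ~ "Server")a[$i]+=$(i+1)}END{for(i in a)printf "%s,%s\n",i,a[i]}' "$input_file" | sort)

printf '%s\n' "$sorted_totals"
rm -f "$input_file"
```

With around 80 distinct server names the extra sort step costs next to nothing, and it does not depend on any gawk extension.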
The one-liner could also be written like this:
awk -F'[ |,]' '{
    # look at every field on the line
    for(i=1;i<=NF;i++)
        if($i ~ "Server")
            a[$i]+=$(i+1)    # add the value that follows the server name
} END{
    for(i in a)
        printf "%s,%s ",i,a[i];
    printf "\n"
}' input_file
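Brief explanation: the array a holds the running total for each server name, i.e. a[$i]+=$(i+1), which is updated whenever a field $i matching "Server" is found.

awk -F',' '{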
    servers[$1] += $2;    # add column 2 to the total for the name in column 1
}
END {
    for (server in servers) {
        printf("%s %f\n", server, servers[server]);
    }
}'
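Note that %f prints six decimal places and the format string above separates the two fields with a space. If the Server,Total layout from the question is wanted, a possible variant looks like the sketch below. It assumes one record per line and that two decimals are enough; the csv_totals variable and the temporary file are only there to make the example self-contained.

```shell
# Sample data from the question, one record per line (assumed layout).
input_file=$(mktemp)
printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
    Server2,33.24 Server1,21.40 Server2,33.46 > "$input_file"

csv_totals=$(awk -F',' '
{
    servers[$1] += $2        # running total per server name
}
END {
    # comma separator and two decimals; the order of the lines is unspecified
    for (server in servers)
        printf "%s,%.2f\n", server, servers[server]
}' "$input_file")

printf '%s\n' "$csv_totals"
rm -f "$input_file"
```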
If you want to filter on specific servers, you can put a /regex/ pattern in front of the first block, so that it only runs on lines that match the condition.
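For example, to total only Server1 and Server2, something like the following should work. It is a sketch under a few assumptions: one record per line, the sample data from the question, and a pattern /^Server[12],/ chosen purely for illustration (filtered_totals and the temporary file are likewise made up for the example).

```shell
# Sample data from the question, one record per line (assumed layout).
input_file=$(mktemp)
printf '%s\n' Server1,33.23 Server2,43.46 Server3,64.34 Server4,56.89 \
    Server2,33.24 Server1,21.40 Server2,33.46 > "$input_file"

filtered_totals=$(awk -F',' '
# the pattern restricts the block to lines starting with Server1, or Server2,
/^Server[12],/ {
    servers[$1] += $2
}
END {
    for (server in servers) {
        printf("%s %f\n", server, servers[server])
    }
}' "$input_file")

printf '%s\n' "$filtered_totals"
rm -f "$input_file"
```

Anchoring the pattern with ^ and including the comma keeps Server1 from also matching names such as Server10.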