简体   繁体   中英

How to use awk to get Maximum and Average of scores of all column A pairs with column B?

Say I have this file1.csv consists of column A and column A_score looks like this:

fefa68e,312
wernjnn,432
ew443fs,300

and file2.csv consists of pairs of column B and column A looks like this:

dfaefew,fefa68e
dfaefew,wernjnn
vzcxvvz,ew443fs
ewrwefd,wernjnn
ewrwefd,ew443fs

How can I get a file3.csv to get the maximum of scores of all column A pairs with column B looks like this:

dfaefew,432
vzcxvvz,300
ewrwefd,432

and a file4.csv to get the average of scores of all column A pairs with column B looks like this:

dfaefew,372
vzcxvvz,300
ewrwefd,366

Can awk or anything else do the work? I am using ubuntu.

Thx ahead!

Update:

What if the file2.csv looks like this:

dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0

The third column could be 1 or 0 and 1 and it is the same for the same column 1 value(dfaefew, vzcxvvz etc) and I want to keep the third column and get output like this:

dfaefew,432,1
vzcxvvz,300,1
ewrwefd,432,0

dfaefew,372,1
vzcxvvz,300,1
ewrwefd,366,0

Here is one way of doing it in awk :

Contents of script.awk :

# Set the input and output field separators to ","
BEGIN { FS = OFS = "," }   

# Processing first file
# Load the first file in hash keyed at column1 having value of column2
NR==FNR { value[$1] = $2; next }   

# Processing second file
# Keep a counter of column1 and add values for column2
{ count[$1]++; values[$1]+=value[$2] }   

# Find the max for each entry of column1
{ entry[$1] = (($1 in entry) && entry[$1]>value[$2] ? entry[$1] : value[$2]) } 

# In the END block traverse through array and print desired output.
END {
     for (max in entry) print (max, entry[max]) > "file3.csv";
     for (key in entry) print (key, values[key]/count[key]) > "file4.csv";
}

Run it like:

awk -f script.awk file1.csv file2.csv

Output:

$ cat file3.csv
vzcxvvz,300
ewrwefd,432
dfaefew,432

$ cat file4.csv
vzcxvvz,300
ewrwefd,366
dfaefew,372

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM