Find maximum in one row and compare with max in other rows

I am looking for the easiest way to filter my data:

Let's use this one as an example:

structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2),
               cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6),
               disp = c(160, 165, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6),
               hp = c(310, 110, 93, 110, 475, 105, 245, 62, 95, 223),
               drat = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76, 633.21, 3.69, 3.92, 3.92),
               wt = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44),
               qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3),
               vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
               am = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
               gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4),
               carb = c(4, 4, 1, 1, 2, 1, 4, 2, 2, 4)),
          .Names = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"),
          row.names = c("Mark_1", "Mark_2", "Mark_3", "Tom_1", "Tom_2", "Tim_1",
                        "Greg_1", "Greg_2", "Greg_3", "Greg_4"),
          class = "data.frame")

I would like to filter the rows that share "the same" name and keep only one row per name. Such names differ only in the number after the _, and rows whose names match up to that point should be compared with each other.
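A minimal sketch of how such a grouping key could be derived: strip the underscore and the trailing digits from each row name, using the same sub() pattern as the answers below. The vector nms is a hypothetical helper that simply copies the row names of the example data.

```r
# Row names copied from the example data (hypothetical helper vector)
nms <- c("Mark_1", "Mark_2", "Mark_3", "Tom_1", "Tom_2", "Tim_1",
         "Greg_1", "Greg_2", "Greg_3", "Greg_4")

# Drop "_" followed by one or more digits to get the shared name
key <- sub("_\\d+", "", nms)

# How many rows will be compared within each name
table(key)
```

Rows with the same key (for example the four Greg rows) form one group for the comparison.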

The criterion for the filter:

First, I would like to find the maximum in each row, then take these maxima and compare them between rows with the same name. The row with the highest maximum should be kept.
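The criterion can be sketched step by step, assuming the example data is stored in a data frame named df1 (the name the answers use; here it is rebuilt with data.frame() so the block runs on its own): compute each row's maximum, take the largest of those maxima within each name, and keep the rows that reach it.

```r
# Example data from the question, rebuilt with data.frame()
df1 <- data.frame(
  mpg  = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2),
  cyl  = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6),
  disp = c(160, 165, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6),
  hp   = c(310, 110, 93, 110, 475, 105, 245, 62, 95, 223),
  drat = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76, 633.21, 3.69, 3.92, 3.92),
  wt   = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44),
  qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3),
  vs   = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  am   = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4),
  carb = c(4, 4, 1, 1, 2, 1, 4, 2, 2, 4),
  row.names = c("Mark_1", "Mark_2", "Mark_3", "Tom_1", "Tom_2", "Tim_1",
                "Greg_1", "Greg_2", "Greg_3", "Greg_4")
)

# Step 1: the maximum of each row (apply() keeps the row names as names)
row_max <- apply(df1, 1, max)

# Step 2: the grouping key, i.e. the row name without "_<number>"
key <- sub("_\\d+", "", rownames(df1))

# Step 3: the largest row maximum within each name
group_max <- tapply(row_max, key, max)

# Step 4: names of the rows whose maximum equals their group's maximum
kept <- names(row_max)[row_max == as.vector(group_max[key])]
kept
```

For this data the group maxima are 633.21 (Greg), 310 (Mark), 225 (Tim) and 475 (Tom), so the kept rows are Mark_1, Tom_2, Tim_1 and Greg_1.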

Expected output, if I am not mistaken:

structure(list(mpg = c(21, 18.7, 18.1, 14.3), cyl = c(6, 8, 6, 
8), disp = c(160, 360, 225, 360), hp = c(310, 475, 105, 245), 
    drat = c(3.9, 3.15, 2.76, 633.21), wt = c(2.62, 3.44, 3.46, 
    3.57), qsec = c(16.46, 17.02, 20.22, 15.84), vs = c(0, 0, 
    1, 0), am = c(1, 0, 0, 0), gear = c(4, 3, 3, 3), carb = c(4, 
    2, 1, 4)), .Names = c("mpg", "cyl", "disp", "hp", "drat", 
"wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mark_1", 
"Tom_2", "Tim_1", "Greg_1"), class = "data.frame")

> datas
        mpg cyl disp  hp   drat   wt  qsec vs am gear carb
Mark_1 21.0   6  160 310   3.90 2.62 16.46  0  1    4    4
Tom_2  18.7   8  360 475   3.15 3.44 17.02  0  0    3    2
Tim_1  18.1   6  225 105   2.76 3.46 20.22  1  0    3    1
Greg_1 14.3   8  360 245 633.21 3.57 15.84  0  0    3    4

We split the dataset into a list of data.frames, using as the grouping variable the row names with the substring _ followed by digits (\\d+) removed. We then loop through the list with lapply, find the maximum of each row (pmax), get the index of the largest row maximum (which.max), use it to subset the rows, and rbind the rows together. setNames(..., NULL) removes the list names so that the result keeps the original row names.

# For each name group, keep the row with the largest row maximum
do.call(rbind, setNames(lapply(split(df1, sub("_\\d+", "", rownames(df1))),
                               function(x) x[which.max(do.call(pmax, x)), ]), NULL))
#        mpg cyl disp  hp   drat   wt  qsec vs am gear carb
#Greg_1 14.3   8  360 245 633.21 3.57 15.84  0  0    3    4
#Mark_1 21.0   6  160 310   3.90 2.62 16.46  0  1    4    4
#Tim_1  18.1   6  225 105   2.76 3.46 20.22  1  0    3    1
#Tom_2  18.7   8  360 475   3.15 3.44 17.02  0  0    3    2
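The same selection can also be sketched with a different base R technique that avoids split() (this is not part of the answer above): order the rows by their maximum in decreasing order, keep the first row of each name with duplicated(), then restore the original row order. df1 is rebuilt here so the block runs on its own.

```r
# Example data from the question, rebuilt with data.frame()
df1 <- data.frame(
  mpg  = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2),
  cyl  = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6),
  disp = c(160, 165, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6),
  hp   = c(310, 110, 93, 110, 475, 105, 245, 62, 95, 223),
  drat = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76, 633.21, 3.69, 3.92, 3.92),
  wt   = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44),
  qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3),
  vs   = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  am   = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  gear = c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4),
  carb = c(4, 4, 1, 1, 2, 1, 4, 2, 2, 4),
  row.names = c("Mark_1", "Mark_2", "Mark_3", "Tom_1", "Tom_2", "Tim_1",
                "Greg_1", "Greg_2", "Greg_3", "Greg_4")
)

row_max <- apply(df1, 1, max)              # maximum of each row
key <- sub("_\\d+", "", rownames(df1))     # name without "_<number>"

# Row indices sorted by row maximum, largest first
ord <- order(row_max, decreasing = TRUE)

# The first occurrence of each name in that order has the largest maximum
keep <- ord[!duplicated(key[ord])]

# sort() puts the kept rows back in their original order
res <- df1[sort(keep), ]
res
```

Because sort(keep) restores the original positions, the rows come out in the same order as the expected output (Mark_1, Tom_2, Tim_1, Greg_1).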

Or, with ave, we create a logical index: get the maximum of each row (do.call(pmax, df1)), use the row names with the suffix removed as the grouping variable, take the maximum within each group, compare (==) each row maximum with it, convert the result to logical (ave returns it as numeric 0/1), and subset the rows.

# Logical index: TRUE where a row's maximum equals the largest maximum of its name group
df1[as.logical(ave(do.call(pmax, df1), sub("_\\d+", "", rownames(df1)),
                   FUN = function(x) x == max(x))), ]
#        mpg cyl disp  hp   drat   wt  qsec vs am gear carb
#Mark_1 21.0   6  160 310   3.90 2.62 16.46  0  1    4    4
#Tom_2  18.7   8  360 475   3.15 3.44 17.02  0  0    3    2
#Tim_1  18.1   6  225 105   2.76 3.46 20.22  1  0    3    1
#Greg_1 14.3   8  360 245 633.21 3.57 15.84  0  0    3    4
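The two answers differ only when rows of the same name tie for the largest maximum: which.max() returns the first such row, while the ave() comparison keeps every tied row. A small hypothetical data frame, toy, illustrates this.

```r
# Hypothetical data: all three X rows have a row maximum of 5
toy <- data.frame(a = c(5, 1, 5), b = c(2, 5, 0),
                  row.names = c("X_1", "X_2", "X_3"))
key <- sub("_\\d+", "", rownames(toy))

# split() + which.max(): only the first tied row survives
by_which <- do.call(rbind, setNames(lapply(split(toy, key),
                    function(x) x[which.max(do.call(pmax, x)), ]), NULL))

# ave() + ==: every row equal to the group maximum survives
by_ave <- toy[as.logical(ave(do.call(pmax, toy), key,
                             FUN = function(x) x == max(x))), ]

rownames(by_which)  # "X_1"
rownames(by_ave)    # "X_1" "X_2" "X_3"
```

If exactly one row per name is required, the which.max() version is the safer choice; the ave() version is useful when all tied rows should be reported.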
