
Group rows by minimum value

The problem is that I have some tables to build, and some of the values fall below a specific threshold. For example:

    S1  S2  S3
A   700 367 751
B   354 103 143
C   18  7   6
D   27  11  5
E   3   6   1
F   8   2   9
G   1   3   2

I want to keep the rows that contain at least one value equal to or greater than 10, and merge the rows whose values are all less than 10 into a single row named "Other (<10)":

1 - The part of the table whose rows have at least one cell with a value equal to or greater than 10 (for example, row C with value 18):

    S1  S2  S3
A   700 367 751
B   354 103 143
C   18  7   6
D   27  11  5

2 - The part of the table where no value is equal to or greater than 10:

E   3   6   1
F   8   2   9
G   1   3   2

The final table would end with a row holding the column sums of E, F, and G, under the row name "Other (<10)". Like this:

            S1  S2  S3
A           700 367 751
B           354 103 143
C           18  7   6
D           27  11  5
Other(<10)  12  11  12

If you are interested in an R solution:

filtered.df <-   rbind( df[ apply(df, 1, function(x){any(x>=10)}), ],
               colSums( df[ apply(df, 1, function(x){all(x< 10)}), ]))

And this would be the output:

> filtered.df

#      [,1] [,2] [,3] 
# [1,]  700  367  751 
# [2,]  354  103  143 
# [3,]   18    7    6 
# [4,]   27   11    5 
# [5,]   12   11   12

Data:

df <- structure(c(700, 354, 18, 27, 3, 8, 1, 367, 103, 7, 11, 6, 2, 3, 751, 143, 6, 5, 1, 9, 2), .Dim = c(7L, 3L))

Update: Including column and row names:


As the OP asked, this would be the data with column names and row names:

df <- structure(c(700, 354, 18, 27, 3, 8, 1, 367, 103, 7, 11, 6, 2, 3, 751, 143, 6, 5, 1, 9, 2), .Dim = c(7L, 3L), .Dimnames = list(c("A", "B", "C", "D", "E", "F", "G"), c("s1", "s2", "s3")))

Using the same solution as above, we get:

> filtered.df

#    s1  s2  s3 
# A 700 367 751 
# B 354 103 143 
# C  18   7   6 
# D  27  11   5 
#    12  11  12

You can try this in Python:

data = ["700 367 751", "354 103 143", "18  7   6", "27  11  5", "3   6   1", "8   2   9", "1   3   2"]

# Parse each row into a list of ints (in Python 3, map() returns an iterator, so wrap it in list())
new_data = [list(map(int, i.split())) for i in data]

final_data = []

# Running column sums of the rows whose values are all below 10
extra_data = [0, 0, 0]

for i in new_data:
    if any(b >= 10 for b in i):
        final_data.append(i)
    else:
        extra_data = [extra_data[c] + b for c, b in enumerate(i)]

final_data.append(extra_data)

print(final_data)
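
The loop above works on the values only, so the row labels from the question (A to G) are lost and the merged row has no name. As a minimal sketch, assuming the table is held in a plain dict that maps each row label to its values, the same idea can keep the labels and name the merged row. The function name `group_small_rows` and its parameters `threshold` and `other_label` are made up for this illustration:

```python
# Row labels and values copied from the question's table.
table = {
    "A": [700, 367, 751],
    "B": [354, 103, 143],
    "C": [18, 7, 6],
    "D": [27, 11, 5],
    "E": [3, 6, 1],
    "F": [8, 2, 9],
    "G": [1, 3, 2],
}


def group_small_rows(rows, threshold=10, other_label="Other (<10)"):
    """Keep rows with at least one value >= threshold; sum the rest into one row.

    Hypothetical helper for illustration only.
    """
    kept = {}
    other = None  # column sums of the rows that fall entirely below the threshold
    for label, values in rows.items():
        if any(v >= threshold for v in values):
            kept[label] = list(values)
        elif other is None:
            other = list(values)
        else:
            other = [a + b for a, b in zip(other, values)]
    # Only add the merged row if at least one row was below the threshold.
    if other is not None:
        kept[other_label] = other
    return kept


result = group_small_rows(table)
for label, values in result.items():
    print(label, *values)
```

Because dicts preserve insertion order, the kept rows stay in their original order and the "Other (<10)" row comes last, matching the layout asked for in the question.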

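A vectorized option in R would be:

# TRUE for rows where every value is below 10 (">=" keeps rows containing exactly 10)
ind <- rowSums(df >= 10) == 0

# drop = FALSE keeps the matrix shape even when only one row is selected
rbind(df[!ind, , drop = FALSE], colSums(df[ind, , drop = FALSE]))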

#   S1  S2  S3
#A 700 367 751
#B 354 103 143
#C  18   7   6
#D  27  11   5
#   12  11  12
