简体   繁体   中英

how to keep only rows that have highest value in certain column in R

I have a dataframe that looks like this:

library(tidyverse)

df <- tribble (
  ~Species, ~North, ~South, ~East, ~West,
  "a", 4, 3, 2, 3,
  "b", 2, 3, 4, 5, 
  "C", 2, 3, 3, 3,
  "D", 3, 2, 2, 2
)

I want to filter for species that where the highest value is eg North. In this case, species A and D would be selected. Expected output would be a df with only species A and D in it. I used a workaround like this:

df %>%
group_by(species) %>%
mutate(rowmean = mean(North:West) %>%
filter(North > rowmean) %>%
ungroup() %>%
select(!rowmean)

which seems like a lot of code for a simple task. I cant however find a way to do this more codefriendly? Is there a (preferably tidyverse) way to perform this task in a more clean way?

Kind regards

An easier approach is with max.col in base R . Select the columns that are numeric. Get the column index of each row where the value is max . Check if that is equal to 1 ie the first column (as we selected only from 2nd column onwards) and subset the rows

subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
#  Species North South  East  West
#  <chr>   <dbl> <dbl> <dbl> <dbl>
#1 a           4     3     2     3
#2 D           3     2     2     2

If it is based on the rowwise mean

subset(df, North > rowMeans(df[-1]))

Or if we prefer to use dplyr

library(dplyr)
df %>%
   filter(max.col(cur_data()[-1], 'first') == 1)

Similarly if it based on the rowwise mean

df %>% 
    filter(North > rowMeans(cur_data()[-1]))
# base
df[df$North > rowMeans(df[-1]), ]  
# A tibble: 2 x 5
  Species North South  East  West
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 a           4     3     2     3
2 D           3     2     2     2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM