R function to extract top n scores from a dataframe and find their average using `apply` or dplyr `rowwise`

Question

The data frame looks like this:

df = data.frame(name = c("A","B","C"),
               exam1 = c(2,6,4),
               exam2 = c(3,5,6),
               exam3 = c(5,3,3),
               exam4 = c(1,NA,5))

I want to extract the top 3 exam scores for each `name` and find their average using `apply()` or the dplyr `rowwise()` function.

Answer 1

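In base R, use `apply` with `MARGIN = 1` to loop over the rows of the numeric columns, `sort` each row, take the `head` or `tail` depending on whether `decreasing` is `TRUE` or `FALSE`, and return the `mean`: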

apply(df[-1], 1, FUN = function(x) mean(head(sort(x, decreasing = TRUE), 3)))
[1] 3.333333 4.666667 5.000000
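Since the question asks for the top n scores in general, the one-liner above can be wrapped in a small helper. This is only a sketch: the name `top_n_mean` and its `n` argument are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: mean of the n largest non-missing values in x.
top_n_mean <- function(x, n = 3) {
  # sort() drops NA by default (na.last = NA), so missing scores are ignored
  mean(head(sort(x, decreasing = TRUE), n))
}

df <- data.frame(name = c("A", "B", "C"),
                 exam1 = c(2, 6, 4),
                 exam2 = c(3, 5, 6),
                 exam3 = c(5, 3, 3),
                 exam4 = c(1, NA, 5))

# Extra arguments to apply() are passed on to the helper
top3 <- apply(df[-1], 1, top_n_mean)         # same as the one-liner above
top2 <- apply(df[-1], 1, top_n_mean, n = 2)  # top 2 scores instead
```

Passing `n` through `apply`'s `...` keeps the row-wise call unchanged when the number of scores to keep changes.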

Or use dplyr with `rowwise`:

library(dplyr)
df %>%
  rowwise %>%
  mutate(Mean = mean(head(sort(c_across(where(is.numeric)), 
       decreasing = TRUE), 3))) %>% 
  ungroup
# A tibble: 3 × 6
  name  exam1 exam2 exam3 exam4  Mean
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A         2     3     5     1  3.33
2 B         6     5     3    NA  4.67
3 C         4     6     3     5  5
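One thing to watch with `where(is.numeric)` is that it picks up every numeric column, so an unrelated numeric column (an ID, say) would be treated as a score. A possible variation, assuming the score columns all start with `exam` as they do in this example, selects them by name instead:

```r
library(dplyr)

df <- data.frame(name = c("A", "B", "C"),
                 exam1 = c(2, 6, 4),
                 exam2 = c(3, 5, 6),
                 exam3 = c(5, 3, 3),
                 exam4 = c(1, NA, 5))

# Same row-wise computation, but only over columns named exam*
res <- df %>%
  rowwise() %>%
  mutate(Mean = mean(head(sort(c_across(starts_with("exam")),
                               decreasing = TRUE), 3))) %>%
  ungroup()
```

On this data the result matches the output above, because the only numeric columns are the four exam columns.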

Answer 2

Here is an alternative approach using pivoting and `top_n`. This returns only the top 3 scores:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(
    -name,
    names_to = "exam",
    values_to = "value"
  ) %>% 
  group_by(name) %>% 
  top_n(3, value) %>% 
  mutate(mean = mean(value)) %>% 
  pivot_wider(
    names_from = exam, 
    values_from = value
  )

  name   mean exam1 exam2 exam3 exam4
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      3.33     2     3     5    NA
2 B      4.67     6     5     3    NA
3 C      5        4     6    NA     5

Or:

library(tidyr)
df %>% 
  pivot_longer(
    -name,
    names_to = "exam",
    values_to = "value"
  ) %>% 
  group_by(name) %>% 
  top_n(3, value) %>% 
  summarise(mean = mean(value))

  name   mean
  <chr> <dbl>
1 A      3.33
2 B      4.67
3 C      5
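A caveat worth checking before relying on `top_n`: it keeps all tied rows, so a group can contribute more than 3 values to the mean, unlike `head(sort(x), 3)`. The sketch below uses made-up scores (the student "D" is not in the original data) and compares it with `slice_max(with_ties = FALSE)`, which needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical scores with a three-way tie for second place
scores <- data.frame(name = "D", value = c(5, 3, 3, 3))

# top_n() keeps every tied row, so four values enter the mean
with_ties <- scores %>%
  group_by(name) %>%
  top_n(3, value) %>%
  summarise(mean = mean(value))

# slice_max(with_ties = FALSE) keeps exactly three rows
no_ties <- scores %>%
  group_by(name) %>%
  slice_max(value, n = 3, with_ties = FALSE) %>%
  summarise(mean = mean(value))
```

Here `with_ties$mean` is 3.5 while `no_ties$mean` is 11/3, which is what the base R `apply` approach would also give for these scores.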

Answer 3

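I came back to this question and tried it with basic dplyr operations on `df`. This also works, just like some of the really helpful solutions in the earlier posts.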

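library(dplyr)
library(tidyr)

# Reshape to one row per (name, exam) pair
df_long <- df %>% 
  pivot_longer(cols = -name,
               names_to = "exam",
               values_to = "score")

# Sort by descending score, keep the first three rows per name, then average
df_long %>%
  group_by(name) %>% 
  arrange(desc(score)) %>% 
  slice(1:3) %>% 
  summarise(mean_score = mean(score))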

@Paul Smith Good idea to add `inner_join(df)`.

Answer 4

Another possible solution, based on `tidyr::pivot_longer` and without using `rowwise`:

library(tidyverse)

df = data.frame(name = c("A","B","C"),
                exam1 = c(2,6,4),
                exam2 = c(3,5,6),
                exam3 = c(5,3,3),
                exam4 = c(1,NA,5))

df %>% 
  pivot_longer(cols = 2:5, names_to = "names") %>% 
  group_by(name) %>% 
  slice_max(value, n=3) %>% 
  summarise(mean = mean(value)) %>% 
  inner_join(df)

#> Joining, by = "name"
#> # A tibble: 3 × 6
#>   name   mean exam1 exam2 exam3 exam4
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A      3.33     2     3     5     1
#> 2 B      4.67     6     5     3    NA
#> 3 C      5        4     6     3     5

Answer 5

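I would take @akrun's answer and add the `na.rm` argument, in case a future version of your approach needs it, so that the top scores can still be found when some results are NA.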

The final result would be:

df <- data.frame(name = c("A","B","C"),
                exam1 = c(2,6,4),
                exam2 = c(3,5,6),
                exam3 = c(5,3,3),
                exam4 = c(1,NA,5))

results <- apply(df[-1], 1, FUN = function(x) {
  mean(head(sort(x, decreasing = TRUE), 3), na.rm = TRUE)
})

names(results) <- df$name 

results

The result should look like this:

> results
       A        B        C 
3.333333 4.666667 5.000000 
>
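A small check of where `na.rm` actually matters may help here. `sort()` drops `NA` by default (`na.last = NA`), so in the code above `na.rm = TRUE` acts as a safeguard; it changes the result only if the sort is told to keep missing values. The vector `y` below is a made-up row with a single recorded score.

```r
x <- c(6, 5, 3, NA)  # row B from the example

sorted_default <- sort(x, decreasing = TRUE)                  # NA dropped
sorted_keep_na <- sort(x, decreasing = TRUE, na.last = TRUE)  # NA kept at the end

# Hypothetical row with only one recorded score
y <- c(6, NA, NA, NA)

# Default sort: the NAs are already gone, so mean() sees just the 6
m_default <- mean(head(sort(y, decreasing = TRUE), 3))

# Keeping NAs in the sort makes na.rm decide the outcome
top_keep <- head(sort(y, decreasing = TRUE, na.last = TRUE), 3)
m_keep <- mean(top_keep)                   # NA, because NAs reach mean()
m_keep_rm <- mean(top_keep, na.rm = TRUE)  # NAs removed inside mean()
```

With the default sort, a row that has fewer than 3 recorded scores is simply averaged over the scores it does have.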

5 solutions

Solution 1
2 2022-01-16 18:49:56

Solution 2
1 2022-01-16 18:58:33

Solution 3
1 2022-01-16 19:42:11

Solution 4
0 2022-01-16 19:00:25

Solution 5
0 2022-01-18 17:40:36
