Get the percentage of 1s in each column in R dplyr

Question

I have a data frame like this:

row_id   stn_1 stn_2 stn_3 stn_4 stn_5
1        1     0     1     0     1
2        0     1     0     0     0
3        1     0     0     0     0
4        1     0     1     0     0
5        0     0     0     1     0

I want to get the percentage of times each stn appears in the data. Basically, the percentage of 1s in every column except row_id.

Expected output:

stn    percentage
stn_1  .60
stn_2  .20
stn_3  .40
stn_4  .20
stn_5  .20

How can I do this in dplyr?

Answer 1

With dplyr and tidyr, you can do:

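library(dplyr)
library(tidyr)

dd %>%
  # the mean of a 0/1 column is the proportion of 1s
  summarize(across(-row_id, mean)) %>%
  # reshape the one-row summary into long format
  pivot_longer(everything(), names_to = "stn", values_to = "percentage")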
#   stn   percentage
#   <chr>      <dbl>
# 1 stn_1        0.6
# 2 stn_2        0.2
# 3 stn_3        0.4
# 4 stn_4        0.2
# 5 stn_5        0.2

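summarize computes the column means (for a column of 0s and 1s, the mean is the proportion of 1s), and pivot_longer reshapes the one-row result into long format.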
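Because each stn column holds only 0s and 1s, its mean equals the share of 1s (stn_1 has three 1s in five rows, so 3/5 = 0.6). If the real columns could also contain other codes or NAs, which the question does not say, a minimal sketch under that assumption compares against 1 explicitly before averaging. The data is rebuilt inline as dd so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built inline so the snippet is self-contained
dd <- data.frame(
  row_id = 1:5,
  stn_1 = c(1, 0, 1, 1, 0),
  stn_2 = c(0, 1, 0, 0, 0),
  stn_3 = c(1, 0, 0, 1, 0),
  stn_4 = c(0, 0, 0, 0, 1),
  stn_5 = c(1, 0, 0, 0, 0)
)

result <- dd %>%
  # `.x == 1` is TRUE/FALSE, so its mean is the share of 1s even if a column
  # holds other codes; na.rm = TRUE skips missing values (an assumption)
  summarize(across(-row_id, ~ mean(.x == 1, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "stn", values_to = "percentage")

result
```

On the question's data this gives the same numbers as the plain mean, since every value is already 0 or 1.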

Answer 2

How about colMeans with a bit of enframe? (Not dplyr, but maybe close enough.)

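library(tibble)
library(dplyr)

# |> is the native pipe (R >= 4.1)
df |>
  select(-row_id) |>                            # keep only the stn_* columns
  colMeans() |>                                 # named vector of column means
  enframe(name = "stn", value = "percentage")   # vector -> two-column tibble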

Output:

# A tibble: 5 × 2
  stn     percentage
  <chr>   <dbl>
1 stn_1   0.6
2 stn_2   0.2
3 stn_3   0.4
4 stn_4   0.2
5 stn_5   0.2

Data:

library(readr)

df <- read_table("row_id   stn_1 stn_2 stn_3 stn_4 stn_5
1        1     0     1     0     1
2        0     1     0     0     0
3        1     0     0     0     0
4        1     0     1     0     0
5        0     0     0     1     0")
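If the real data had other non-station columns besides row_id (an assumption; the question shows only row_id), selecting columns by their stn_ prefix instead of dropping row_id is one way to keep the same colMeans/enframe pipeline safe. A sketch, with the data rebuilt inline:

```r
library(tibble)
library(dplyr)

# Question data rebuilt inline so the snippet is self-contained
df <- tibble(
  row_id = 1:5,
  stn_1 = c(1, 0, 1, 1, 0),
  stn_2 = c(0, 1, 0, 0, 0),
  stn_3 = c(1, 0, 0, 1, 0),
  stn_4 = c(0, 0, 0, 0, 1),
  stn_5 = c(1, 0, 0, 0, 0)
)

out <- df |>
  # keep only columns whose names start with "stn_"
  select(starts_with("stn_")) |>
  # named numeric vector: one mean (share of 1s) per column
  colMeans() |>
  enframe(name = "stn", value = "percentage")

out
```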

Answer 3

Update: as @akrun pointed out, we could also use plyr::numcolwise(mean)(df[-1]) %>% gather()
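A self-contained sketch of that one-liner: plyr::numcolwise(mean) builds a function that applies mean to every numeric column and returns a one-row data frame, and tidyr's gather() turns that row into key/value pairs. Naming the key and value columns is an addition here so the output matches the expected column names. Note that plyr is retired and gather() is superseded by pivot_longer().

```r
library(plyr)   # numcolwise()
library(tidyr)  # gather(); tidyr also re-exports %>%

# Question data rebuilt inline so the snippet is self-contained
df <- data.frame(
  row_id = 1:5,
  stn_1 = c(1, 0, 1, 1, 0),
  stn_2 = c(0, 1, 0, 0, 0),
  stn_3 = c(1, 0, 0, 1, 0),
  stn_4 = c(0, 0, 0, 0, 1),
  stn_5 = c(1, 0, 0, 0, 0)
)

# df[-1] drops the first column (row_id); numcolwise(mean) gives one row of means
res <- numcolwise(mean)(df[-1]) %>%
  # with no columns listed, gather() stacks all of them
  gather(key = "stn", value = "percentage")

res
```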

Original answer: one more option. Honestly, @MrFlick's idea of using mean is brilliant.

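library(dplyr)
library(tibble)

df %>%
  # replace every stn_* value with that column's share of 1s
  mutate(across(-row_id, ~sum(.)/nrow(df))) %>%
  t() %>%                       # transpose: columns become rows
  data.frame() %>%
  slice(-1) %>%                 # drop the row that came from row_id
  rownames_to_column("stn") %>%
  select(stn, percentage=X1)    # all X columns are identical; keep the first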

    stn percentage
1 stn_1        0.6
2 stn_2        0.2
3 stn_3        0.4
4 stn_4        0.2
5 stn_5        0.2

Answer scores and posting times:

Answer 1: score 6, posted 2022-07-26 19:29:03
Answer 2: score 3, posted 2022-07-26 19:31:43
Answer 3: score 1, posted 2022-07-26 19:50:21
