Create a dataframe of median differences in columns based on the value of another column

Question

I have a dataframe that looks like this:

data <- data.frame(id=c(1,2,6,3,7,1,5,7),
                   class=c('apple','boy','boy','apple','boy','apple','apple','boy'),
                   type=c('type1','type1','type2','type2','type3','type4','type4','type4'),
                   col1=c(-0.9,0.8,0.7,-0.6,-0.5,0.4,0.3,0.9),
                   col2=c(-6.9,2.8,0.4,-1.6,-0.8,0.6,0.2,-0.1),
                   col3=c(6.7,0.9,0.2,-0.7,-0.8,1.6,3.2,0.1))

id class  type col1 col2 col3
1 apple type1 -0.9 -6.9  6.7
2   boy type1  0.8  2.8  0.9
6   boy type2  0.7  0.4  0.2
3 apple type2 -0.6 -1.6 -0.7
7   boy type3 -0.5 -0.8 -0.8
1 apple type4  0.4  0.6  1.6
5 apple type4  0.3  0.2  3.2
7   boy type4  0.9 -0.1  0.1

I am trying to create a dataframe with the same columns (i.e. col1, col2, col3, ...), but whose values are median((data %>% filter(class=="apple"))$col1) - median((data %>% filter(class=="boy"))$col1), and so on for every column, computed separately for each type.

So the final dataframe would look like

  type col1 col2 col3
type1 -0.1 -4.1  3.7
type2  0.7  0.4  0.2
type3 -0.5 -0.8 -0.8
type4  0.4  0.6  1.6

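I can do this by creating a separate dataframe for each type, computing the difference between the medians of the two classes, and appending the resulting vectors to an empty dataframe with bind_rows().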

But is there a better, simpler way to do this?

Answer 1

The approach you want is something like this:

library(dplyr)

data %>%
  group_by(type) %>%
  summarize(across(col1:col3, ~ median(.[class=="boy"] - median(.[class=="boy"]))))
# # A tibble: 4 x 4
#   type   col1  col2  col3
#   <chr> <dbl> <dbl> <dbl>
# 1 type1     0     0     0
# 2 type2     0     0     0
# 3 type3     0     0     0
# 4 type4     0     0     0

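In this case, though, it returns all 0s: the expression subtracts the "boy" median from the "boy" values themselves, and each group contains only one "boy".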

Following the edit to the question, here is the updated code and result:

data %>%
 group_by(type) %>%
 summarize(across(col1:col3, ~ median(.[class=="apple"]) - median(.[class=="boy"])))
# # A tibble: 4 x 4
#   type     col1   col2    col3
#   <chr>   <dbl>  <dbl>   <dbl>
# 1 type1 -1.7    -9.700  5.8   
# 2 type2 -1.3000 -2     -0.9000
# 3 type3 NA      NA     NA     
# 4 type4 -0.55    0.5    2.3

The NA values appear because type3 has only "boy" rows and no "apple" rows.
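
To see where the NA comes from: subsetting a column by a class that has no rows in the group gives a zero-length vector, and median() of an empty numeric vector is NA. The following is a minimal sketch, reusing the question's data, of one way to check which types are missing a class before summarizing. The names empty_median, class_counts, n_apple and n_boy are made up for this illustration and are not part of the original answer.

```r
library(dplyr)

data <- data.frame(id=c(1,2,6,3,7,1,5,7),
                   class=c('apple','boy','boy','apple','boy','apple','apple','boy'),
                   type=c('type1','type1','type2','type2','type3','type4','type4','type4'),
                   col1=c(-0.9,0.8,0.7,-0.6,-0.5,0.4,0.3,0.9),
                   col2=c(-6.9,2.8,0.4,-1.6,-0.8,0.6,0.2,-0.1),
                   col3=c(6.7,0.9,0.2,-0.7,-0.8,1.6,3.2,0.1))

# median() of an empty numeric vector is NA; this is what the
# "apple" subset of type3 reduces to
empty_median <- median(numeric(0))
empty_median

# Count the rows of each class within each type
# (n_apple and n_boy are illustrative column names)
class_counts <- data %>%
  group_by(type) %>%
  summarize(n_apple = sum(class == "apple"),
            n_boy = sum(class == "boy"))
class_counts
```

Any type with a zero in either count column will produce NA in the median difference.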

(At least we are not comparing "apple" and "orange"; that would be such a cliché ;-)

Answer 2

Here is one way to get to the solution. That was a tough one!

library(dplyr)
data %>% 
    arrange(id) %>% 
    filter(class == "boy" | type=="type1") %>% 
    group_by(type) %>% 
    summarise(across(starts_with("col"), sum))

  type   col1  col2  col3
  <chr> <dbl> <dbl> <dbl>
1 type1  -0.1  -4.1   7.6
2 type2   0.7   0.4   0.2
3 type3  -0.5  -0.8  -0.8
4 type4   0.9  -0.1   0.1
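
Note that the filter in the code above is tailored to this particular data set, and the summary is a sum rather than a median. As a more general sketch (not part of the original answer, and assuming the tidyr package is available), the per-class medians can be computed first and then subtracted. The intermediate names variable, value, med, diff and result are made up for this illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(id=c(1,2,6,3,7,1,5,7),
                   class=c('apple','boy','boy','apple','boy','apple','apple','boy'),
                   type=c('type1','type1','type2','type2','type3','type4','type4','type4'),
                   col1=c(-0.9,0.8,0.7,-0.6,-0.5,0.4,0.3,0.9),
                   col2=c(-6.9,2.8,0.4,-1.6,-0.8,0.6,0.2,-0.1),
                   col3=c(6.7,0.9,0.2,-0.7,-0.8,1.6,3.2,0.1))

result <- data %>%
  # One row per (original row, column) pair
  pivot_longer(col1:col3, names_to = "variable", values_to = "value") %>%
  # Median of each column within each type and class
  group_by(type, variable, class) %>%
  summarise(med = median(value), .groups = "drop") %>%
  # Put the two class medians side by side; a missing class becomes NA
  pivot_wider(names_from = class, values_from = med) %>%
  mutate(diff = apple - boy) %>%
  select(type, variable, diff) %>%
  # Back to one row per type and one column per original column
  pivot_wider(names_from = variable, values_from = diff)
result
```

This should agree with the median differences shown in Answer 1, including NA for type3, where there is no "apple" row to compare against.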

Answer 1
2 accepted 2021-08-20 20:07:02

Answer 2
0 2021-08-20 20:20:02
