
Separate a dummy variable column into two columns with summary statistics

I have what should be a simple problem, but I can't figure out how to get the result I want in dplyr / tidyr.

I've just computed a summary data frame as follows:

results <- df_long %>%
  group_by(question,imputed_liberal, question_text) %>% 
  summarize(Accuracy = mean(score, na.rm = T), Reaction_Time = mean(reation_time, na.rm = T), Number = n()) 

Each question is repeated in two rows, one for imputed_liberal = T and one for imputed_liberal = F, each with a column for accuracy and a column for reaction time.

   question imputed_liberal question_text Accuracy Reaction_Time Number                                                         

 1 10       F               How many...    0.750       61.4     16
 2 10       T               How many...    0.429       55.9     14

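I'd like to collapse these two rows into a single row (one row per question), with columns for "Conservative Accuracy" (imputed_liberal = F), "Liberal Accuracy", "Conservative Reaction Time", and "Liberal Reaction Time".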

I think spread is the right approach, but I can't figure out how to spread on two values (accuracy and reaction time).

My attempt:

results <- results %>% 
           filter(!is.na(Accuracy)) %>%
           spread(results, key = imputed_liberal, value = c(Accuracy, Reaction_time))

This throws an error, because spread cannot spread two value columns at once.

One option is to split the data into two parts and then join them back together.

library(dplyr)

inner_join(filter(results, imputed_liberal), 
    filter(results, !imputed_liberal), by="question") %>%
     select(-Number.y)

Result:

Note: the columns can be renamed as you see fit.

# question imputed_liberal.x question_text.x Accuracy.x Reaction_Time.x Number.x imputed_liberal.y question_text.y Accuracy.y Reaction_Time.y
# 1       10              TRUE     How many...      0.429            55.9       14             FALSE     How many...       0.75            61.4

Data:

results <- read.table(text =
"question imputed_liberal question_text Accuracy Reaction_Time Number  
1 10       FALSE               'How many...'    0.750       61.4     16
2 10       TRUE               'How many...'    0.429       55.9     14",
header = TRUE, stringsAsFactors = FALSE)
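
The note about renaming can be handled inside the join itself. The following is a minimal sketch, not part of the original answer: it assumes the `results` data shown above, joins on both `question` and `question_text` so the text column is not duplicated, and uses the `suffix` argument of `inner_join` to label the overlapping columns. The suffix strings `_Liberal` and `_Conservative`, and the names `liberal`, `conservative`, and `wide`, are illustrative choices.

```r
library(dplyr)

# Same two-row summary as in the Data block above
results <- read.table(text =
"question imputed_liberal question_text Accuracy Reaction_Time Number
1 10       FALSE               'How many...'    0.750       61.4     16
2 10       TRUE               'How many...'    0.429       55.9     14",
header = TRUE, stringsAsFactors = FALSE)

# Split into the two groups and drop the columns that should not be duplicated
liberal <- results %>%
  filter(imputed_liberal) %>%
  select(-imputed_liberal, -Number)
conservative <- results %>%
  filter(!imputed_liberal) %>%
  select(-imputed_liberal, -Number)

# Join on both identifying columns; suffix labels the columns the two halves share
wide <- inner_join(liberal, conservative,
                   by = c("question", "question_text"),
                   suffix = c("_Liberal", "_Conservative"))
wide
```

Joining on both identifying columns leaves one row per question with four measure columns, at the cost of dropping `Number`; keep it in the `select` calls if the group sizes are needed.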

This is the standard tidyr way:

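library(tidyverse)
# df is the summary data frame (the same data as results above)
df %>%
  select(-Number) %>%
  mutate(imputed_liberal = ifelse(imputed_liberal, 1, 0)) %>%  # TRUE/FALSE -> 1/0 for the name suffix
  gather(key, value, Accuracy, Reaction_Time) %>%              # stack both measures into key/value pairs
  unite(key, key, imputed_liberal) %>%                         # e.g. "Accuracy_0", "Reaction_Time_1"
  spread(key, value)                                           # one column per measure/group combination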

#   question question_text Accuracy_0 Accuracy_1 Reaction_Time_0 Reaction_Time_1
# 1       10   How many...       0.75      0.429            61.4            55.9

You can also nest first, which requires fewer gymnastics:

df %>%
  select(-Number) %>%
  nest(Accuracy, Reaction_Time) %>%
  spread(imputed_liberal,data) %>%
  unnest(.sep = "_")

#   question question_text FALSE_Accuracy FALSE_Reaction_Time TRUE_Accuracy TRUE_Reaction_Time
# 1       10   How many...           0.75                61.4         0.429               55.9
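
For readers on tidyr 1.0.0 or later, `pivot_wider` accepts several columns in `values_from`, which addresses the original "spread on two values" problem directly. This is a hedged sketch rather than part of the answers above: it assumes the same `results` data, and the helper column `group` with its labels "Liberal" and "Conservative" is an illustrative choice.

```r
library(dplyr)
library(tidyr)  # pivot_wider() is available from tidyr 1.0.0

# Same two-row summary as in the Data block of the first answer
results <- read.table(text =
"question imputed_liberal question_text Accuracy Reaction_Time Number
1 10       FALSE               'How many...'    0.750       61.4     16
2 10       TRUE               'How many...'    0.429       55.9     14",
header = TRUE, stringsAsFactors = FALSE)

wide <- results %>%
  # Replace the logical flag with a readable label used in the new column names
  mutate(group = ifelse(imputed_liberal, "Liberal", "Conservative")) %>%
  select(-imputed_liberal, -Number) %>%
  # Spread both measures at once; new names combine the measure and the group label
  pivot_wider(names_from = group,
              values_from = c(Accuracy, Reaction_Time))
wide
```

With the default naming, each new column is the measure name followed by the group label, for example `Accuracy_Conservative` and `Reaction_Time_Liberal`.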

