R: reshaping data to longer format with multiple columns that share a pattern in their names
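I am struggling to convert a dataset from fully wide format to fully long format. I have managed to get it into a form somewhere in between. As the toy example below shows, the data is already in long format with respect to the Cond column. The problem is that the "_Pre" and "_Post" suffixes in the measure column names should become another factor, like Cond, named PrePost. As a result, the code I tried produces a wrong result with too many rows: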
vars_PrePost <- grep("Pre|Post", colnames(df))
df2 <-
  df %>%
  gather(variable, value, vars_PrePost, -c(ID)) %>%
  tidyr::separate(variable, c("variable", "PrePost"), "_(?=[^_]+$)") %>%
  spread(variable, value)
Here is the toy dataset:
df <- data.frame(stringsAsFactors=FALSE,
ID = c("10", "10", "11", "11", "12", "12"),
Age = c("23", "23", "31", "31", "24", "24"),
Gender = c("m", "m", "m", "m", "f", "f"),
Cond = c("Cond2", "Cond1", "Cond2", "Cond1", "Cond2", "Cond1"),
Measure1_Post = c(NA, "7", NA, "3", NA, "2"),
Measure1_Pre = c(NA, "3", NA, "2", NA, "2"),
Measure2_Post = c("1.3968694273826", "0.799543118218161",
"1.44098109351048", "0.836960160696351",
"1.99568500539374", "1.75138016371597"),
Measure2_Pre = c("1.19248628113128", "0.726244170934944",
"1.01175268267757", "1.26415857605178",
"2.35250186706497", "1.27070245573958"),
Measure3_Post = c("73", "84", "50", "40", "97", "89"),
Measure3_Pre = c("70", "63", "50", "46", "88", "71")
)
The desired output should look like this:
desired_df <- data.frame(stringsAsFactors=FALSE,
Cond = c("Cond2", "Cond2", "Cond1", "Cond1", "Cond2", "Cond2", "Cond1",
"Cond1", "Cond2", "Cond2", "Cond1", "Cond1"),
PrePost = c("Post", "Pre", "Post", "Pre", "Post", "Pre", "Post", "Pre",
"Post", "Pre", "Post", "Pre"),
Measure1 = c(NA, NA, 7, 3, NA, NA, 3, 2, NA, NA, 2, 2),
Measure2 = c(1.3968694273826, 1.19248628113128, 0.799543118218161,
0.726244170934944, 1.44098109351048, 1.01175268267757,
0.836960160696351, 1.26415857605178, 1.99568500539374,
2.35250186706497, 1.75138016371597, 1.27070245573958),
Measure3 = c(73, 70, 84, 63, 50, 50, 40, 46, 97, 88, 89, 71)
)
I would like a tidyverse / dplyr solution, but any solution would be appreciated. Thank you.
With tidyr v1.0.0, we can do this using the special .value sentinel in names_to together with names_pattern:
library(tidyr) #v1.0.0
# select columns with _
pivot_longer(df, cols = matches('_'),
             names_to = c(".value", "PrePost"),
             names_pattern = "(.*)_(.*)")
# A tibble: 12 x 8
ID Age Gender Cond PrePost Measure1 Measure2 Measure3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 23 m Cond2 Post NA 1.3968694273826 73
2 10 23 m Cond2 Pre NA 1.19248628113128 70
3 10 23 m Cond1 Post 7 0.799543118218161 84
4 10 23 m Cond1 Pre 3 0.726244170934944 63
...
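Note that the output above keeps every measure as <chr>, whereas desired_df has numeric measures. Below is a minimal sketch of one way to close that gap, assuming dplyr >= 1.0.0 is installed for across(). df_small is a hypothetical cut-down copy of the toy data used only for illustration, and names_sep = "_" is swapped in for names_pattern, which gives the same split here only because each measure column name contains exactly one underscore.

```r
library(tidyr)
library(dplyr)

# Hypothetical reduced copy of the toy data (two IDs, two measures),
# with the same column naming scheme as in the question.
df_small <- data.frame(
  stringsAsFactors = FALSE,
  ID            = c("10", "10", "11", "11"),
  Cond          = c("Cond2", "Cond1", "Cond2", "Cond1"),
  Measure1_Post = c(NA, "7", NA, "3"),
  Measure1_Pre  = c(NA, "3", NA, "2"),
  Measure3_Post = c("73", "84", "50", "40"),
  Measure3_Pre  = c("70", "63", "50", "46")
)

long <- df_small %>%
  pivot_longer(
    cols      = matches("_(Pre|Post)$"),   # only the Pre/Post measure columns
    names_to  = c(".value", "PrePost"),    # part before "_" names the output column
    names_sep = "_"                        # assumes exactly one "_" per column name
  ) %>%
  # The toy data stores numbers as character, so convert after reshaping.
  mutate(across(starts_with("Measure"), as.numeric))

print(long)
```

If the ID, Age and Gender columns are not wanted in the final result, as in desired_df, they could be dropped afterwards with select().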