R: Reshape data to a longer format when multiple columns share a pattern in their names

Question

I have been struggling for some time to get a dataset from a fully wide format to a fully long format. I managed to get it to a form in between. As in the toy example below, the data is already in a longer format based on the Cond column. The problem is that the "_Pre" and "_Post" suffixes in the measurement columns' names have to become another factor like Cond, named PrePost. This is why the code I tried produces a wrong result with too many rows:

vars_PrePost <- grep("Pre|Post", colnames(df))

df2 <-
  df %>%
  gather(variable, value, vars_PrePost, -c(ID)) %>%                                      
  tidyr::separate(variable,  c("variable", "PrePost"), "_(?=[^_]+$)") %>%                
  spread(variable, value)

Here is the toy dataset:

df <- data.frame(stringsAsFactors = FALSE,
  ID = c("10", "10", "11", "11", "12", "12"),
  Age = c("23", "23", "31", "31", "24", "24"),
  Gender = c("m", "m", "m", "m", "f", "f"),
  Cond = c("Cond2", "Cond1", "Cond2", "Cond1", "Cond2", "Cond1"),
  Measure1_Post = c(NA, "7", NA, "3", NA, "2"),
  Measure1_Pre = c(NA, "3", NA, "2", NA, "2"),
  Measure2_Post = c("1.3968694273826", "0.799543118218161",
                    "1.44098109351048", "0.836960160696351",
                    "1.99568500539374", "1.75138016371597"),
  Measure2_Pre = c("1.19248628113128", "0.726244170934944",
                   "1.01175268267757", "1.26415857605178",
                   "2.35250186706497", "1.27070245573958"),
  Measure3_Post = c("73", "84", "50", "40", "97", "89"),
  Measure3_Pre = c("70", "63", "50", "46", "88", "71")
)

The desired output should look like this:

desired_df <- data.frame(stringsAsFactors=FALSE,
        Cond = c("Cond2", "Cond2", "Cond1", "Cond1", "Cond2", "Cond2", "Cond1",
                 "Cond1", "Cond2", "Cond2", "Cond1", "Cond1"),
     PrePost = c("Post", "Pre", "Post", "Pre", "Post", "Pre", "Post", "Pre",
                 "Post", "Pre", "Post", "Pre"),
    Measure1 = c(NA, NA, 7, 3, NA, NA, 3, 2, NA, NA, 2, 2),
    Measure2 = c(1.3968694273826, 1.19248628113128, 0.799543118218161,
                 0.726244170934944, 1.44098109351048, 1.01175268267757,
                 0.836960160696351, 1.26415857605178, 1.99568500539374,
                 2.35250186706497, 1.75138016371597, 1.27070245573958),
    Measure3 = c(73, 70, 84, 63, 50, 50, 40, 46, 97, 88, 89, 71)
)

I would love a tidyr / dplyr solution for this, but any solution will be appreciated. Thank you.

Answer 1

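Using the special .value sentinel together with names_pattern in tidyr v1.0.0, we can do the following. The first capture group of the pattern (Measure1, Measure2, Measure3) becomes the output column names via .value, and the second group (Pre or Post) goes into the new PrePost column: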

library(tidyr) #v1.0.0
#select columns with _
pivot_longer(df, cols = matches('_'), 
                 names_to = c(".value","PrePost"), 
                 names_pattern = "(.*)_(.*)")

# A tibble: 12 x 8
   ID    Age   Gender Cond  PrePost Measure1 Measure2          Measure3
   <chr> <chr> <chr>  <chr> <chr>   <chr>    <chr>             <chr>   
 1 10    23    m      Cond2 Post    NA       1.3968694273826   73      
 2 10    23    m      Cond2 Pre     NA       1.19248628113128  70      
 3 10    23    m      Cond1 Post    7        0.799543118218161 84      
 4 10    23    m      Cond1 Pre     3        0.726244170934944 63  
 ...
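
The result above still holds the measurements as character columns, while the desired output in the question has numeric ones. A minimal sketch of one way to close that gap, assuming base R conversion with as.numeric is acceptable; the names toy, long and measure_cols are made up for this illustration, and toy is a cut-down stand-in for the question's df:

```r
library(tidyr)

# Hypothetical cut-down version of the question's df (two rows, two measures).
toy <- data.frame(
  stringsAsFactors = FALSE,
  ID = c("10", "10"),
  Cond = c("Cond2", "Cond1"),
  Measure1_Post = c(NA, "7"),
  Measure1_Pre = c(NA, "3"),
  Measure3_Post = c("73", "84"),
  Measure3_Pre = c("70", "63")
)

# Same reshaping call as in the answer.
long <- pivot_longer(toy, cols = matches("_"),
                     names_to = c(".value", "PrePost"),
                     names_pattern = "(.*)_(.*)")

# Convert every Measure* column from character to numeric.
measure_cols <- grep("^Measure", names(long), value = TRUE)
long[measure_cols] <- lapply(long[measure_cols], as.numeric)
```

To keep only the columns shown in the desired output, something like long[c("Cond", "PrePost", measure_cols)] should do. Because each measurement name here contains exactly one underscore, names_sep = "_" should split the names the same way as the names_pattern above.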

Answer 1 (accepted; 2 votes), 2019-11-21 10:48:01
