Converting a long dataset to wide with data.table::dcast or tidyr

Question

Given the following data in long format, I would like to reshape it to wide format in a way that works for an arbitrary number of timepoints.

dat <- structure(list(
  srdr_id = c("172507", "172507", "172507", "172507",
              "172619", "172619", "172619", "172619"),
  arm = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI", "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L))

Long dataset:

  srdr_id arm      timepoint     n  mean    sd
  <chr>   <chr>        <dbl> <dbl> <dbl> <dbl>
1 172507  CBT_Educ         0   102  37.7  16.1
2 172507  CBT_MI           0   103  40.2  14.2
3 172507  CBT_Educ         3   100  34.5  19.8
4 172507  CBT_MI           3   101  31.8  19.7
5 172619  MI               0    58   4.6   2.2
6 172619  Educ             0    61   4.3   2.2
7 172619  MI               3    45   4.4   2.3
8 172619  Educ             3    53   4.1   2.5

I would like to create a wide dataset in which, for each combination of srdr_id and arm, the three variables (n, mean and sd) from every timepoint appear in the same row:

  srdr_id arm        n.0 mean.0  sd.0   n.3 mean.3  sd.3
1 172507  CBT_Educ   102   37.7  16.1   100   34.5  19.8
2 172507  CBT_MI     103   40.2  14.2   101   31.8  19.7
5 172619  MI          58    4.6   2.2    45    4.4   2.3
6 172619  Educ        61    4.3   2.2    53    4.1   2.5

The following failed with:

Error in is.formula(formula) : object 'srdr_id' not found

data.table::dcast(data = dat, srdr_id + arm, value.var = c(n_analyzed, mean, sd))

Answer 1

A common workflow for this type of situation is to gather all the metrics into one column, rename them so each name carries its timepoint, and then spread them out again. See below:

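tidyverse:

library(dplyr)
library(tidyr)

# Stack n, mean and sd into one column, tag each value with its timepoint,
# then spread the tagged measures back out into columns
dat %>%
  gather("measure", "val", n, mean, sd) %>%
  mutate(measure = paste0(measure, ".", timepoint)) %>%
  select(-timepoint) %>%
  spread(measure, val)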

# A tibble: 4 x 8
  srdr_id arm      mean.0 mean.3   n.0   n.3  sd.0  sd.3
  <chr>   <chr>     <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 172507  CBT_Educ   37.7   34.5   102   100  16.1  19.8
2 172507  CBT_MI     40.2   31.8   103   101  14.2  19.7
3 172619  Educ        4.3    4.1    61    53   2.2   2.5
4 172619  MI          4.6    4.4    58    45   2.2   2.3
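
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a minimal sketch, assuming tidyr >= 1.0.0 is installed, the same reshape can probably be done in a single pivot_wider() call, which should handle any number of timepoints. The data frame below only re-creates the question's dat (as a plain data.frame) so the snippet runs on its own; the name wide is just an illustrative choice.

```r
library(tidyr)

# Re-create the question's data so this snippet is self-contained
dat <- data.frame(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI", "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5),
  stringsAsFactors = FALSE
)

# One output column per (measure, timepoint) pair, named like n.0, mean.3, ...
# srdr_id and arm are left over as the identifying columns
wide <- pivot_wider(
  dat,
  names_from  = timepoint,
  values_from = c(n, mean, sd),
  names_sep   = "."
)
wide
```

Compared with the gather/rename/spread route, this keeps the timepoint in the column name without a manual paste0() step.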

data.table:

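library(data.table)
library(magrittr)  # provides %>%, which data.table does not export

dt <- as.data.table(dat)

# Melt n, mean and sd into variable/value pairs, append the timepoint to the
# variable name, drop timepoint, then cast back to wide
melt(dt, id.vars = c("srdr_id", "arm", "timepoint"))[
  ,`:=`(variable = paste0(variable, ".", timepoint), timepoint = NULL)
  ] %>%
  dcast(srdr_id + arm ~ variable, value.var = "value")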

   srdr_id      arm mean.0 mean.3 n.0 n.3  sd.0  sd.3
1:  172507 CBT_Educ  37.69  34.53 102 100 16.06 19.78
2:  172507   CBT_MI  40.23  31.80 103 101 14.23 19.67
3:  172619     Educ   4.30   4.10  61  53  2.20  2.50
4:  172619       MI   4.60   4.40  58  45  2.20  2.30
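
As for why the call in the question failed: dcast() expects a two-sided formula (rows ~ columns) and value.var as a character vector of column names, and dat has no n_analyzed column (the count is stored in n). Below is a hedged sketch of a corrected call, assuming a data.table version whose dcast() accepts several value.var columns at once (1.9.6 or later, as far as I know). The data is re-created here only so the snippet is self-contained, and the name wide is illustrative.

```r
library(data.table)

# Re-create the question's data as a data.table
dt <- data.table(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI", "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5)
)

# Rows are identified by srdr_id + arm; each timepoint becomes a set of columns.
# Column names are built as <value.var><sep><timepoint>, e.g. n.0, sd.3
wide <- dcast(
  dt,
  srdr_id + arm ~ timepoint,
  value.var = c("n", "mean", "sd"),
  sep = "."
)
wide
```

This skips the melt step entirely, so it may be the closest fix to what the question originally attempted.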

Answer 2

One alternative (probably not the most elegant) is to use group_by() and summarise() from the dplyr package. No calculation is needed here, since all values are already in your initial dataset, so you can use functions like first() and last() to specify which values you want.

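library(dplyr)

# Within each srdr_id/arm group, take the first row as timepoint 0
# and the last row as timepoint 3
dat %>% 
  group_by(srdr_id, arm) %>% 
  summarise(
    n0 = first(n),     mean0 = first(mean),    sd0 = first(sd), 
    n3 = last(n),      mean3 = last(mean),     sd3 = last(sd)
  )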

#   srdr_id arm         n0 mean0   sd0    n3 mean3   sd3
#   <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 172507  CBT_Educ   102  37.7  16.1   100  34.5  19.8
# 2 172507  CBT_MI     103  40.2  14.2   101  31.8  19.7
# 3 172619  Educ        61   4.3   2.2    53   4.1   2.5
# 4 172619  MI          58   4.6   2.2    45   4.4   2.3
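
Note that first() and last() pick values by row position, so the approach above relies on the rows being sorted by timepoint within each group, and it only covers the earliest and latest timepoint rather than an arbitrary number of them. A minimal sketch that makes the ordering explicit through the order_by argument of first() and last(), assuming dplyr >= 1.0.0 for the .groups argument; the names shuffled and res are illustrative only:

```r
library(dplyr)

# Re-create the question's data so this snippet is self-contained
dat <- data.frame(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI", "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5),
  stringsAsFactors = FALSE
)

# Reverse the rows to show that the result does not depend on row order
shuffled <- dat[rev(seq_len(nrow(dat))), ]

res <- shuffled %>%
  group_by(srdr_id, arm) %>%
  summarise(
    # order_by = timepoint: "first" means lowest timepoint, "last" means highest
    n0    = first(n,    order_by = timepoint),
    mean0 = first(mean, order_by = timepoint),
    sd0   = first(sd,   order_by = timepoint),
    n3    = last(n,     order_by = timepoint),
    mean3 = last(mean,  order_by = timepoint),
    sd3   = last(sd,    order_by = timepoint),
    .groups = "drop"
  )
res
```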

Answer 1: score 3, accepted, 2019-03-01 18:10:57
Answer 2: score 1, 2019-03-01 17:57:27
