
Comparing gather (tidyr) to melt (reshape2)

I love the reshape2 package because it made life so doggone easy. Typically Hadley has made improvements in his previous packages that enable streamlined, faster running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

  teacher yr1.baseline     pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
1       3      1/13/09 2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10
2       7      1/15/09 2/5/09      3/3/09      5/5/09    10/16/09    11/18/09      3/4/10
3       8      1/27/09 2/5/09      3/3/09     4/27/09     10/7/09    11/18/09      3/5/10

Code

Here's the code in melt fashion, followed by my attempt at gather. How can I make gather do the same thing as melt?

library(reshape2); library(dplyr); library(tidyr)

dat %>% 
   melt(id=c("teacher", "pd"), value.name="date") 

dat %>% 
   gather(key=c(teacher, pd), value=date, -c(teacher, pd)) 

Desired Output

   teacher     pd     variable     date
1        3 2/5/09 yr1.baseline  1/13/09
2        7 2/5/09 yr1.baseline  1/15/09
3        8 2/5/09 yr1.baseline  1/27/09
4        3 2/5/09  yr1.lesson1   3/6/09
5        7 2/5/09  yr1.lesson1   3/3/09
6        8 2/5/09  yr1.lesson1   3/3/09
7        3 2/5/09  yr1.lesson2  4/27/09
8        7 2/5/09  yr1.lesson2   5/5/09
9        8 2/5/09  yr1.lesson2  4/27/09
10       3 2/5/09  yr2.lesson1  10/7/09
11       7 2/5/09  yr2.lesson1 10/16/09
12       8 2/5/09  yr2.lesson1  10/7/09
13       3 2/5/09  yr2.lesson2 11/18/09
14       7 2/5/09  yr2.lesson2 11/18/09
15       8 2/5/09  yr2.lesson2 11/18/09
16       3 2/5/09  yr2.lesson3   3/4/10
17       7 2/5/09  yr2.lesson3   3/4/10
18       8 2/5/09  yr2.lesson3   3/5/10

Data

dat <- structure(list(teacher = structure(1:3, .Label = c("3", "7", 
    "8"), class = "factor"), yr1.baseline = structure(1:3, .Label = c("1/13/09", 
    "1/15/09", "1/27/09"), class = "factor"), pd = structure(c(1L, 
    1L, 1L), .Label = "2/5/09", class = "factor"), yr1.lesson1 = structure(c(2L, 
    1L, 1L), .Label = c("3/3/09", "3/6/09"), class = "factor"), yr1.lesson2 = structure(c(1L, 
    2L, 1L), .Label = c("4/27/09", "5/5/09"), class = "factor"), 
        yr2.lesson1 = structure(c(2L, 1L, 2L), .Label = c("10/16/09", 
        "10/7/09"), class = "factor"), yr2.lesson2 = structure(c(1L, 
        1L, 1L), .Label = "11/18/09", class = "factor"), yr2.lesson3 = structure(c(1L, 
        1L, 2L), .Label = c("3/4/10", "3/5/10"), class = "factor")), .Names = c("teacher", 
    "yr1.baseline", "pd", "yr1.lesson1", "yr1.lesson2", "yr2.lesson1", 
    "yr2.lesson2", "yr2.lesson3"), row.names = c(NA, -3L), class = "data.frame")

Your gather line should look like:

dat %>% gather(variable, date, -teacher, -pd)

This says "Gather all variables except teacher and pd, calling the new key column 'variable' and the new value column 'date'."

As an explanation, note the following from the help(gather) page:

 ...: Specification of columns to gather. Use bare variable names.
      Select all variables between x and z with ‘x:z’, exclude y
      with ‘-y’. For more options, see the select documentation.

Since this is an ellipsis, the specification of columns to gather is given as separate (bare name) arguments. We wish to gather all columns except teacher and pd, so we use - to exclude them.
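As a rough illustration of the two selection styles the help page describes, the sketch below gathers the same columns once by exclusion and once by naming them. It assumes a trimmed, character-column stand-in for the post's dat (the original stores factors and has more lesson columns); the names by_exclusion and by_inclusion are made up for this example.

```r
library(tidyr)

# Assumption: trimmed stand-in for the post's `dat`, with character columns
# instead of factors and only two of the lesson columns.
dat <- data.frame(
  teacher      = c("3", "7", "8"),
  yr1.baseline = c("1/13/09", "1/15/09", "1/27/09"),
  pd           = c("2/5/09", "2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09", "3/3/09"),
  yr1.lesson2  = c("4/27/09", "5/5/09", "4/27/09"),
  stringsAsFactors = FALSE
)

# Style 1: exclude the id columns, one bare name per argument.
by_exclusion <- gather(dat, variable, date, -teacher, -pd)

# Style 2: name the columns to gather, using a range where they are adjacent.
# `pd` sits between yr1.baseline and yr1.lesson1, so one range covering
# everything would pull `pd` in as well.
by_inclusion <- gather(dat, variable, date, yr1.baseline, yr1.lesson1:yr1.lesson2)

# Compare the two results.
identical(by_exclusion, by_inclusion)
```

Exclusion is usually the shorter choice when, as here, the id columns are few and are interleaved with the columns being gathered.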

In tidyr 1.0.0 this task is accomplished with the more flexible pivot_longer().

The equivalent syntax would be

library(tidyr)
dat %>% pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")

which says, correspondingly, "Pivot everything longer except teacher and pd, calling the new variable column 'variable' and the new value column 'date'."

Note that pivot_longer() returns the long data ordered by the rows of the original data frame, so all pivoted values from the first row come first, then those from the second row, and so on. gather(), by contrast, returned it ordered by the new variable column. To rearrange the resulting tibble, use dplyr::arrange().
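One possible way to recover the gather-style ordering is sketched below. It again assumes a trimmed, character-column stand-in for the post's dat, and it sorts on each variable's position among the original columns rather than alphabetically. The names long and long_by_variable are illustrative only.

```r
library(tidyr)
library(dplyr)

# Assumption: trimmed stand-in for the post's `dat`, with character columns
# instead of factors and only two of the lesson columns.
dat <- data.frame(
  teacher      = c("3", "7", "8"),
  yr1.baseline = c("1/13/09", "1/15/09", "1/27/09"),
  pd           = c("2/5/09", "2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09", "3/3/09"),
  yr1.lesson2  = c("4/27/09", "5/5/09", "4/27/09"),
  stringsAsFactors = FALSE
)

# pivot_longer() keeps the pivoted values of each original row together.
long <- dat %>%
  pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")

# match() maps each variable name to its column position in `dat`, so
# sorting on it groups the rows column by column, as gather() did.
long_by_variable <- long %>%
  arrange(match(variable, names(dat)))

long_by_variable
```

Sorting with arrange(variable) would also work for these particular column names, since their alphabetical order happens to coincide with their column order; the match() form does not depend on that coincidence.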

My solution

    dat%>%
    gather(!c(teacher,pd),key=variable,value=date)
