Comparing gather (tidyr) to melt (reshape2)
I love the reshape2 package because it made life so doggone easy. Hadley typically improves on his previous packages in ways that enable streamlined, faster running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.
Data View
Here's a view of the data (actual data in dput form at end of post):
  teacher yr1.baseline     pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
1       3      1/13/09 2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10
2       7      1/15/09 2/5/09      3/3/09      5/5/09    10/16/09    11/18/09      3/4/10
3       8      1/27/09 2/5/09      3/3/09     4/27/09     10/7/09    11/18/09      3/5/10
Code
Here's the code in melt fashion, followed by my attempt at gather. How can I make gather do the same thing as melt?
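library(reshape2); library(dplyr); library(tidyr)

# melt() version: gives the desired output
dat %>%
  melt(id = c("teacher", "pd"), value.name = "date")

# my gather() attempt: does not reproduce the melt() result
dat %>%
  gather(key = c(teacher, pd), value = date, -c(teacher, pd))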
Desired Output
   teacher     pd     variable     date
1        3 2/5/09 yr1.baseline  1/13/09
2        7 2/5/09 yr1.baseline  1/15/09
3        8 2/5/09 yr1.baseline  1/27/09
4        3 2/5/09  yr1.lesson1   3/6/09
5        7 2/5/09  yr1.lesson1   3/3/09
6        8 2/5/09  yr1.lesson1   3/3/09
7        3 2/5/09  yr1.lesson2  4/27/09
8        7 2/5/09  yr1.lesson2   5/5/09
9        8 2/5/09  yr1.lesson2  4/27/09
10       3 2/5/09  yr2.lesson1  10/7/09
11       7 2/5/09  yr2.lesson1 10/16/09
12       8 2/5/09  yr2.lesson1  10/7/09
13       3 2/5/09  yr2.lesson2 11/18/09
14       7 2/5/09  yr2.lesson2 11/18/09
15       8 2/5/09  yr2.lesson2 11/18/09
16       3 2/5/09  yr2.lesson3   3/4/10
17       7 2/5/09  yr2.lesson3   3/4/10
18       8 2/5/09  yr2.lesson3   3/5/10
Data
dat <- structure(list(teacher = structure(1:3, .Label = c("3", "7",
"8"), class = "factor"), yr1.baseline = structure(1:3, .Label = c("1/13/09",
"1/15/09", "1/27/09"), class = "factor"), pd = structure(c(1L,
1L, 1L), .Label = "2/5/09", class = "factor"), yr1.lesson1 = structure(c(2L,
1L, 1L), .Label = c("3/3/09", "3/6/09"), class = "factor"), yr1.lesson2 = structure(c(1L,
2L, 1L), .Label = c("4/27/09", "5/5/09"), class = "factor"),
yr2.lesson1 = structure(c(2L, 1L, 2L), .Label = c("10/16/09",
"10/7/09"), class = "factor"), yr2.lesson2 = structure(c(1L,
1L, 1L), .Label = "11/18/09", class = "factor"), yr2.lesson3 = structure(c(1L,
1L, 2L), .Label = c("3/4/10", "3/5/10"), class = "factor")), .Names = c("teacher",
"yr1.baseline", "pd", "yr1.lesson1", "yr1.lesson2", "yr2.lesson1",
"yr2.lesson2", "yr2.lesson3"), row.names = c(NA, -3L), class = "data.frame")
Your gather line should look like:
dat %>% gather(variable, date, -teacher, -pd)
This says "Gather all variables except teacher and pd, calling the new key column 'variable' and the new value column 'date'."
As an explanation, note the following from the help(gather) page:
...: Specification of columns to gather. Use bare variable names.
Select all variables between x and z with ‘x:z’, exclude y
with ‘-y’. For more options, see the select documentation.
Since this is an ellipsis, the columns to gather are given as separate (bare name) arguments. We wish to gather all columns except teacher and pd, so we prefix each of them with -.
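To make the column specification concrete, here is a minimal sketch of three ways of writing the same selection. The `toy` data frame is a hypothetical two-row stand-in for `dat` (plain character columns rather than factors), and the variable names are only for illustration.

```r
library(tidyr)

# Hypothetical stand-in for `dat`, using character columns for simplicity
toy <- data.frame(
  teacher      = c("3", "7"),
  yr1.baseline = c("1/13/09", "1/15/09"),
  pd           = c("2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09"),
  stringsAsFactors = FALSE
)

# Exclude the id columns one at a time ...
by_exclusion <- toy %>% gather(variable, date, -teacher, -pd)

# ... or exclude them together in one c() call ...
by_vector <- toy %>% gather(variable, date, -c(teacher, pd))

# ... or name the columns to gather explicitly
by_name <- toy %>% gather(variable, date, yr1.baseline, yr1.lesson1)

# All three should describe the same reshaping
same_result <- identical(by_exclusion, by_vector) &&
  identical(by_exclusion, by_name)
```

Note that a range such as yr1.baseline:yr1.lesson1 would also pull in pd here, because pd sits between those two columns, which is why excluding the id columns is the simpler choice for this data.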
In tidyr 1.0.0 this task is accomplished with the more flexible pivot_longer().
The equivalent syntax would be
library(tidyr)
dat %>% pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")
which says, correspondingly, "pivot everything longer except teacher and pd, calling the new variable column 'variable' and the new value column 'date'."
Note that pivot_longer() returns the long data ordered first by the rows of the original data frame, so each original row is followed by all of its pivoted columns. This differs from gather, which returned the rows grouped by the new variable column. To rearrange the resultant tibble, use dplyr::arrange().
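As a rough illustration of that ordering difference, the sketch below pivots a small hypothetical stand-in for `dat` (two rows, character columns; `toy`, `long` and `melt_like` are names chosen only for this example) and then sorts it back into the gather/melt layout.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for `dat`, using character columns for simplicity
toy <- data.frame(
  teacher      = c("3", "7"),
  yr1.baseline = c("1/13/09", "1/15/09"),
  pd           = c("2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09"),
  stringsAsFactors = FALSE
)

long <- toy %>%
  pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")

# pivot_longer() keeps each original row's values together:
# both rows for teacher 3 come first, then both rows for teacher 7
long$teacher

# Sorting by the new name column, then by teacher, groups the rows
# by variable, as gather() and melt() do
melt_like <- long %>% arrange(variable, teacher)
melt_like$variable
```

One caveat: arrange() sorts the variable names alphabetically, which only matches the original column order here because these particular names happen to sort that way.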
My solution
dat %>%
  gather(!c(teacher, pd), key = variable, value = date)