Comparing gather (tidyr) to melt (reshape2)
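I love the reshape2 package because it made life so easy. Typically, Hadley has made improvements in his previous packages that enable streamlined, faster-running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.
Data View
Here's a view of the data (the actual data, in dput form, is at the end of the post):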
  teacher yr1.baseline     pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
1       3      1/13/09 2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10
2       7      1/15/09 2/5/09      3/3/09      5/5/09    10/16/09    11/18/09      3/4/10
3       8      1/27/09 2/5/09      3/3/09     4/27/09     10/7/09    11/18/09      3/5/10
Code
Here's the code in melt fashion, followed by my attempt at gather. How can I make gather do the same thing as melt?
library(reshape2); library(dplyr); library(tidyr)
dat %>%
melt(id=c("teacher", "pd"), value.name="date")
dat %>%
gather(key=c(teacher, pd), value=date, -c(teacher, pd))
Desired Output
teacher pd variable date
1 3 2/5/09 yr1.baseline 1/13/09
2 7 2/5/09 yr1.baseline 1/15/09
3 8 2/5/09 yr1.baseline 1/27/09
4 3 2/5/09 yr1.lesson1 3/6/09
5 7 2/5/09 yr1.lesson1 3/3/09
6 8 2/5/09 yr1.lesson1 3/3/09
7 3 2/5/09 yr1.lesson2 4/27/09
8 7 2/5/09 yr1.lesson2 5/5/09
9 8 2/5/09 yr1.lesson2 4/27/09
10 3 2/5/09 yr2.lesson1 10/7/09
11 7 2/5/09 yr2.lesson1 10/16/09
12 8 2/5/09 yr2.lesson1 10/7/09
13 3 2/5/09 yr2.lesson2 11/18/09
14 7 2/5/09 yr2.lesson2 11/18/09
15 8 2/5/09 yr2.lesson2 11/18/09
16 3 2/5/09 yr2.lesson3 3/4/10
17 7 2/5/09 yr2.lesson3 3/4/10
18 8 2/5/09 yr2.lesson3 3/5/10
Data
dat <- structure(list(teacher = structure(1:3, .Label = c("3", "7",
"8"), class = "factor"), yr1.baseline = structure(1:3, .Label = c("1/13/09",
"1/15/09", "1/27/09"), class = "factor"), pd = structure(c(1L,
1L, 1L), .Label = "2/5/09", class = "factor"), yr1.lesson1 = structure(c(2L,
1L, 1L), .Label = c("3/3/09", "3/6/09"), class = "factor"), yr1.lesson2 = structure(c(1L,
2L, 1L), .Label = c("4/27/09", "5/5/09"), class = "factor"),
yr2.lesson1 = structure(c(2L, 1L, 2L), .Label = c("10/16/09",
"10/7/09"), class = "factor"), yr2.lesson2 = structure(c(1L,
1L, 1L), .Label = "11/18/09", class = "factor"), yr2.lesson3 = structure(c(1L,
1L, 2L), .Label = c("3/4/10", "3/5/10"), class = "factor")), .Names = c("teacher",
"yr1.baseline", "pd", "yr1.lesson1", "yr1.lesson2", "yr2.lesson1",
"yr2.lesson2", "yr2.lesson3"), row.names = c(NA, -3L), class = "data.frame")
Your gather line should look like:
dat %>% gather(variable, date, -teacher, -pd)
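This says: "Gather all variables except teacher and pd, calling the new key column 'variable' and the new value column 'date'."
As an explanation, note the following from the help(gather) page: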
...: Specification of columns to gather. Use bare variable names.
Select all variables between x and z with ‘x:z’, exclude y
with ‘-y’. For more options, see the select documentation.
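Since this is an ellipsis, the specification of columns to gather is given as separate (bare name) arguments. We wish to gather all columns except teacher and pd, so we use -.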
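As a minimal sketch of the selection rules quoted from the help page, the example below gathers the same columns twice: once by excluding the id columns with -, and once by naming the measure columns with a range. The dat defined here is a plain character copy of the question's data (the original stores the columns as factors), and the names by_exclusion and by_inclusion are made up for illustration.

```r
library(tidyr)

# Character copy of the question's data (an assumption made for this sketch;
# the dput in the question stores every column as a factor).
dat <- data.frame(
  teacher      = c("3", "7", "8"),
  yr1.baseline = c("1/13/09", "1/15/09", "1/27/09"),
  pd           = c("2/5/09", "2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09", "3/3/09"),
  yr1.lesson2  = c("4/27/09", "5/5/09", "4/27/09"),
  yr2.lesson1  = c("10/7/09", "10/16/09", "10/7/09"),
  yr2.lesson2  = c("11/18/09", "11/18/09", "11/18/09"),
  yr2.lesson3  = c("3/4/10", "3/4/10", "3/5/10"),
  stringsAsFactors = FALSE
)

# Exclude the id columns with '-' ...
by_exclusion <- gather(dat, variable, date, -teacher, -pd)

# ... or name the measure columns directly. pd sits between yr1.baseline
# and yr1.lesson1, so the x:z range has to start after it.
by_inclusion <- gather(dat, variable, date,
                       yr1.baseline, yr1.lesson1:yr2.lesson3)

# Check that both selections produce the same long data frame.
identical(by_exclusion, by_inclusion)
```

The exclusion form is usually shorter when there are few id columns, and it keeps working if more measure columns are added later.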
In tidyr 1.0.0, this task is accomplished with the more flexible pivot_longer().
The equivalent syntax would be:
library(tidyr)
dat %>% pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")
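Correspondingly, this says: "Pivot everything except teacher and pd longer, calling the new variable column 'variable' and the new value column 'date'."
Note that pivot_longer() returns the long data row by row: each row of the original data frame is expanded in turn, following the order of the pivoted columns. gather() instead returns the rows grouped by the new variable column. To rearrange the resulting tibble, use dplyr::arrange().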
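The ordering difference can be sketched as follows, assuming a trimmed character copy of the question's data with only two measure columns. The names long and gathered_order are hypothetical. Sorting by variable works here only because the alphabetical order of the column names happens to match their column order.

```r
library(tidyr)
library(dplyr)

# Trimmed character copy of the question's data (an assumption made for
# this sketch; only two of the six measure columns are kept).
dat <- data.frame(
  teacher      = c("3", "7", "8"),
  yr1.baseline = c("1/13/09", "1/15/09", "1/27/09"),
  pd           = c("2/5/09", "2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09", "3/3/09"),
  stringsAsFactors = FALSE
)

long <- dat %>%
  pivot_longer(cols = -c(teacher, pd),
               names_to = "variable", values_to = "date")

# pivot_longer() expands one original row at a time.
long$teacher
# "3" "3" "7" "7" "8" "8"

# Sort by the new variable column (then teacher) to get the
# gather()/melt()-style ordering.
gathered_order <- long %>% arrange(variable, teacher)
gathered_order$teacher
# "3" "7" "8" "3" "7" "8"
```

If the column names did not sort into the wanted order, one option would be to convert variable to a factor with explicit levels before calling arrange().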
My solution
dat%>%
gather(!c(teacher,pd),key=variable,value=date)