R: melt/reshape data from wide to long
Here's my data:
Day Morning_1_id Var1 Morning_2_id Var2 Afternoon_1_id Var3 Afternoon_2_id Var4
1 20180501-033-000001 3.156667 20180501-033-000002 2.866667 20180501-033-000008 2.946667 20180501-033-000009 3.133333
2 20180502-033-000001 2.986667 20180502-033-000002 2.930000 20180502-033-000020 3.076667 20180502-033-000021 3.013333
3 20180503-033-000001 3.073333 20180503-033-000002 3.070000 20180503-033-000011 3.106667 20180503-033-000012 2.900000
4 20180507-033-000001 3.236667 20180507-033-000002 2.990000 20180507-033-000015 3.043333 20180507-033-000016 3.116667
5 20180508-033-000001 3.030000 20180508-033-000002 3.150000 20180508-033-000015 3.156667 20180508-033-000017 3.343333
6 20180509-033-000001 3.010000 20180509-033-000002 3.020000 20180509-033-000007 3.000000 20180509-033-000008 3.156667
7 20180510-033-000001 2.916667 20180510-033-000002 3.103333 20180510-033-000007 3.336667 20180510-033-000008 3.066667
8 20180511-033-000001 3.293333 20180511-033-000002 3.163333 20180511-033-000013 2.980000 20180511-033-000014 2.940000
9 20180514-033-000001 3.136667 20180514-033-000002 3.186667 20180514-033-000007 2.766667 20180514-033-000008 3.100000
10 20180516-033-000001 3.116667 20180516-033-000002 3.283333 20180516-033-000008 3.133333 20180516-033-000009 3.040000
11 20180517-033-000003 2.843333 20180517-033-000004 3.120000 20180517-033-000008 3.060000 20180517-033-000009 3.033333
12 20180518-033-000001 3.033333 20180518-033-000002 3.290000 20180518-033-000007 3.006667 20180518-033-000008 2.973333
13 20180521-033-000002 3.173333 20180521-033-000003 2.993333 20180521-033-000008 2.983333 20180521-033-000009 3.020000
14 20180523-033-000001 3.336667 20180523-033-000002 3.026667 20180523-033-000007 3.300000 20180523-033-000008 3.210000
Reproducible form:
structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001",
"20180502-033-000001", "20180503-033-000001", "20180507-033-000001",
"20180508-033-000001", "20180509-033-000001", "20180510-033-000001",
"20180511-033-000001", "20180514-033-000001", "20180516-033-000001",
"20180517-033-000003", "20180518-033-000001", "20180521-033-000002",
"20180523-033-000001"), class = "factor"), Var1 = c(3.156666667,
2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667,
3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333,
3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002",
"20180502-033-000002", "20180503-033-000002", "20180507-033-000002",
"20180508-033-000002", "20180509-033-000002", "20180510-033-000002",
"20180511-033-000002", "20180514-033-000002", "20180516-033-000002",
"20180517-033-000004", "20180518-033-000002", "20180521-033-000003",
"20180523-033-000002"), class = "factor"), Var2 = c(2.866666667,
2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667,
3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008",
"20180502-033-000020", "20180503-033-000011", "20180507-033-000015",
"20180508-033-000015", "20180509-033-000007", "20180510-033-000007",
"20180511-033-000013", "20180514-033-000007", "20180516-033-000008",
"20180517-033-000008", "20180518-033-000007", "20180521-033-000008",
"20180523-033-000007"), class = "factor"), Var3 = c(2.946666667,
3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667,
2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333,
3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009",
"20180502-033-000021", "20180503-033-000012", "20180507-033-000016",
"20180508-033-000017", "20180509-033-000008", "20180510-033-000008",
"20180511-033-000014", "20180514-033-000008", "20180516-033-000009",
"20180517-033-000009", "20180518-033-000008", "20180521-033-000009",
"20180523-033-000008"), class = "factor"), Var4 = c(3.133333333,
3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667,
2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA,
-14L))
Here's what I want it to be:
Day Id Var Time
1 20180501-033-000001 3.156666667 Morning1
2 20180502-033-000001 2.986666667 Morning1
3 20180503-033-000001 3.073333333 Morning1
4 20180507-033-000001 3.236666667 Morning1
5 20180508-033-000001 3.03 Morning1
6 20180509-033-000001 3.01 Morning1
7 20180510-033-000001 2.916666667 Morning1
8 20180511-033-000001 3.293333333 Morning1
9 20180514-033-000001 3.136666667 Morning1
10 20180516-033-000001 3.116666667 Morning1
11 20180517-033-000003 2.843333333 Morning1
12 20180518-033-000001 3.033333333 Morning1
13 20180521-033-000002 3.173333333 Morning1
14 20180523-033-000001 3.336666667 Morning1
1 20180501-033-000002 2.866666667 Morning2
2 20180502-033-000002 2.93 Morning2
3 20180503-033-000002 3.07 Morning2
4 20180507-033-000002 2.99 Morning2
5 20180508-033-000002 3.15 Morning2
6 20180509-033-000002 3.02 Morning2
7 20180510-033-000002 3.103333333 Morning2
8 20180511-033-000002 3.163333333 Morning2
9 20180514-033-000002 3.186666667 Morning2
10 20180516-033-000002 3.283333333 Morning2
11 20180517-033-000004 3.12 Morning2
12 20180518-033-000002 3.29 Morning2
13 20180521-033-000003 2.993333333 Morning2
14 20180523-033-000002 3.026666667 Morning2
1 20180501-033-000008 2.946666667 Afternoon1
2 20180502-033-000020 3.076666667 Afternoon1
3 20180503-033-000011 3.106666667 Afternoon1
4 20180507-033-000015 3.043333333 Afternoon1
5 20180508-033-000015 3.156666667 Afternoon1
6 20180509-033-000007 3 Afternoon1
7 20180510-033-000007 3.336666667 Afternoon1
8 20180511-033-000013 2.98 Afternoon1
9 20180514-033-000007 2.766666667 Afternoon1
10 20180516-033-000008 3.133333333 Afternoon1
11 20180517-033-000008 3.06 Afternoon1
12 20180518-033-000007 3.006666667 Afternoon1
13 20180521-033-000008 2.983333333 Afternoon1
14 20180523-033-000007 3.3 Afternoon1
1 20180501-033-000009 3.133333333 Afternoon2
2 20180502-033-000021 3.013333333 Afternoon2
3 20180503-033-000012 2.9 Afternoon2
4 20180507-033-000016 3.116666667 Afternoon2
5 20180508-033-000017 3.343333333 Afternoon2
6 20180509-033-000008 3.156666667 Afternoon2
7 20180510-033-000008 3.066666667 Afternoon2
8 20180511-033-000014 2.94 Afternoon2
9 20180514-033-000008 3.1 Afternoon2
10 20180516-033-000009 3.04 Afternoon2
11 20180517-033-000009 3.033333333 Afternoon2
12 20180518-033-000008 2.973333333 Afternoon2
13 20180521-033-000009 3.02 Afternoon2
14 20180523-033-000008 3.21 Afternoon2
I want to do a wide-to-long conversion such that the Ids and the values of 'Var' get stacked day-wise. I also want an additional column named 'Time', whose value depends on which original id column the row came from, namely 'Morning_1_id', 'Morning_2_id', 'Afternoon_1_id' or 'Afternoon_2_id'. How can I do this? I tried using melt from reshape2 but couldn't get it done.
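The Time label asked for is just the name of the id column with the "_id" suffix and the underscore before the slot number removed. A minimal base-R sketch of that mapping, assuming the four column names shown above (time_labels is an illustrative name, not from the question):

```r
# Id column names from the question
id_cols <- c("Morning_1_id", "Morning_2_id", "Afternoon_1_id", "Afternoon_2_id")

# Drop the "_id" suffix, then the underscore before the slot number:
# "Morning_1_id" -> "Morning_1" -> "Morning1"
time_labels <- sub("_([0-9]+)$", "\\1", sub("_id$", "", id_cols))
print(time_labels)  # "Morning1" "Morning2" "Afternoon1" "Afternoon2"
```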
Here is a solution using base R's reshape() (plus tibble::rownames_to_column()) to transform your table into the requested format:
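# Wide -> long: v.names pairs each *_id column with its VarN column, one pair per time
mydata <- reshape(mydata, direction = 'long',
                  varying = c('Morning_1_id', 'Var1', 'Morning_2_id', 'Var2',
                              'Afternoon_1_id', 'Var3', 'Afternoon_2_id', 'Var4'),
                  # timevar has the same name as a v.names entry, so the time labels
                  # in this column get overwritten by the Var values
                  timevar = 'Var',
                  times = c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'),
                  v.names = c('Id', 'Var'),
                  idvar = 'Day')
# The time label survives in the row names ("1.Morning1", ...): move it into a column
mydata <- tibble::rownames_to_column(mydata)
mydata$rowname <- gsub("^.*\\.", "", mydata$rowname)  # keep the text after the last "."
names(mydata) <- c("Time", "Day", "Var", "Id")  # columns were rowname, Day, Var, Id
mydata <- mydata[, c(2, 4, 3, 1)]               # reorder to Day, Id, Var, Time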
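In the reshape() call above, timevar = 'Var' shares its name with a v.names entry, so the time column is overwritten by the values and has to be recovered from the row names. A sketch of a more direct variant (an adaptation, shown on the first two days of the data for brevity): varying is passed as a list with one element per output column, and timevar gets its own name, "Time", so nothing is overwritten and the row-name parsing can be dropped.

```r
# First two days of the question's data (ids as character for simplicity)
mydata <- data.frame(
  Day = 1:2,
  Morning_1_id   = c("20180501-033-000001", "20180502-033-000001"),
  Var1 = c(3.156667, 2.986667),
  Morning_2_id   = c("20180501-033-000002", "20180502-033-000002"),
  Var2 = c(2.866667, 2.930000),
  Afternoon_1_id = c("20180501-033-000008", "20180502-033-000020"),
  Var3 = c(2.946667, 3.076667),
  Afternoon_2_id = c("20180501-033-000009", "20180502-033-000021"),
  Var4 = c(3.133333, 3.013333),
  stringsAsFactors = FALSE
)

long <- reshape(
  mydata, direction = "long",
  # list form: element 1 -> Id, element 2 -> Var, both in time order
  varying = list(c("Morning_1_id", "Morning_2_id", "Afternoon_1_id", "Afternoon_2_id"),
                 c("Var1", "Var2", "Var3", "Var4")),
  v.names = c("Id", "Var"),
  timevar = "Time",  # not in v.names, so the labels are kept
  times = c("Morning1", "Morning2", "Afternoon1", "Afternoon2"),
  idvar = "Day"
)
long <- long[, c("Day", "Id", "Var", "Time")]
rownames(long) <- NULL
print(long)
```

Rows come out grouped by time (all Morning1 rows first), matching the order in the question's desired output.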
Here is a tidyverse option, corrected per comments from @Calum You:
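library(dplyr)
library(tidyr)

df %>%
  # stack the four *_id columns: Time = original column name, Var = the id string
  gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
  mutate(Time = gsub('.{3}$', '', Time),                # drop the trailing "_id"
         start = substr(Time, 1, 1),                    # "M" or "A"
         end = substr(Time, nchar(Time), nchar(Time)),  # slot number "1" or "2"
         id = paste0(start, end),                       # "M1", "M2", "A1", "A2"
         Val = case_when(id == 'M1' ~ Var1,             # pick the matching value column
                         id == 'M2' ~ Var2,
                         id == 'A1' ~ Var3,
                         id == 'A2' ~ Var4)) %>%
  dplyr::select(Day, Id = Var, Val, Time)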
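The case_when() step works because each id column name collapses to a unique two-character code that points at its Var column. A base-R sketch of that mapping, using a hypothetical named lookup vector in place of the case_when() branches:

```r
# Key values after gather() and the "_id" removal
Time <- c("Morning_1", "Morning_2", "Afternoon_1", "Afternoon_2")

# Same abbreviation as the answer: first letter + last character
id <- paste0(substr(Time, 1, 1), substr(Time, nchar(Time), nchar(Time)))

# Hypothetical lookup standing in for the case_when() branches
var_col <- c(M1 = "Var1", M2 = "Var2", A1 = "Var3", A2 = "Var4")[id]
print(unname(var_col))  # "Var1" "Var2" "Var3" "Var4"
```

This only tells the slots apart because each period name starts with a different letter and each slot number is a single digit.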
Original incorrect code:
df %>%
gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
gather( key, value, -Day, -Time, -Var) %>%
mutate(Time = gsub('.{3}$', '',Time)) %>%
dplyr::select(Day, Id=Var, Var=value, Time)
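The original version is wrong because the second gather() stacks Var1 to Var4 independently of the ids: each of the 56 rows from the first gather() is repeated once per Var column, giving 224 rows in which every Id appears with all four values of its Day. A base-R illustration of the same cross-pairing on one day and two slots, with merge() on Day standing in for the two independent gathers:

```r
# One day, two id columns and two value columns, stacked separately
ids  <- data.frame(Day = 1, Time = c("Morning_1", "Morning_2"),
                   Id = c("20180501-033-000001", "20180501-033-000002"))
vals <- data.frame(Day = 1, key = c("Var1", "Var2"), value = c(3.156667, 2.866667))

# Matching on Day alone pairs every id with every value
both <- merge(ids, vals, by = "Day")
print(nrow(both))  # 4 rows instead of the 2 wanted
```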
Consider base R: step through the columns two at a time, building a list of data frames that each hold Day plus one id/value column pair with its Time label, and then row-bind them all:
df_list <- lapply(seq(3, length(df), 2), function(i) {
sub <- df[c(1, (i-1):i)] # SUBSET BY COLS
sub <- transform(sub, Time = sub("_id", "", names(df)[i-1])) # ADD TIME VAR
setNames(sub, c("Day", "Id", "Var", "Time")) # RENAME COLS
})
long_df <- do.call(rbind, df_list)
head(long_df, 20)
# Day Id Var Time
# 1 1 20180501-033-000001 3.156667 Morning_1
# 2 2 20180502-033-000001 2.986667 Morning_1
# 3 3 20180503-033-000001 3.073333 Morning_1
# 4 4 20180507-033-000001 3.236667 Morning_1
# 5 5 20180508-033-000001 3.030000 Morning_1
# 6 6 20180509-033-000001 3.010000 Morning_1
# 7 7 20180510-033-000001 2.916667 Morning_1
# 8 8 20180511-033-000001 3.293333 Morning_1
# 9 9 20180514-033-000001 3.136667 Morning_1
# 10 10 20180516-033-000001 3.116667 Morning_1
# 11 11 20180517-033-000003 2.843333 Morning_1
# 12 12 20180518-033-000001 3.033333 Morning_1
# 13 13 20180521-033-000002 3.173333 Morning_1
# 14 14 20180523-033-000001 3.336667 Morning_1
# 15 1 20180501-033-000002 2.866667 Morning_2
# 16 2 20180502-033-000002 2.930000 Morning_2
# 17 3 20180503-033-000002 3.070000 Morning_2
# 18 4 20180507-033-000002 2.990000 Morning_2
# 19 5 20180508-033-000002 3.150000 Morning_2
# 20 6 20180509-033-000002 3.020000 Morning_2
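The base-R answer above relies on the id and value columns alternating by position. A sketch of a variant that pairs them by name instead, using Map() over the id columns; it assumes, as in the question, that the Nth *_id column belongs with VarN, and it uses the first two days of the data for brevity:

```r
# First two days of the question's data
df <- data.frame(
  Day = 1:2,
  Morning_1_id   = c("20180501-033-000001", "20180502-033-000001"),
  Var1 = c(3.156667, 2.986667),
  Morning_2_id   = c("20180501-033-000002", "20180502-033-000002"),
  Var2 = c(2.866667, 2.930000),
  Afternoon_1_id = c("20180501-033-000008", "20180502-033-000020"),
  Var3 = c(2.946667, 3.076667),
  Afternoon_2_id = c("20180501-033-000009", "20180502-033-000021"),
  Var4 = c(3.133333, 3.013333),
  stringsAsFactors = FALSE
)

# Assumption: the Nth "_id" column goes with "VarN"
id_cols  <- grep("_id$", names(df), value = TRUE)
var_cols <- paste0("Var", seq_along(id_cols))

long_df <- do.call(rbind, Map(function(id, v) {
  data.frame(Day  = df$Day,
             Id   = as.character(df[[id]]),  # also handles factor ids from dput()
             Var  = df[[v]],
             Time = sub("_id$", "", id),
             stringsAsFactors = FALSE)
}, id_cols, var_cols))
rownames(long_df) <- NULL
print(long_df)
```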
Here's another tidyverse method. This is complicated by the fact that each Var column corresponds to a particular time, but the time is indicated differently from the way it is represented in the id columns (Var1 versus Morning_1_id). So you need some way of matching the two; here I do that with a named list inside var_renamer. Once the columns are consistently named, you can use gather and separate to generate the right variables, which are then spread back out into the desired format. Note that I mutate Time into an ordered factor so that arrange sorts it by time of day rather than alphabetically.
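The var_renamer() helper in the following code rewrites each VarN name into the same "Time-kind" shape the id columns get after "_id" becomes "-Id". A base-R sketch of that renaming, using a hypothetical var_renamer_base() with a named character vector instead of a list and purrr::map_chr(), followed by the split that separate(sep = "-") performs on the gathered column names:

```r
# Hypothetical base-R equivalent of var_renamer()
time_lookup <- c("1" = "Morning_1", "2" = "Morning_2",
                 "3" = "Afternoon_1", "4" = "Afternoon_2")

var_renamer_base <- function(name) {
  # "Var3" -> "3" -> "Afternoon_1" -> "Afternoon_1-Var"
  unname(paste0(time_lookup[sub("^Var", "", name)], "-Var"))
}
print(var_renamer_base(c("Var1", "Var3")))  # "Morning_1-Var" "Afternoon_1-Var"

# separate() splits the gathered *column names*, not the id values,
# so the "-" inside ids like "20180501-033-000001" does not interfere
parts <- do.call(rbind, strsplit(c("Morning_1-Id", "Morning_1-Var"), "-", fixed = TRUE))
print(parts)
```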
df <- structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001", "20180502-033-000001", "20180503-033-000001", "20180507-033-000001", "20180508-033-000001", "20180509-033-000001", "20180510-033-000001", "20180511-033-000001", "20180514-033-000001", "20180516-033-000001", "20180517-033-000003", "20180518-033-000001", "20180521-033-000002", "20180523-033-000001"), class = "factor"), Var1 = c(3.156666667, 2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667, 3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333, 3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002", "20180502-033-000002", "20180503-033-000002", "20180507-033-000002", "20180508-033-000002", "20180509-033-000002", "20180510-033-000002", "20180511-033-000002", "20180514-033-000002", "20180516-033-000002", "20180517-033-000004", "20180518-033-000002", "20180521-033-000003", "20180523-033-000002"), class = "factor"), Var2 = c(2.866666667, 2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667, 3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008", "20180502-033-000020", "20180503-033-000011", "20180507-033-000015", "20180508-033-000015", "20180509-033-000007", "20180510-033-000007", "20180511-033-000013", "20180514-033-000007", "20180516-033-000008", "20180517-033-000008", "20180518-033-000007", "20180521-033-000008", "20180523-033-000007"), class = "factor"), Var3 = c(2.946666667, 3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667, 2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333, 3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009", "20180502-033-000021", "20180503-033-000012", "20180507-033-000016", "20180508-033-000017", "20180509-033-000008", "20180510-033-000008", "20180511-033-000014", "20180514-033-000008", "20180516-033-000009", "20180517-033-000009", "20180518-033-000008", 
"20180521-033-000009", "20180523-033-000008"), class = "factor"), Var4 = c(3.133333333, 3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667, 2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA, -14L))
library(tidyverse)
var_renamer <- function(name) {
time_list <- list(
"1" = "Morning_1", "2" = "Morning_2", "3" = "Afternoon_1", "4" = "Afternoon_2"
)
timenum = str_remove(name, "Var")
timestr = map_chr(timenum, ~ time_list[[.x]])
str_c(timestr, "-Var")
}
df %>%
rename_at(vars(starts_with("Var")), var_renamer) %>%
rename_all(~ str_replace(., "_id", "-Id")) %>%  # lambda instead of funs(), which newer dplyr deprecates
gather(colname, val, -Day) %>%
separate(colname, c("Time", "id_var"), sep = "-") %>%
mutate(Time = factor(
x = Time,
levels = c("Morning_1", "Morning_2", "Afternoon_1", "Afternoon_2"),
ordered = TRUE
)) %>%
spread(id_var, val) %>%
arrange(Time, Day)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> Day Time Id Var
#> 1 1 Morning_1 20180501-033-000001 3.156666667
#> 2 2 Morning_1 20180502-033-000001 2.986666667
#> 3 3 Morning_1 20180503-033-000001 3.073333333
#> 4 4 Morning_1 20180507-033-000001 3.236666667
#> 5 5 Morning_1 20180508-033-000001 3.03
#> 6 6 Morning_1 20180509-033-000001 3.01
#> 7 7 Morning_1 20180510-033-000001 2.916666667
#> 8 8 Morning_1 20180511-033-000001 3.293333333
#> 9 9 Morning_1 20180514-033-000001 3.136666667
#> 10 10 Morning_1 20180516-033-000001 3.116666667
#> 11 11 Morning_1 20180517-033-000003 2.843333333
#> 12 12 Morning_1 20180518-033-000001 3.033333333
#> 13 13 Morning_1 20180521-033-000002 3.173333333
#> 14 14 Morning_1 20180523-033-000001 3.336666667
#> 15 1 Morning_2 20180501-033-000002 2.866666667
#> 16 2 Morning_2 20180502-033-000002 2.93
#> 17 3 Morning_2 20180503-033-000002 3.07
#> 18 4 Morning_2 20180507-033-000002 2.99
#> 19 5 Morning_2 20180508-033-000002 3.15
#> 20 6 Morning_2 20180509-033-000002 3.02
#> 21 7 Morning_2 20180510-033-000002 3.103333333
#> 22 8 Morning_2 20180511-033-000002 3.163333333
#> 23 9 Morning_2 20180514-033-000002 3.186666667
#> 24 10 Morning_2 20180516-033-000002 3.283333333
#> 25 11 Morning_2 20180517-033-000004 3.12
#> 26 12 Morning_2 20180518-033-000002 3.29
#> 27 13 Morning_2 20180521-033-000003 2.993333333
#> 28 14 Morning_2 20180523-033-000002 3.026666667
#> 29 1 Afternoon_1 20180501-033-000008 2.946666667
#> 30 2 Afternoon_1 20180502-033-000020 3.076666667
#> 31 3 Afternoon_1 20180503-033-000011 3.106666667
#> 32 4 Afternoon_1 20180507-033-000015 3.043333333
#> 33 5 Afternoon_1 20180508-033-000015 3.156666667
#> 34 6 Afternoon_1 20180509-033-000007 3
#> 35 7 Afternoon_1 20180510-033-000007 3.336666667
#> 36 8 Afternoon_1 20180511-033-000013 2.98
#> 37 9 Afternoon_1 20180514-033-000007 2.766666667
#> 38 10 Afternoon_1 20180516-033-000008 3.133333333
#> 39 11 Afternoon_1 20180517-033-000008 3.06
#> 40 12 Afternoon_1 20180518-033-000007 3.006666667
#> 41 13 Afternoon_1 20180521-033-000008 2.983333333
#> 42 14 Afternoon_1 20180523-033-000007 3.3
#> 43 1 Afternoon_2 20180501-033-000009 3.133333333
#> 44 2 Afternoon_2 20180502-033-000021 3.013333333
#> 45 3 Afternoon_2 20180503-033-000012 2.9
#> 46 4 Afternoon_2 20180507-033-000016 3.116666667
#> 47 5 Afternoon_2 20180508-033-000017 3.343333333
#> 48 6 Afternoon_2 20180509-033-000008 3.156666667
#> 49 7 Afternoon_2 20180510-033-000008 3.066666667
#> 50 8 Afternoon_2 20180511-033-000014 2.94
#> 51 9 Afternoon_2 20180514-033-000008 3.1
#> 52 10 Afternoon_2 20180516-033-000009 3.04
#> 53 11 Afternoon_2 20180517-033-000009 3.033333333
#> 54 12 Afternoon_2 20180518-033-000008 2.973333333
#> 55 13 Afternoon_2 20180521-033-000009 3.02
#> 56 14 Afternoon_2 20180523-033-000008 3.21
Created on 2018-08-07 by the reprex package (v0.2.0).
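The ordered factor in the last answer matters because arrange() on a plain character Time column would sort alphabetically, putting Afternoon before Morning. A small base-R illustration of the difference, using sort() in place of arrange():

```r
times <- c("Afternoon_2", "Morning_1", "Afternoon_1", "Morning_2")

# Plain character: alphabetical order
print(sort(times))  # "Afternoon_1" "Afternoon_2" "Morning_1" "Morning_2"

# Ordered factor with explicit levels: sorted by time of day
time_f <- factor(times,
                 levels = c("Morning_1", "Morning_2", "Afternoon_1", "Afternoon_2"),
                 ordered = TRUE)
print(as.character(sort(time_f)))  # "Morning_1" "Morning_2" "Afternoon_1" "Afternoon_2"
```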