简体   繁体   English

R融化重塑数据

[英]R Melt reshape data

Here' my data: 这是我的数据:

Day        Morning_1_id     Var1        Morning_2_id     Var2      Afternoon_1_id     Var3      Afternoon_2_id     Var4
1     20180501-033-000001 3.156667 20180501-033-000002 2.866667 20180501-033-000008 2.946667 20180501-033-000009 3.133333
2     20180502-033-000001 2.986667 20180502-033-000002 2.930000 20180502-033-000020 3.076667 20180502-033-000021 3.013333
3     20180503-033-000001 3.073333 20180503-033-000002 3.070000 20180503-033-000011 3.106667 20180503-033-000012 2.900000
4     20180507-033-000001 3.236667 20180507-033-000002 2.990000 20180507-033-000015 3.043333 20180507-033-000016 3.116667
5     20180508-033-000001 3.030000 20180508-033-000002 3.150000 20180508-033-000015 3.156667 20180508-033-000017 3.343333
6     20180509-033-000001 3.010000 20180509-033-000002 3.020000 20180509-033-000007 3.000000 20180509-033-000008 3.156667
7     20180510-033-000001 2.916667 20180510-033-000002 3.103333 20180510-033-000007 3.336667 20180510-033-000008 3.066667
8     20180511-033-000001 3.293333 20180511-033-000002 3.163333 20180511-033-000013 2.980000 20180511-033-000014 2.940000
9     20180514-033-000001 3.136667 20180514-033-000002 3.186667 20180514-033-000007 2.766667 20180514-033-000008 3.100000
10    20180516-033-000001 3.116667 20180516-033-000002 3.283333 20180516-033-000008 3.133333 20180516-033-000009 3.040000
11    20180517-033-000003 2.843333 20180517-033-000004 3.120000 20180517-033-000008 3.060000 20180517-033-000009 3.033333
12    20180518-033-000001 3.033333 20180518-033-000002 3.290000 20180518-033-000007 3.006667 20180518-033-000008 2.973333
13    20180521-033-000002 3.173333 20180521-033-000003 2.993333 20180521-033-000008 2.983333 20180521-033-000009 3.020000
14    20180523-033-000001 3.336667 20180523-033-000002 3.026667 20180523-033-000007 3.300000 20180523-033-000008 3.210000

Reproducible form: 可复制形式:

structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001", 
"20180502-033-000001", "20180503-033-000001", "20180507-033-000001", 
"20180508-033-000001", "20180509-033-000001", "20180510-033-000001", 
"20180511-033-000001", "20180514-033-000001", "20180516-033-000001", 
"20180517-033-000003", "20180518-033-000001", "20180521-033-000002", 
"20180523-033-000001"), class = "factor"), Var1 = c(3.156666667, 
2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667, 
3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333, 
3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002", 
"20180502-033-000002", "20180503-033-000002", "20180507-033-000002", 
"20180508-033-000002", "20180509-033-000002", "20180510-033-000002", 
"20180511-033-000002", "20180514-033-000002", "20180516-033-000002", 
"20180517-033-000004", "20180518-033-000002", "20180521-033-000003", 
"20180523-033-000002"), class = "factor"), Var2 = c(2.866666667, 
2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667, 
3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008", 
"20180502-033-000020", "20180503-033-000011", "20180507-033-000015", 
"20180508-033-000015", "20180509-033-000007", "20180510-033-000007", 
"20180511-033-000013", "20180514-033-000007", "20180516-033-000008", 
"20180517-033-000008", "20180518-033-000007", "20180521-033-000008", 
"20180523-033-000007"), class = "factor"), Var3 = c(2.946666667, 
3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667, 
2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333, 
3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009", 
"20180502-033-000021", "20180503-033-000012", "20180507-033-000016", 
"20180508-033-000017", "20180509-033-000008", "20180510-033-000008", 
"20180511-033-000014", "20180514-033-000008", "20180516-033-000009", 
"20180517-033-000009", "20180518-033-000008", "20180521-033-000009", 
"20180523-033-000008"), class = "factor"), Var4 = c(3.133333333, 
3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667, 
2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA, 
-14L))

Here's what I want it to be: 这是我想要的:

Day Id                  Var         Time
1   20180501-033-000001 3.156666667 Morning1
2   20180502-033-000001 2.986666667 Morning1
3   20180503-033-000001 3.073333333 Morning1
4   20180507-033-000001 3.236666667 Morning1
5   20180508-033-000001 3.03        Morning1
6   20180509-033-000001 3.01        Morning1
7   20180510-033-000001 2.916666667 Morning1
8   20180511-033-000001 3.293333333 Morning1
9   20180514-033-000001 3.136666667 Morning1
10  20180516-033-000001 3.116666667 Morning1
11  20180517-033-000003 2.843333333 Morning1
12  20180518-033-000001 3.033333333 Morning1
13  20180521-033-000002 3.173333333 Morning1
14  20180523-033-000001 3.336666667 Morning1
1   20180501-033-000002 2.866666667 Morning2
2   20180502-033-000002 2.93        Morning2
3   20180503-033-000002 3.07        Morning2
4   20180507-033-000002 2.99        Morning2
5   20180508-033-000002 3.15        Morning2
6   20180509-033-000002 3.02        Morning2
7   20180510-033-000002 3.103333333 Morning2
8   20180511-033-000002 3.163333333 Morning2
9   20180514-033-000002 3.186666667 Morning2
10  20180516-033-000002 3.283333333 Morning2
11  20180517-033-000004 3.12        Morning2
12  20180518-033-000002 3.29        Morning2
13  20180521-033-000003 2.993333333 Morning2
14  20180523-033-000002 3.026666667 Morning2
1   20180501-033-000008 2.946666667 Afternoon1
2   20180502-033-000020 3.076666667 Afternoon1
3   20180503-033-000011 3.106666667 Afternoon1
4   20180507-033-000015 3.043333333 Afternoon1
5   20180508-033-000015 3.156666667 Afternoon1
6   20180509-033-000007 3           Afternoon1
7   20180510-033-000007 3.336666667 Afternoon1
8   20180511-033-000013 2.98        Afternoon1
9   20180514-033-000007 2.766666667 Afternoon1
10  20180516-033-000008 3.133333333 Afternoon1
11  20180517-033-000008 3.06        Afternoon1
12  20180518-033-000007 3.006666667 Afternoon1
13  20180521-033-000008 2.983333333 Afternoon1
14  20180523-033-000007 3.3         Afternoon1
1   20180501-033-000009 3.133333333 Afternoon2
2   20180502-033-000021 3.013333333 Afternoon2
3   20180503-033-000012 2.9         Afternoon2
4   20180507-033-000016 3.116666667 Afternoon2
5   20180508-033-000017 3.343333333 Afternoon2
6   20180509-033-000008 3.156666667 Afternoon2
7   20180510-033-000008 3.066666667 Afternoon2
8   20180511-033-000014 2.94        Afternoon2
9   20180514-033-000008 3.1         Afternoon2
10  20180516-033-000009 3.04        Afternoon2
11  20180517-033-000009 3.033333333 Afternoon2
12  20180518-033-000008 2.973333333 Afternoon2
13  20180521-033-000009 3.02        Afternoon2
14  20180523-033-000008 3.21        Afternoon2

I want to do wide-to-long conversion such that the Ids and the values of 'Var' get stacked day wise. 我想进行从宽到长的转换,以使Ids和'Var'的值逐日堆积。 I also want an additional column named 'Time', which will depend upon the initial ids, namely 'Morning_1_id', 'Morning_2_id', 'Afternoon_1_id' and 'Afternoon_2_id'. 我还想要一个名为“时间”的附加列,该列将取决于初始ID,即“ Morning_1_id”,“ Morning_2_id”,“ Afternoon_1_id”和“ Afternoon_2_id”。 How to do this? 这个怎么做? I tried using melt from reshape2 but couldn't get it done. 我尝试使用来自reshape2的融合,但无法完成。

Here is a solution using dplyr to transform your table into the requested format: 这是使用dplyr将表转换为所需格式的解决方案:

library(dplyr)

mydata<- reshape(mydata, direction='long', 
                varying=c('Morning_1_id', 'Var1', 'Morning_2_id', 'Var2', 'Afternoon_1_id', 'Var3', 'Afternoon_2_id', 'Var4'), 
                timevar='Var',
                times=c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'),
                v.names=c('Id', 'Var'),
                idvar='Day')

mydata<- tibble::rownames_to_column(mydata)
mydata$rowname<- gsub("^.*\\.","", mydata$rowname)
names(mydata)<- c("Time", "Day", "Var", "Id")
mydata<- mydata[,c(2,4,3,1)]

Here is a tidyverse option 这是一个tidyverse选项

corrected per comments from @Calum You 已根据@Calum You的评论更正

df %>%
  gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
  mutate(Time = gsub('.{3}$', '',Time),
         start = substr(Time, 1, 1),
         end = substr(Time, nchar(Time), nchar(Time)),
         id = paste0(start,end),
         Val = case_when(id=='M1' ~ Var1,
                         id=='M2' ~ Var2,
                         id=='A1' ~ Var3,
                         id=='A2' ~ Var4)) %>% 
  dplyr::select(Day, Id=Var, Val, Time)

Original incorrect code 原始不正确的代码

df %>%
   gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
   gather( key, value, -Day, -Time, -Var) %>% 
   mutate(Time = gsub('.{3}$', '',Time)) %>% 
   dplyr::select(Day, Id=Var, Var=value, Time)

Consider base R by building a list of every 2nd column of a sequence and then row binding all df elements: 通过建立序列的第二列的列表,然后行绑定所有df元素,来考虑基数R:

df_list <- lapply(seq(3, length(df), 2), function(i) {
  sub <- df[c(1, (i-1):i)]                                      # SUBSET BY COLS
  sub <- transform(sub, Time = sub("_id", "", names(df)[i-1]))  # ADD TIME VAR
  setNames(sub, c("Day", "Id", "Var", "Time"))                  # RENAME COLS  
})

long_df <- do.call(rbind, df_list)

head(long_df, 20)    
#    Day                  Id      Var      Time
# 1    1 20180501-033-000001 3.156667 Morning_1
# 2    2 20180502-033-000001 2.986667 Morning_1
# 3    3 20180503-033-000001 3.073333 Morning_1
# 4    4 20180507-033-000001 3.236667 Morning_1
# 5    5 20180508-033-000001 3.030000 Morning_1
# 6    6 20180509-033-000001 3.010000 Morning_1
# 7    7 20180510-033-000001 2.916667 Morning_1
# 8    8 20180511-033-000001 3.293333 Morning_1
# 9    9 20180514-033-000001 3.136667 Morning_1
# 10  10 20180516-033-000001 3.116667 Morning_1
# 11  11 20180517-033-000003 2.843333 Morning_1
# 12  12 20180518-033-000001 3.033333 Morning_1
# 13  13 20180521-033-000002 3.173333 Morning_1
# 14  14 20180523-033-000001 3.336667 Morning_1
# 15   1 20180501-033-000002 2.866667 Morning_2
# 16   2 20180502-033-000002 2.930000 Morning_2
# 17   3 20180503-033-000002 3.070000 Morning_2
# 18   4 20180507-033-000002 2.990000 Morning_2
# 19   5 20180508-033-000002 3.150000 Morning_2
# 20   6 20180509-033-000002 3.020000 Morning_2

Here's another tidyverse method. 这是另一种tidyverse方法。 This is complicated by the fact that the different Var columns correspond to a particular time, but the indication of the time is different from the way it is represented in the id columns. 不同的Var列对应于特定时间这一事实使情况变得复杂,但是时间的指示与id列中表示时间的方式不同。 So you need to have some way of matching the two. 因此,您需要某种匹配两者的方式。 Here I do that with a named list inside var_renamer . 在这里,我使用var_renamer内的命名列表来var_renamer Once the columns are consistently named, it becomes possible to use gather and separate to generate the right variables to be spread back out into the desired format. 一旦一致地命名了列,就可以使用gatherseparate来生成正确的变量,以将它们重新spread为所需的格式。 Note that I mutate Time into an ordered factor so it can be sorted by time rather than alphabetically with arrange . 请注意,我mutate Time mutate为有序因子,因此可以按时间对它进行排序,而不必按字母顺序进行arrange

df <- structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001", "20180502-033-000001", "20180503-033-000001", "20180507-033-000001", "20180508-033-000001", "20180509-033-000001", "20180510-033-000001", "20180511-033-000001", "20180514-033-000001", "20180516-033-000001", "20180517-033-000003", "20180518-033-000001", "20180521-033-000002", "20180523-033-000001"), class = "factor"), Var1 = c(3.156666667, 2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667, 3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333, 3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002", "20180502-033-000002", "20180503-033-000002", "20180507-033-000002", "20180508-033-000002", "20180509-033-000002", "20180510-033-000002", "20180511-033-000002", "20180514-033-000002", "20180516-033-000002", "20180517-033-000004", "20180518-033-000002", "20180521-033-000003", "20180523-033-000002"), class = "factor"), Var2 = c(2.866666667, 2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667, 3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008", "20180502-033-000020", "20180503-033-000011", "20180507-033-000015", "20180508-033-000015", "20180509-033-000007", "20180510-033-000007", "20180511-033-000013", "20180514-033-000007", "20180516-033-000008", "20180517-033-000008", "20180518-033-000007", "20180521-033-000008", "20180523-033-000007"), class = "factor"), Var3 = c(2.946666667, 3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667, 2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333, 3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009", "20180502-033-000021", "20180503-033-000012", "20180507-033-000016", "20180508-033-000017", "20180509-033-000008", "20180510-033-000008", "20180511-033-000014", "20180514-033-000008", "20180516-033-000009", "20180517-033-000009", "20180518-033-000008", "20180521-033-000009", "20180523-033-000008"), class = "factor"), Var4 = c(3.133333333, 3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667, 2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA, -14L))

library(tidyverse)
var_renamer <- function(name) {
  time_list <- list(
    "1" = "Morning_1", "2" = "Morning_2", "3" = "Afternoon_1", "4" = "Afternoon_2"
  )
  timenum = str_remove(name, "Var")
  timestr = map_chr(timenum, ~ time_list[[.x]])
  str_c(timestr, "-Var")
}

df %>%
  rename_at(vars(starts_with("Var")), var_renamer) %>%
  rename_all(funs(str_replace(., "_id", "-Id"))) %>%
  gather(colname, val, -Day) %>%
  separate(colname, c("Time", "id_var"), sep = "-") %>%
  mutate(Time = factor(
    x = Time,
    levels = c("Morning_1", "Morning_2", "Afternoon_1", "Afternoon_2"),
    ordered = TRUE
  )) %>%
  spread(id_var, val) %>%
  arrange(Time, Day)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#>    Day        Time                  Id         Var
#> 1    1   Morning_1 20180501-033-000001 3.156666667
#> 2    2   Morning_1 20180502-033-000001 2.986666667
#> 3    3   Morning_1 20180503-033-000001 3.073333333
#> 4    4   Morning_1 20180507-033-000001 3.236666667
#> 5    5   Morning_1 20180508-033-000001        3.03
#> 6    6   Morning_1 20180509-033-000001        3.01
#> 7    7   Morning_1 20180510-033-000001 2.916666667
#> 8    8   Morning_1 20180511-033-000001 3.293333333
#> 9    9   Morning_1 20180514-033-000001 3.136666667
#> 10  10   Morning_1 20180516-033-000001 3.116666667
#> 11  11   Morning_1 20180517-033-000003 2.843333333
#> 12  12   Morning_1 20180518-033-000001 3.033333333
#> 13  13   Morning_1 20180521-033-000002 3.173333333
#> 14  14   Morning_1 20180523-033-000001 3.336666667
#> 15   1   Morning_2 20180501-033-000002 2.866666667
#> 16   2   Morning_2 20180502-033-000002        2.93
#> 17   3   Morning_2 20180503-033-000002        3.07
#> 18   4   Morning_2 20180507-033-000002        2.99
#> 19   5   Morning_2 20180508-033-000002        3.15
#> 20   6   Morning_2 20180509-033-000002        3.02
#> 21   7   Morning_2 20180510-033-000002 3.103333333
#> 22   8   Morning_2 20180511-033-000002 3.163333333
#> 23   9   Morning_2 20180514-033-000002 3.186666667
#> 24  10   Morning_2 20180516-033-000002 3.283333333
#> 25  11   Morning_2 20180517-033-000004        3.12
#> 26  12   Morning_2 20180518-033-000002        3.29
#> 27  13   Morning_2 20180521-033-000003 2.993333333
#> 28  14   Morning_2 20180523-033-000002 3.026666667
#> 29   1 Afternoon_1 20180501-033-000008 2.946666667
#> 30   2 Afternoon_1 20180502-033-000020 3.076666667
#> 31   3 Afternoon_1 20180503-033-000011 3.106666667
#> 32   4 Afternoon_1 20180507-033-000015 3.043333333
#> 33   5 Afternoon_1 20180508-033-000015 3.156666667
#> 34   6 Afternoon_1 20180509-033-000007           3
#> 35   7 Afternoon_1 20180510-033-000007 3.336666667
#> 36   8 Afternoon_1 20180511-033-000013        2.98
#> 37   9 Afternoon_1 20180514-033-000007 2.766666667
#> 38  10 Afternoon_1 20180516-033-000008 3.133333333
#> 39  11 Afternoon_1 20180517-033-000008        3.06
#> 40  12 Afternoon_1 20180518-033-000007 3.006666667
#> 41  13 Afternoon_1 20180521-033-000008 2.983333333
#> 42  14 Afternoon_1 20180523-033-000007         3.3
#> 43   1 Afternoon_2 20180501-033-000009 3.133333333
#> 44   2 Afternoon_2 20180502-033-000021 3.013333333
#> 45   3 Afternoon_2 20180503-033-000012         2.9
#> 46   4 Afternoon_2 20180507-033-000016 3.116666667
#> 47   5 Afternoon_2 20180508-033-000017 3.343333333
#> 48   6 Afternoon_2 20180509-033-000008 3.156666667
#> 49   7 Afternoon_2 20180510-033-000008 3.066666667
#> 50   8 Afternoon_2 20180511-033-000014        2.94
#> 51   9 Afternoon_2 20180514-033-000008         3.1
#> 52  10 Afternoon_2 20180516-033-000009        3.04
#> 53  11 Afternoon_2 20180517-033-000009 3.033333333
#> 54  12 Afternoon_2 20180518-033-000008 2.973333333
#> 55  13 Afternoon_2 20180521-033-000009        3.02
#> 56  14 Afternoon_2 20180523-033-000008        3.21

Created on 2018-08-07 by the reprex package (v0.2.0). reprex软件包 (v0.2.0)于2018-08-07创建。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM