
Data format conversion to be combined with string split in R

I have the following data frame, oridf:

test_name   gp1_0month  gp2_0month  gp1_1month  gp2_1month  gp1_3month  gp2_3month
Test_1  136 137 152 143 156 150
Test_2  130 129 81  78  86  80
Test_3  129 128 68  68  74  71
Test_4  40  40  45  43  47  46
Test_5  203 201 141 134 149 142
Test_6  170 166 134 116 139 125

oridf <- structure(list(test_name = structure(1:6, .Label = c("Test_1", 
"Test_2", "Test_3", "Test_4", "Test_5", "Test_6"), class = "factor"), 
    gp1_0month = c(136L, 130L, 129L, 40L, 203L, 170L), gp2_0month = c(137L, 
    129L, 128L, 40L, 201L, 166L), gp1_1month = c(152L, 81L, 68L, 
    45L, 141L, 134L), gp2_1month = c(143L, 78L, 68L, 43L, 134L, 
    116L), gp1_3month = c(156L, 86L, 74L, 47L, 149L, 139L), gp2_3month = c(150L, 
    80L, 71L, 46L, 142L, 125L)), .Names = c("test_name", "gp1_0month", 
"gp2_0month", "gp1_1month", "gp2_1month", "gp1_3month", "gp2_3month"
), class = "data.frame", row.names = c(NA, -6L))

I need to convert it to the following format:

test_name   month   group   value
Test_1      0       gp1     136
Test_1      0       gp2     137
Test_1      1       gp1     152
Test_1      1       gp2     143
.....

The conversion therefore involves splitting the names of columns 2:7 of the original data frame oridf (for example, gp1_0month into gp1 and 0month), so that I can plot the result with the following command:

qplot(data=newdf, x=month, y=value, geom=c("point","line"), color=test_name, linetype=group)

How can I convert these data? I tried the melt command, but I cannot combine it with the strsplit command.

First, I would use melt, as you did.

library(reshape2)
mm <- melt(oridf)

The reshape2 package also provides a colsplit function. Here we apply it to the variable column, splitting at the underscore and at the "m" in "month". The leftover third piece ("onth") goes into a throwaway column xx, which [,-3] drops.

info <- colsplit(mm$variable, "(_|m)", c("group","month", "xx"))[,-3]
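To see why the pattern "(_|m)" yields three pieces, it may help to try the same split on a single column name in base R. This is only an illustrative sketch using strsplit (the function mentioned in the question). It does not call colsplit itself, and the variable names pieces, group and month are made up for the example.

```r
# Split one column name at every underscore or "m", as the pattern "(_|m)" does
pieces <- strsplit("gp1_0month", "(_|m)")[[1]]
print(pieces)
# [1] "gp1"  "0"    "onth"

# The first two pieces are the group and the month; the third is leftover text
group <- pieces[1]
month <- as.integer(pieces[2])
```

The third piece is why the colsplit call names a third column and then discards it.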

Then we can recombine the data, keeping the test_name and value columns from the melted data frame:

newdf <- cbind(mm[, 1, drop = FALSE], info, mm[, 3, drop = FALSE])

# head(newdf)
#   test_name group month value
# 1    Test_1   gp1     0   136
# 2    Test_2   gp1     0   130
# 3    Test_3   gp1     0   129
# 4    Test_4   gp1     0    40
# 5    Test_5   gp1     0   203
# 6    Test_6   gp1     0   170

And we can plot it using the qplot command you supplied above.
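On recent ggplot2 releases, qplot is deprecated, so an equivalent ggplot call may be preferable. The following is a rough sketch rather than part of the original answer. It assumes ggplot2 is installed and uses a small hand-built stand-in for newdf (values copied from the question's first two tests) so that it runs on its own.

```r
library(ggplot2)

# Tiny stand-in for newdf, using values from the question's data
newdf <- data.frame(
  test_name = rep(c("Test_1", "Test_2"), times = 4),
  group     = rep(c("gp1", "gp2"), each = 2, times = 2),
  month     = rep(c(0, 1), each = 4),
  value     = c(136, 130, 137, 129, 152, 81, 143, 78)
)

# Same mapping as the qplot call: one line per test_name/group combination
p <- ggplot(newdf, aes(x = month, y = value,
                       color = test_name, linetype = group)) +
  geom_point() +
  geom_line()

print(p)
```

Because test_name and group are both discrete, ggplot2 groups the lines by their combination, which should match what the qplot call draws.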


Use gather from the tidyr package to convert from wide to long, then use separate from the same package to split the group_month column into group and month columns. Finally, use mutate from dplyr and extract_numeric from tidyr to extract the numeric part of month.

library(dplyr)
# devtools::install_github("hadley/tidyr")
library(tidyr)

newdf <- oridf %>%
   gather(group_month, value, -test_name) %>% 
   separate(group_month, into = c("group", "month")) %>% 
   mutate(month = extract_numeric(month))
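In newer tidyr releases, gather is superseded by pivot_longer and extract_numeric is deprecated, so the same reshaping can probably be done in a single call. The sketch below is an alternative, not the original answer's code. It assumes tidyr 1.1.0 or later (for names_transform) and uses a cut-down two-row copy of oridf to stay self-contained.

```r
library(tidyr)

# Two rows and four value columns of the question's data, enough to show the reshaping
oridf <- data.frame(
  test_name  = c("Test_1", "Test_2"),
  gp1_0month = c(136L, 130L),
  gp2_0month = c(137L, 129L),
  gp1_1month = c(152L, 81L),
  gp2_1month = c(143L, 78L)
)

# Split each column name into group and month while pivoting to long format
newdf <- pivot_longer(
  oridf,
  cols            = -test_name,
  names_to        = c("group", "month"),
  names_pattern   = "(.*)_([0-9]+)month",   # e.g. "gp1_0month" -> "gp1", "0"
  names_transform = list(month = as.integer),
  values_to       = "value"
)

print(newdf)
```

pivot_longer emits the rows test by test, so the result comes out in the row order shown in the question (Test_1 at month 0 for gp1 and gp2, then month 1, and so on).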
