
Data format conversion to be combined with string split in R

I have the following data frame, oridf:

test_name   gp1_0month  gp2_0month  gp1_1month  gp2_1month  gp1_3month  gp2_3month
Test_1  136 137 152 143 156 150
Test_2  130 129 81  78  86  80
Test_3  129 128 68  68  74  71
Test_4  40  40  45  43  47  46
Test_5  203 201 141 134 149 142
Test_6  170 166 134 116 139 125

oridf <- structure(list(test_name = structure(1:6, .Label = c("Test_1", 
"Test_2", "Test_3", "Test_4", "Test_5", "Test_6"), class = "factor"), 
    gp1_0month = c(136L, 130L, 129L, 40L, 203L, 170L), gp2_0month = c(137L, 
    129L, 128L, 40L, 201L, 166L), gp1_1month = c(152L, 81L, 68L, 
    45L, 141L, 134L), gp2_1month = c(143L, 78L, 68L, 43L, 134L, 
    116L), gp1_3month = c(156L, 86L, 74L, 47L, 149L, 139L), gp2_3month = c(150L, 
    80L, 71L, 46L, 142L, 125L)), .Names = c("test_name", "gp1_0month", 
"gp2_0month", "gp1_1month", "gp2_1month", "gp1_3month", "gp2_3month"
), class = "data.frame", row.names = c(NA, -6L))

I need to convert it to the following format:

test_name   month   group   value
Test_1      0       gp1     136
Test_1      0       gp2     137
Test_1      1       gp1     152
Test_1      1       gp2     143
.....

Hence, the conversion would involve splitting the names of columns 2:7 of the original data frame oridf (for example, gp1_0month into gp1 and 0month), so that I can plot the result with the following command:

qplot(data=newdf, x=month, y=value, geom=c("point","line"), color=test_name, linetype=group)

How can I convert these data? I tried the melt command, but I cannot combine it with the strsplit command.

First, I would use melt, as you did.

library(reshape2)
mm <- melt(oridf)

Then there is also a colsplit function in the reshape2 library. Here we use it on the variable column to split at the underscore and at the "m" in "month" (discarding the rest):

info <- colsplit(mm$variable, "(_|m)", c("group","month", "xx"))[,-3]

Then we can recombine the data:

newdf <- cbind(mm[, 1, drop = FALSE], info, mm[, 3, drop = FALSE])

# head(newdf)
#   test_name group month value
# 1    Test_1   gp1     0   136
# 2    Test_2   gp1     0   130
# 3    Test_3   gp1     0   129
# 4    Test_4   gp1     0    40
# 5    Test_5   gp1     0   203
# 6    Test_6   gp1     0   170

And we can plot it using the qplot command you supplied above.

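
Since the question asked specifically about combining melt with strsplit, here is a minimal sketch of that route as an alternative to colsplit. It assumes the column names always have the form group_Nmonth; the intermediate name parts is only illustrative, and oridf is rebuilt in a compact form so the snippet runs on its own.

```r
library(reshape2)

# Same data as the question's oridf, rebuilt compactly
oridf <- data.frame(
  test_name  = paste0("Test_", 1:6),
  gp1_0month = c(136L, 130L, 129L, 40L, 203L, 170L),
  gp2_0month = c(137L, 129L, 128L, 40L, 201L, 166L),
  gp1_1month = c(152L, 81L, 68L, 45L, 141L, 134L),
  gp2_1month = c(143L, 78L, 68L, 43L, 134L, 116L),
  gp1_3month = c(156L, 86L, 74L, 47L, 149L, 139L),
  gp2_3month = c(150L, 80L, 71L, 46L, 142L, 125L)
)

# Wide to long: one row per (test_name, original column name)
mm <- melt(oridf, id.vars = "test_name")

# strsplit returns a list of c(group, "<N>month") pairs; bind them into a matrix
parts <- do.call(rbind, strsplit(as.character(mm$variable), "_", fixed = TRUE))

newdf <- data.frame(
  test_name = mm$test_name,
  # drop the literal "month" suffix and keep the number
  month     = as.integer(sub("month", "", parts[, 2], fixed = TRUE)),
  group     = parts[, 1],
  value     = mm$value
)
```

The as.character call matters because melt stores the former column names as a factor, and strsplit only accepts character vectors.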

Use gather from the tidyr package to convert from wide to long, then use separate from the same package to split the group_month column into group and month columns. Finally, use mutate from dplyr and extract_numeric from tidyr to extract the numeric part of month.

library(dplyr)
# devtools::install_github("hadley/tidyr")
library(tidyr)

newdf <- oridf %>%
   gather(group_month, value, -test_name) %>% 
   separate(group_month, into = c("group", "month")) %>% 
   mutate(month = extract_numeric(month))
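
In current tidyr releases, gather is superseded by pivot_longer and extract_numeric is deprecated (its documentation points to readr::parse_number). The following is a sketch of the same pipeline with the newer function, assuming tidyr 1.0.0 or later. The regular expression is my own choice for these column names, and oridf is rebuilt in a compact form so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as the question's oridf, rebuilt compactly
oridf <- data.frame(
  test_name  = paste0("Test_", 1:6),
  gp1_0month = c(136L, 130L, 129L, 40L, 203L, 170L),
  gp2_0month = c(137L, 129L, 128L, 40L, 201L, 166L),
  gp1_1month = c(152L, 81L, 68L, 45L, 141L, 134L),
  gp2_1month = c(143L, 78L, 68L, 43L, 134L, 116L),
  gp1_3month = c(156L, 86L, 74L, 47L, 149L, 139L),
  gp2_3month = c(150L, 80L, 71L, 46L, 142L, 125L)
)

newdf <- oridf %>%
  pivot_longer(
    cols = -test_name,
    # each capture group in names_pattern fills one of these new columns
    names_to = c("group", "month"),
    names_pattern = "^(.*)_(\\d+)month$",
    values_to = "value"
  ) %>%
  # the captured month is still character; convert it for plotting
  mutate(month = as.integer(month))
```

Unlike melt and gather, pivot_longer keeps the rows of each test together, so the result starts with all six Test_1 rows rather than with all gp1_0month values.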


 