Data format conversion to be combined with string split in R
I have the following data frame, oridf:
test_name gp1_0month gp2_0month gp1_1month gp2_1month gp1_3month gp2_3month
Test_1 136 137 152 143 156 150
Test_2 130 129 81 78 86 80
Test_3 129 128 68 68 74 71
Test_4 40 40 45 43 47 46
Test_5 203 201 141 134 149 142
Test_6 170 166 134 116 139 125
oridf <- structure(list(test_name = structure(1:6, .Label = c("Test_1",
"Test_2", "Test_3", "Test_4", "Test_5", "Test_6"), class = "factor"),
gp1_0month = c(136L, 130L, 129L, 40L, 203L, 170L), gp2_0month = c(137L,
129L, 128L, 40L, 201L, 166L), gp1_1month = c(152L, 81L, 68L,
45L, 141L, 134L), gp2_1month = c(143L, 78L, 68L, 43L, 134L,
116L), gp1_3month = c(156L, 86L, 74L, 47L, 149L, 139L), gp2_3month = c(150L,
80L, 71L, 46L, 142L, 125L)), .Names = c("test_name", "gp1_0month",
"gp2_0month", "gp1_1month", "gp2_1month", "gp1_3month", "gp2_3month"
), class = "data.frame", row.names = c(NA, -6L))
I need to convert it to the following format:
test_name month group value
Test_1 0 gp1 136
Test_1 0 gp2 137
Test_1 1 gp1 152
Test_1 1 gp2 143
.....
Hence, the conversion would involve splitting gp1 and 0month, etc., out of the names of columns 2:7 of the original data frame oridf, so that I can plot it with the following command:
qplot(data=newdf, x=month, y=value, geom=c("point","line"), color=test_name, linetype=group)
How can I convert these data? I tried the melt command, but I could not combine it with the strsplit command.
First, I would use melt as you did.
library(reshape2)
mm <- melt(oridf)
Then there is also a colsplit function in the reshape2 library that you can use. Here we apply it to the variable column to split at the underscore and at the "m" in "month" (ignoring the rest):
info <- colsplit(mm$variable, "(_|m)", c("group","month", "xx"))[,-3]
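Since the question mentions strsplit, here is a minimal base-R sketch of the same split, for comparison. It assumes every column name has the form group_Nmonth, as in oridf; the names vars, parts, and info_base are illustrative only and are not part of the answer's code.

```r
# Column names as they would appear in mm$variable after melt()
vars <- c("gp1_0month", "gp2_0month", "gp1_1month", "gp2_1month",
          "gp1_3month", "gp2_3month")

# Split each name at the underscore: "gp1_0month" -> c("gp1", "0month")
parts <- strsplit(vars, "_", fixed = TRUE)

# The first piece of each split is the group label
group <- vapply(parts, `[`, character(1), 1)

# The second piece is e.g. "0month"; drop the suffix and convert to integer
month <- as.integer(sub("month$", "", vapply(parts, `[`, character(1), 2)))

# Assemble a two-column data frame comparable to the colsplit() result
info_base <- data.frame(group = group, month = month)
```

Unlike colsplit, this does not need a throwaway third column, but it does require the explicit as.integer conversion.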
Then we can recombine the data:
newdf <- cbind(mm[, 1, drop = FALSE], info, mm[, 3, drop = FALSE])
# head(newdf)
# test_name group month value
# 1 Test_1 gp1 0 136
# 2 Test_2 gp1 0 130
# 3 Test_3 gp1 0 129
# 4 Test_4 gp1 0 40
# 5 Test_5 gp1 0 203
# 6 Test_6 gp1 0 170
And we can plot it using the qplot command you supplied above.
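As a side note, qplot was deprecated in ggplot2 3.4.0, so on a recent installation the same plot may be better written with ggplot(). The sketch below assumes a long-format data frame with the columns test_name, group, month, and value; newdf_small is a hypothetical two-test subset built only to keep the example self-contained.

```r
library(ggplot2)

# Hypothetical stand-in for newdf: two tests, two groups, months 0 and 1
newdf_small <- data.frame(
  test_name = rep(c("Test_1", "Test_2"), each = 4),
  group     = rep(c("gp1", "gp2"), times = 4),
  month     = rep(c(0L, 0L, 1L, 1L), times = 2),
  value     = c(136L, 137L, 152L, 143L, 130L, 129L, 81L, 78L)
)

# Same mapping as the qplot() call: colour by test, line type by group
p <- ggplot(newdf_small,
            aes(x = month, y = value, color = test_name, linetype = group)) +
  geom_point() +
  geom_line()

print(p)
```

Because both color and linetype are mapped to discrete variables, ggplot2 draws one line per test and group combination without an explicit group aesthetic.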
Use gather from the tidyr package to convert from wide to long, and then use separate from the same package to split the group_month column into group and month columns. Finally, using mutate from dplyr and extract_numeric from tidyr, extract the numeric part of month.
library(dplyr)
# devtools::install_github("hadley/tidyr")
library(tidyr)
newdf <- oridf %>%
  gather(group_month, value, -test_name) %>%
  separate(group_month, into = c("group", "month")) %>%
  mutate(month = extract_numeric(month))
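In later tidyr releases, extract_numeric was deprecated in favor of readr::parse_number, and gather was superseded by pivot_longer. A possible single-step equivalent, assuming tidyr 1.1.0 or newer (needed for names_transform), is sketched below; oridf_small and newdf_small are hypothetical names for a cut-down copy of the data, used only so the example runs on its own.

```r
library(tidyr)

# Hypothetical cut-down copy of oridf: two tests, months 0 and 1 only
oridf_small <- data.frame(
  test_name  = c("Test_1", "Test_2"),
  gp1_0month = c(136L, 130L),
  gp2_0month = c(137L, 129L),
  gp1_1month = c(152L, 81L),
  gp2_1month = c(143L, 78L)
)

# Reshape wide -> long and split the column names in one call.
# names_pattern captures the group ("gp1") and the month digits ("0").
newdf_small <- pivot_longer(
  oridf_small,
  cols            = -test_name,
  names_to        = c("group", "month"),
  names_pattern   = "^(.*)_([0-9]+)month$",
  names_transform = list(month = as.integer),
  values_to       = "value"
)

print(newdf_small)
```

The rows come out grouped by test_name, in the original column order, which matches the layout requested in the question.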