How to reshape a data frame from wide to long format in R?
I am new to R. I am trying to read data from Excel in the following format:
x1 x2 x3 y1 y2 y3 Result
1 2 3 7 8 9
4 5 6 10 11 12
and the data.frame in R should hold the data for the 1st row in the following format:
x y
1 7
2 8
3 9
Then I want to use lm() and export the result to the Result column.
I want to automate this for n rows, i.e. once the result for the 1st row has been exported to Excel, I want to import the data for the second row.
Please help.
library(gdata)
# this spreadsheet is exactly as in your question
df.original <- read.xls("test.xlsx", sheet="Sheet1", perl="C:/strawberry/perl/bin/perl.exe")
#
#
> df.original
x1 x2 x3 y1 y2 y3
1 1 2 3 7 8 9
2 4 5 6 10 11 12
#
# for the above code you only need to change the 'perl' argument to the
# path of your own Perl installation
#
# now the example for the first row
#
library(reshape2)
df <- melt(df.original[1,])
df$variable <- substr(df$variable, 1, 1)
df <- as.data.frame(lapply(split(df, df$variable), `[[`, 2))
> df
x y
1 1 7
2 2 8
3 3 9
At this stage we have automated the import/transformation process (for one row).
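For comparison, the same single-row transformation can be sketched in base R, without reshape2. This is only a sketch under assumptions: the column names are assumed to start with x or y as in the question, the helper name row_to_xy is hypothetical, and df.original is rebuilt by hand here instead of being read from Excel.

```r
# Rebuild the example spreadsheet by hand (assumption: same values as in the question)
df.original <- data.frame(x1 = c(1, 4), x2 = c(2, 5), x3 = c(3, 6),
                          y1 = c(7, 10), y2 = c(8, 11), y3 = c(9, 12))

# Hypothetical helper: turn row i of a wide data.frame into an x/y data.frame
row_to_xy <- function(wide, i) {
  x.cols <- grep("^x", names(wide), value = TRUE)  # columns x1, x2, ...
  y.cols <- grep("^y", names(wide), value = TRUE)  # columns y1, y2, ...
  data.frame(x = unlist(wide[i, x.cols], use.names = FALSE),
             y = unlist(wide[i, y.cols], use.names = FALSE))
}

df <- row_to_xy(df.original, 1)
```

Calling row_to_xy(df.original, 2) would give the x/y pairs for the second spreadsheet row in the same way.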
First question: how do you want the data to look once every row has been processed?
Second question: what exactly do you want to put in Result? Residuals, fitted values? What do you need from lm()?
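To make the second question concrete, here is a small sketch of the pieces an lm() fit exposes, using the first row's values typed in by hand. The variable names xy, fit, coefs, fits and res are chosen only for illustration.

```r
# x/y pairs from the first spreadsheet row (typed in by hand for this sketch)
xy <- data.frame(x = c(1, 2, 3), y = c(7, 8, 9))

fit <- lm(y ~ x, data = xy)

coefs <- coef(fit)       # named vector with "(Intercept)" and "x" (the slope)
fits  <- fitted(fit)     # one fitted value per observation
res   <- residuals(fit)  # one residual per observation
```

Coefficients give two numbers per spreadsheet row, while fitted values and residuals give one number per observation, so the choice determines how many Result columns are needed.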
EDIT:
OK, @kapil, tell me if the final shape of df is what you had in mind:
library(reshape2)
library(plyr)
df <- adply(df.original, 1, melt, .expand=F)
names(df)[1] <- "rowID"
df$variable <- substr(df$variable, 1, 1)
rows <- df$rowID[ df$variable=="x"] # using "y" would give the same result (x and y are expected to have the same length)
df <- as.data.frame(lapply(split(df, df$variable), `[[`, c("value")))
df$rowID <- rows
df <- df[c("rowID", "x", "y")]
> df
rowID x y
1 1 1 7
2 1 2 8
3 1 3 9
4 2 4 10
5 2 5 11
6 2 6 12
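If plyr and reshape2 are not available, the same long layout can be sketched in base R. This assumes, as above, that the x columns and y columns come in matching order and equal number; the name df.long is chosen for illustration and df.original is rebuilt by hand.

```r
# Rebuild the example spreadsheet by hand (assumption: same values as in the question)
df.original <- data.frame(x1 = c(1, 4), x2 = c(2, 5), x3 = c(3, 6),
                          y1 = c(7, 10), y2 = c(8, 11), y3 = c(9, 12))

x.cols <- grep("^x", names(df.original), value = TRUE)
y.cols <- grep("^y", names(df.original), value = TRUE)

# t() puts each spreadsheet row into a column, so as.vector() reads the
# values row by row: all x values of row 1, then all x values of row 2, ...
df.long <- data.frame(
  rowID = rep(seq_len(nrow(df.original)), each = length(x.cols)),
  x     = as.vector(t(as.matrix(df.original[x.cols]))),
  y     = as.vector(t(as.matrix(df.original[y.cols])))
)
```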
Regarding the coefficients, you can calculate them for each rowID (which refers to the actual row in the xls file) in this way:
model <- dlply(df, .(rowID), function(z) lm(y ~ x, data = z))  # fit on the group's own rows z, not on the whole df
> sapply(model, `[`, "coefficients")
$`1.coefficients`
(Intercept) x
6 1
$`2.coefficients`
(Intercept) x
6 1
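The per-group fit can also be sketched with base R's split() and lapply(). The long data.frame is typed in by hand here, and the names models and coefs are chosen for illustration; the point of the sketch is that each model is fitted on its own subset z.

```r
# Long-format data as shown above (typed in by hand for this sketch)
df <- data.frame(rowID = rep(1:2, each = 3),
                 x = c(1, 2, 3, 4, 5, 6),
                 y = c(7, 8, 9, 10, 11, 12))

# One lm() per rowID, each fitted only on that group's rows
models <- lapply(split(df, df$rowID), function(z) lm(y ~ x, data = z))

# One row per rowID, columns "(Intercept)" and "x" (the slope)
coefs <- t(sapply(models, coef))
```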
So, for each group (or row in the original spreadsheet) you have, as expected, two coefficients: intercept and slope. I therefore can't figure out how you want the coefficients to fit inside the data.frame, especially in the 'long' layout shown just above. But if you want the data.frame to stay in 'wide' form, you can try this:
# once you have the model object, you can put the coefficients into the df.original data.frame
#
> ldply(model, `[[`, "coefficients")
rowID (Intercept) x
1 1 6 1
2 2 6 1
df.modified <- cbind(df.original, ldply(model, `[[`, "coefficients"))
> df.modified
x1 x2 x3 y1 y2 y3 rowID (Intercept) x
1 1 2 3 7 8 9 1 6 1
2 4 5 6 10 11 12 2 6 1
# of course, if you don't like it, you can remove rowID with df.modified$rowID <- NULL
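The question also asks to export the result. A minimal sketch, assuming a CSV file that Excel can open is acceptable instead of a native .xlsx file: the column names Intercept and Slope, the name df.result and the output path in tempdir() are all assumptions made for illustration.

```r
# Rebuild the example spreadsheet by hand (assumption: same values as in the question)
df.original <- data.frame(x1 = c(1, 4), x2 = c(2, 5), x3 = c(3, 6),
                          y1 = c(7, 10), y2 = c(8, 11), y3 = c(9, 12))

x.cols <- grep("^x", names(df.original), value = TRUE)
y.cols <- grep("^y", names(df.original), value = TRUE)

# Fit one lm() per spreadsheet row; the result has one row per spreadsheet row
coefs <- t(sapply(seq_len(nrow(df.original)), function(i) {
  xy <- data.frame(x = unlist(df.original[i, x.cols]),
                   y = unlist(df.original[i, y.cols]))
  coef(lm(y ~ x, data = xy))
}))

# Append the coefficients as two result columns and write a CSV file
df.result <- cbind(df.original, Intercept = coefs[, 1], Slope = coefs[, 2])
out.file <- file.path(tempdir(), "result.csv")  # hypothetical output location
write.csv(df.result, out.file, row.names = FALSE)
```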
Hope this helps, and let me know if you want the 'long' version of df.