Reshaping data matrix in R
I have some data to reshape in R but cannot figure out how. Here is the scenario: I have test score data for a number of students from different schools. Here is some example data:
#Create example data:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
This results in data in the following format:
score schoolid
1 1
10 1
20 2
40 2
20 3
So there is a school ID that identifies the school, and there is a test score for each student. For an analysis in a different program, I would like to have the data in a format like this:
                  Score student 1   Score student 2
School ID == 1    1                 10
School ID == 2    20                40
School ID == 3    20                NA
To reshape the data, I tried the reshape function and the cast function from the reshape2 library, but both resulted in errors:
#Reshape function
reshape(test, v.names = test2$score, idvar = test2$schoolid, direction = "wide")
reshape(test, idvar = test$schoolid, direction = "wide")
#Error in `[.data.frame`(data, , idvar) : undefined columns selected
#Cast function
cast(test,test$schoolid~test$score)
#Error: Error: could not find function "cast" (although ?cast works fine)
I guess that the fact that the number of test scores differs between schools complicates the restructuring.
How can I reshape this data, and which function should I use?
Here are some solutions that use only base R. All three solutions use this new studentno variable:
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
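To make the role of studentno concrete, here is a minimal sketch that rebuilds the example data from the question and prints the counter next to it. It relies only on base R: ave() applies seq_along within each schoolid group, so the numbering restarts at 1 for every school.

```r
# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# Number the students within each school: 1, 2, ... restarting per schoolid
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Show the counter next to the original columns
print(cbind(test, studentno))
```

For this data, studentno comes out as 1, 2, 1, 2, 1, which is what lets each score be placed in a "student 1" or "student 2" column.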
1) tapply
with(test, tapply(score, list(schoolid, studentno), c))
giving:
1 2
1 1 10
2 20 40
3 20 NA
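Since the question mentions passing the result to a different program, it may help to turn the tapply matrix into a data frame with an explicit school ID column. The following is only a sketch: the names m, wide and the score_student_ prefix are illustrative choices, not part of the original answer.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Same tapply call as above: rows are schools, columns are student numbers
m <- with(test, tapply(score, list(schoolid, studentno), c))

# Hypothetical column names; the school IDs are recovered from the row names
colnames(m) <- paste0("score_student_", colnames(m))
wide <- data.frame(schoolid = as.numeric(rownames(m)), m, row.names = NULL)
print(wide)
```

The resulting data frame has one row per school and can be written out with a function such as write.csv if a file is needed.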
2) reshape
# rename score to student and append studentno column
test2 <- transform(test, student = score, score = NULL, studentno = studentno)
reshape(test2, dir = "wide", idvar = "schoolid", timevar = "studentno")
giving:
schoolid student.1 student.2
1 1 1 10
3 2 20 40
5 3 20 NA
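The attempts in the question passed column contents (such as test$schoolid) where reshape() expects column names as character strings, which is likely what triggered the "undefined columns selected" error. As a sketch of the same idea without renaming the score column, the call below names idvar, timevar and v.names explicitly; the variable name wide and the row-name clean-up are illustrative additions.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test$studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# idvar, timevar and v.names are given as quoted column names, not as vectors
wide <- reshape(test, direction = "wide",
                idvar = "schoolid", timevar = "studentno", v.names = "score")

# Optional tidy-up: replace the leftover row names (1, 3, 5) with 1, 2, 3
rownames(wide) <- NULL
print(wide)
```

With v.names = "score" the wide columns are named score.1 and score.2 rather than student.1 and student.2.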
3) xtabs
xtabs would also work, provided no student has a score of 0.
xt <- xtabs(score ~ schoolid + studentno, test)
xt[xt == 0] <- NA # omit this step if it's OK to use 0 in place of NA
xt
giving:
studentno
schoolid 1 2
1 1 10
2 20 40
3 20
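The caveat about scores of 0 can be illustrated with a small variation of the example data. In this sketch, test0 is a hypothetical data set in which the only student of school 3 scored 0; it is not part of the original question.

```r
# Hypothetical variation of the example data: school 3's only student scored 0
test0 <- data.frame(score = c(1, 10, 20, 40, 0), schoolid = c(1, 1, 2, 2, 3))
test0$studentno <- with(test0, ave(schoolid, schoolid, FUN = seq_along))

# xtabs sums the scores per cell, so a real 0 and an empty cell both show 0
xt0 <- xtabs(score ~ schoolid + studentno, data = test0)
print(xt0)

# tapply leaves empty cells as NA, so the real 0 stays distinguishable
tp0 <- with(test0, tapply(score, list(schoolid, studentno), c))
print(tp0)
```

Here replacing every 0 in the xtabs result with NA would also wipe out the genuine score of 0, which is why the tapply or reshape approach is safer when zeros can occur.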
You have to define the student ID somewhere, for example:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
test$studentid <- c(1,2,1,2,1)
library(reshape2)
dcast(test, schoolid ~ studentid, value.var="score",mean)
schoolid 1 2
1 1 1 10
2 2 20 40
3 3 20 NaN