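Reshaping data matrix in R
I have some data that I need to reshape in R, but I cannot figure out how. Here is the scenario: I have test score data for a number of students from different schools. Here is some example data: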
#Create example data:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
This results in a data format like this:
score schoolid
1 1
10 1
20 2
40 2
20 3
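So there is a school ID that identifies the school, and a test score for each student. For an analysis in a different program, I would like to have the data in a format like this: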
Score student 1 Score student 2
School ID == 1 1 10
School ID == 2 20 40
School ID == 3 20 NA
To reshape the data, I tried the reshape and cast functions from the reshape2 library, but this resulted in errors:
#Reshape function
reshape(test, v.names = test2$score, idvar = test2$schoolid, direction = "wide")
reshape(test, idvar = test$schoolid, direction = "wide")
#Error: in [.data.frame'(data,,idvar): undefined columns selected
#Cast function
cast(test,test$schoolid~test$score)
#Error: Error: could not find function "cast" (although ?cast works fine)
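I guess the fact that the number of test scores differs between schools complicates the restructuring process.
How can I reshape this data, and which function should I use?
Here are some solutions that use only base R. All three solutions use this new studentno variable: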
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
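To make the role of studentno concrete, here is a small sketch that rebuilds the example data and shows the counter next to the original columns. It only reuses test and the ave call from above; nothing else is assumed.

```r
# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# ave() applies seq_along within each schoolid group, so the counter
# restarts at 1 for every school
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Show the counter next to the data it was derived from
cbind(test, studentno)
```

Because ave works group by group, the rows do not need to be sorted by schoolid for the numbering to be correct.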
1) tapply
with(test, tapply(score, list(schoolid, studentno), c))
giving:
1 2
1 1 10
2 20 40
3 20 NA
2) reshape
# rename score to student and append studentno column
test2 <- transform(test, student = score, score = NULL, studentno = studentno)
reshape(test2, dir = "wide", idvar = "schoolid", timevar = "studentno")
giving:
schoolid student.1 student.2
1 1 1 10
3 2 20 40
5 3 20 NA
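The "undefined columns selected" error in the question most likely comes from passing the column vectors themselves (test$schoolid) where reshape expects column names as character strings. A minimal sketch of the same reshape without the renaming step, assuming the counter is stored as a studentno column of test:

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test$studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# idvar, timevar and v.names are quoted column names, not vectors
wide <- reshape(test, direction = "wide",
                idvar = "schoolid", timevar = "studentno", v.names = "score")
wide
```

Here the wide columns are named score.1 and score.2 instead of student.1 and student.2, since the value column keeps its original name.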
3) xtabs: xtabs would also work if there are no students with a score of 0.
xt <- xtabs(score ~ schoolid + studentno, test)
xt[xt == 0] <- NA # omit this step if it's OK to use 0 in place of NA
xt
giving:
studentno
schoolid 1 2
1 1 10
2 20 40
3 20
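The restriction to non-zero scores matters because xtabs sums within each cell, so an empty cell and a genuine score of 0 both come out as 0. The sketch below illustrates this with a hypothetical variant of the data (test0, where the first student scored 0) and compares it with the tapply approach from 1).

```r
# Hypothetical data: same layout as the question, but one real score of 0
test0 <- data.frame(score = c(0, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test0$studentno <- with(test0, ave(schoolid, schoolid, FUN = seq_along))

xt0 <- xtabs(score ~ schoolid + studentno, test0)
real_zero  <- xt0["1", "1"]   # a real score of 0
empty_cell <- xt0["3", "2"]   # no second student in school 3, also 0

# Replacing 0 by NA now also wipes out the real score
xt0[xt0 == 0] <- NA

# tapply keeps the two cases apart: 0 for the real score, NA for the empty cell
tp0 <- with(test0, tapply(score, list(schoolid, studentno), c))
tp0
```

If zero scores can occur, the tapply or reshape solutions are the safer choice.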
You have to define a student ID somewhere, for example:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
test$studentid <- c(1,2,1,2,1)
library(reshape2)
dcast(test, schoolid ~ studentid, value.var="score",mean)
schoolid 1 2
1 1 1 10
2 2 20 40
3 3 20 NaN