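Data frame redesign: merge two rows into a single row and rename by value
I am new to R. I did a lot of research and testing to find the ideal answer to this question. I tried reshape, t, melt and so on. I am also struggling with the names of the variables. I am stuck with a data frame like this one. The first row holds the time the question was asked (the row with Question1), and the second row holds the time the answer was recorded.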
Time Logs
446.6204 Question1
452.7516 4
452.7516 Question2
458.1999 3
458.1999 Question3
460.2342 5
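I would like to put everything on a single row and name the variables after the values in Logs. Luckily for me, the pattern is constant, so slicing works well.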
Respondent TimeQ1 Question1 TimeA1 TimeQ2 Question2 TimeA2 TimeQ3 Question3 TimeA3
Respondent1 446.6204 4 452.7516 452.7516 3 458.1999 458.1999 5 460.2342
Thanks for your help!
I added a column for the respondent and data for several respondents. Here is the sample data set:
DF <- structure(list(Respondent = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Respondent 1",
"Respondent 2", "Respondent 3"), class = "factor"), Time = c(446.6204,
452.7516, 452.7516, 458.1999, 458.1999, 460.2342, 535.94448,
543.30192, 543.30192, 549.83988, 549.83988, 552.28104, 443.2204,
449.3516, 449.3516, 454.7999, 454.7999, 456.8342), Logs = structure(c(6L,
4L, 7L, 3L, 8L, 5L, 6L, 5L, 7L, 2L, 8L, 3L, 6L, 1L, 7L, 4L, 8L,
5L), .Label = c("1", "2", "3", "4", "5", "Question1", "Question2",
"Question3"), class = "factor")), .Names = c("Respondent", "Time",
"Logs"), row.names = c(NA, -18L), class = "data.frame")
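I don't think putting all of the data on a single row is your best option. If you have many questions, that row will get very long.
Here is the format I suggested earlier (which I still think is better):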
# Question rows and answer rows alternate, so one logical index splits them
isQuestion <- grepl("Question", DF$Logs)
newDF <- data.frame(respondent = DF$Respondent[isQuestion],
                    question = as.character(DF$Logs[isQuestion]),
                    questionTime = DF$Time[isQuestion],
                    # Logs is a factor, so convert via character to get the recorded value
                    responseValue = as.numeric(as.character(DF$Logs[!isQuestion])),
                    responseTime = DF$Time[!isQuestion])
newDF
# respondent question questionTime responseValue responseTime
# Respondent 1 Question1 446.6204 4 452.7516
# Respondent 1 Question2 452.7516 3 458.1999
# Respondent 1 Question3 458.1999 5 460.2342
# Respondent 2 Question1 535.9445 5 543.3019
# Respondent 2 Question2 543.3019 2 549.8399
# Respondent 2 Question3 549.8399 3 552.2810
# Respondent 3 Question1 443.2204 1 449.3516
# Respondent 3 Question2 449.3516 4 454.7999
# Respondent 3 Question3 454.7999 5 456.8342
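The slicing above only works if question rows and answer rows strictly alternate and each answer belongs to the same respondent as the question just before it. A quick check along these lines could be run first. This is only a sketch: the small DF below is a cut-down stand-in for the sample data, and the names alternates and sameRespondent are made up for illustration.

```r
# Cut-down stand-in for the sample data (two respondents, two questions each)
DF <- data.frame(
  Respondent = rep(c("Respondent 1", "Respondent 2"), each = 4),
  Time = c(446.6204, 452.7516, 452.7516, 458.1999,
           535.94448, 543.30192, 543.30192, 549.83988),
  Logs = c("Question1", "4", "Question2", "3",
           "Question1", "5", "Question2", "2"),
  stringsAsFactors = TRUE
)

isQuestion <- grepl("Question", DF$Logs)

# Odd rows must be questions and even rows must be answers
alternates <- all(isQuestion == (seq_along(isQuestion) %% 2 == 1))

# Each answer row must belong to the same respondent as its question row
sameRespondent <- all(DF$Respondent[isQuestion] == DF$Respondent[!isQuestion])

# Stops with an error if either assumption is violated
stopifnot(alternates, sameRespondent)
```

If either check fails, the positional slicing would pair answers with the wrong questions, so the data would need cleaning first.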
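Since there is now also a column for the respondent, you can use dcast (from the reshape2 package) to turn the data from my table above into the layout you are looking for. The steps are: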
library(reshape2)  # provides dcast()

# One wide table per measure, with one column per question
qTime <- dcast(newDF, respondent ~ question, value.var = "questionTime")
names(qTime)[-1] <- paste0("TimeQ", seq_len(ncol(qTime) - 1))
rValue <- dcast(newDF, respondent ~ question, value.var = "responseValue")
rTime <- dcast(newDF, respondent ~ question, value.var = "responseTime")
names(rTime)[-1] <- paste0("TimeA", seq_len(ncol(rTime) - 1))
# Drop the repeated respondent column before binding the tables side by side
finalDF <- cbind(qTime, rValue[, -1], rTime[, -1])
finalDF
# respondent TimeQ1 TimeQ2 TimeQ3 Question1 Question2 Question3 TimeA1 TimeA2 TimeA3
# Respondent 1 446.6204 452.7516 458.1999 4 3 5 452.7516 458.1999 460.2342
# Respondent 2 535.9445 543.3019 549.8399 5 2 3 543.3019 549.8399 552.2810
# Respondent 3 443.2204 449.3516 454.7999 1 4 5 449.3516 454.7999 456.8342
You will have to rearrange the column order if you really need that, but this should generally do it.
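If the interleaved order from the question (TimeQ1, Question1, TimeA1, TimeQ2, ...) is wanted, one way to get it is to build the column names in that order and index with them. This is a sketch rather than the only way to do it: the finalDF below is typed in by hand from the first two respondents of the sample data so that the snippet runs on its own, and nQ and newOrder are illustrative names.

```r
# Hand-built stand-in for finalDF (first two respondents of the sample data)
finalDF <- data.frame(
  respondent = c("Respondent 1", "Respondent 2"),
  TimeQ1 = c(446.6204, 535.94448),
  TimeQ2 = c(452.7516, 543.30192),
  TimeQ3 = c(458.1999, 549.83988),
  Question1 = c(4, 5),
  Question2 = c(3, 2),
  Question3 = c(5, 3),
  TimeA1 = c(452.7516, 543.30192),
  TimeA2 = c(458.1999, 549.83988),
  TimeA3 = c(460.2342, 552.28104)
)

# Number of questions, taken from the column names
nQ <- sum(grepl("^Question", names(finalDF)))

# rbind() stacks the three name vectors as rows; as.vector() then reads the
# matrix column by column, which interleaves them question by question
newOrder <- as.vector(rbind(paste0("TimeQ", seq_len(nQ)),
                            paste0("Question", seq_len(nQ)),
                            paste0("TimeA", seq_len(nQ))))

finalDF <- finalDF[, c("respondent", newOrder)]
names(finalDF)
```

Indexing by name rather than by position keeps this working when the number of questions changes, as long as the TimeQ/Question/TimeA naming scheme holds.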