
Data frame redesign: merge 2 rows into one single row and rename by value

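I am new to R. I did a lot of research and testing to find the right answer to this question. I tried reshape, t, melt, and so on, and I am also struggling with the variable names. I am stuck with a data frame like the one below. The first row of each pair holds the time the question was asked (next to Question1), and the second row holds the time the answer was recorded.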

    Time            Logs
    446.6204    Question1
    452.7516    4
    452.7516    Question2
    458.1999    3
    458.1999    Question3
    460.2342    5

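I would like to put everything on a single row and name the variables using the values in Logs. Luckily for me, the pattern is constant, so slicing works well.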

Respondent TimeQ1   Question1   TimeA1  TimeQ2  Question2   TimeA2  TimeQ3  Question3   TimeA3
Respondent1 446.6204    4   452.7516    452.7516    3   458.1999    458.1999    5   460.2342

Thanks for your help!

I added a Respondent column and data for several respondents. Here is the sample data set:

DF <- structure(list(Respondent = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Respondent 1", 
"Respondent 2", "Respondent 3"), class = "factor"), Time = c(446.6204, 
452.7516, 452.7516, 458.1999, 458.1999, 460.2342, 535.94448, 
543.30192, 543.30192, 549.83988, 549.83988, 552.28104, 443.2204, 
449.3516, 449.3516, 454.7999, 454.7999, 456.8342), Logs = structure(c(6L, 
4L, 7L, 3L, 8L, 5L, 6L, 5L, 7L, 2L, 8L, 3L, 6L, 1L, 7L, 4L, 8L, 
5L), .Label = c("1", "2", "3", "4", "5", "Question1", "Question2", 
"Question3"), class = "factor")), .Names = c("Respondent", "Time", 
"Logs"), row.names = c(NA, -18L), class = "data.frame")

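I don't think putting all the data on one line is your best option. If you have many questions, that line will get very long.

Here is the format I suggested earlier (which I still think is better):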

# Rows whose Logs value is a question label; the remaining rows are the answers.
isQuestion <- grepl("Question", DF$Logs)

newDF <- data.frame(respondent = DF$Respondent[isQuestion],
                    question = as.character(DF$Logs[isQuestion]),
                    questionTime = DF$Time[isQuestion],
                    responseValue = DF$Logs[!isQuestion],
                    responseTime = DF$Time[!isQuestion])
newDF

 #   respondent  question questionTime responseValue responseTime
 # Respondent 1 Question1     446.6204             4     452.7516
 # Respondent 1 Question2     452.7516             3     458.1999
 # Respondent 1 Question3     458.1999             5     460.2342
 # Respondent 2 Question1     535.9445             5     543.3019
 # Respondent 2 Question2     543.3019             2     549.8399
 # Respondent 2 Question3     549.8399             3     552.2810
 # Respondent 3 Question1     443.2204             1     449.3516
 # Respondent 3 Question2     449.3516             4     454.7999
 # Respondent 3 Question3     454.7999             5     456.8342
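
One thing to watch for: because Logs is a factor in the sample data, responseValue is a factor too. If the answers are needed as numbers, the conversion has to go through character first. A minimal sketch on a made-up toy factor (the vector below is illustrative, not taken from DF):

```r
# Toy factor standing in for the answer rows of DF$Logs (assumed example data).
responseValue <- factor(c("4", "3", "5"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(responseValue)

# Converting to character first recovers the actual answers.
values <- as.numeric(as.character(responseValue))
```

The same as.numeric(as.character(...)) call could be applied to newDF$responseValue before reshaping.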

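Edit

Given that there is also a respondent column, you can use dcast (from the reshape2 package) to turn the table above into the layout you are looking for. The steps are: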

library(reshape2)  # provides dcast()

# One wide table per measure: one row per respondent, one column per question.
qTime <- dcast(newDF, respondent ~ question, value.var = "questionTime")
names(qTime)[-1] <- paste0("TimeQ", seq_len(ncol(qTime) - 1))

rValue <- dcast(newDF, respondent ~ question, value.var = "responseValue")

rTime <- dcast(newDF, respondent ~ question, value.var = "responseTime")
names(rTime)[-1] <- paste0("TimeA", seq_len(ncol(rTime) - 1))

# Drop the duplicated respondent column before binding the tables side by side.
finalDF <- cbind(qTime, rValue[, -1], rTime[, -1])

finalDF

#     respondent   TimeQ1   TimeQ2   TimeQ3 Question1 Question2 Question3   TimeA1   TimeA2   TimeA3
#   Respondent 1 446.6204 452.7516 458.1999         4         3         5 452.7516 458.1999 460.2342
#   Respondent 2 535.9445 543.3019 549.8399         5         2         3 543.3019 549.8399 552.2810
#   Respondent 3 443.2204 449.3516 454.7999         1         4         5 449.3516 454.7999 456.8342

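If you really need a specific column order you will have to rearrange the columns yourself, but this should generally do the job.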
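
For the column order asked for in the question (TimeQ1, Question1, TimeA1, TimeQ2, ...), one possible way is to build the interleaved name vector and index with it. This is only a sketch: the one-row finalDF below is a stand-in built from the first respondent's values, and nQ and colOrder are names introduced here for illustration.

```r
# Stand-in for finalDF with the same column layout (first respondent only).
finalDF <- data.frame(respondent = "Respondent 1",
                      TimeQ1 = 446.6204, TimeQ2 = 452.7516, TimeQ3 = 458.1999,
                      Question1 = 4, Question2 = 3, Question3 = 5,
                      TimeA1 = 452.7516, TimeA2 = 458.1999, TimeA3 = 460.2342)

# Three columns per question, plus the respondent column.
nQ <- (ncol(finalDF) - 1) / 3

# rbind() stacks the three name vectors as rows; as.vector() then reads the
# matrix column by column, which interleaves them per question.
colOrder <- as.vector(rbind(paste0("TimeQ", seq_len(nQ)),
                            paste0("Question", seq_len(nQ)),
                            paste0("TimeA", seq_len(nQ))))

finalDF <- finalDF[, c("respondent", colOrder)]
names(finalDF)
```

This assumes the questions are numbered consecutively from 1 and that every question has all three columns.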

