How can I reshape a data frame by grouping certain columns?
Suppose I have an R data frame with 5 data columns (plus a time column), as follows:
time MeanVar1 SdVar1 MedianVar1 MeanVar2 SdVar2
1 -0.8453978 -1.636985 -0.6239832 -0.4366982 -1.7037374
2 -0.3000778 -1.034199 0.3292459 -0.6606399 -0.1525361
Is there a concise way to turn the data frame into the following?
Var time Mean/Median SD
1 1 -0.8453978 -1.636985
1 2 -0.3000778 -1.034199
1 1 -0.6239832 N/A
1 2 0.3292459 N/A
2 1 -0.4366982 -1.7037374
2 2 -0.6606399 -0.1525361
Or:
Var time Mean/Median SD
MeanVar1 1 -0.8453978 -1.636985
MeanVar1 2 -0.3000778 -1.034199
MeanVar1 1 -0.6239832 N/A
MeanVar1 2 0.3292459 N/A
MeanVar2 1 -0.4366982 -1.7037374
MeanVar2 2 -0.6606399 -0.1525361
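My overall intention is to plot, in the same figure, the mean of Var1 with its SD, and the median and mean of Var1 with the SD of Var1. So I figured that if I reshape the data into this format, I can plot it in one go instead of plotting each line separately.
Because my knowledge of reshape and melt is limited, I haven't been able to do this.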
Edit: adding more information
Sample input (3 rows shown; there are 100 rows in total):
Label trainingSize Accuracy_Mean Accuracy_SD Accuracy_SE Precision_Mean Recall_Mean F1 Accuracy_Median PriorClass0_Mean PriorClass0_SD PriorClass0_SE ProbabilityEstimate_0given0_Mean ProbabilityEstimate_0given0_SD ProbabilityEstimate_0given0_SE ProbabilityEstimate_0given1_Mean ProbabilityEstimate_0given1_SD ProbabilityEstimate_0given1_SE
0perc_0repeat 0.4 0.5506 0.0531 0.0038 0.6374 0.2336 0.3419 0.5372 0.5278 0.0254 0.0018 0.6433 0.0028 0.0 0.4169 0.003 0.0
0perc_0repeat 0.4 0.5456 0.0482 0.0034 0.6465 0.2142 0.3218 0.5333 0.5304 0.0248 0.0018 0.6414 0.0028 0.0 0.4193 0.0027 0.0
0perc_0repeat 0.4 0.5574 0.0555 0.0039 0.6604 0.2197 0.3297 0.5404 0.529 0.0233 0.0016 0.6436 0.003 0.0 0.4163 0.0029 0.0
I am trying to plot:
1) the iteration number (1:100) on the x axis and the values of 5 columns (Accuracy_Mean, Accuracy_Median, PriorClass0_Mean, ProbabilityEstimate_0given0_Mean, ProbabilityEstimate_0given1_Mean) on the y axis;
2) the distribution (density estimated from the 100 points) of those 5 columns, with error bars (either SD or SE), in a single plot using ggplot.
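For plot 1), one possible first step is to stack the five columns into long format with an iteration index, so ggplot can colour the points by source column. This is a minimal base-R sketch, assuming the data frame is called data and holds the columns named above. The values are the three sample rows typed in from the table. The ggplot part is only built (not drawn), and only if ggplot2 is installed.

```r
# Three sample rows of the 5 columns of interest, typed in from the question.
data <- data.frame(
  Accuracy_Mean = c(0.5506, 0.5456, 0.5574),
  Accuracy_Median = c(0.5372, 0.5333, 0.5404),
  PriorClass0_Mean = c(0.5278, 0.5304, 0.529),
  ProbabilityEstimate_0given0_Mean = c(0.6433, 0.6414, 0.6436),
  ProbabilityEstimate_0given1_Mean = c(0.4169, 0.4193, 0.4163)
)

plot_cols <- c("Accuracy_Mean", "Accuracy_Median", "PriorClass0_Mean",
               "ProbabilityEstimate_0given0_Mean",
               "ProbabilityEstimate_0given1_Mean")

# stack() puts all 5 columns into one 'values' column plus a factor 'ind'
# naming the source column; rows are stacked one column after another.
long <- stack(data[plot_cols])

# Row i of every column is iteration i, so repeat 1..n once per column.
long$iteration <- rep(seq_len(nrow(data)), times = length(plot_cols))

# Build (but do not print) the plot object only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = iteration, y = values, colour = ind)) +
    ggplot2::geom_point() +
    ggplot2::geom_line()
}
```

With the full 100-row data, the same code gives 500 rows, one per (iteration, column) pair.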
I have 4 columns (Precision_Mean, Recall_Mean, F1, Accuracy_Median) that don't follow the Mean/SD/SE pattern!
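One way to separate the columns that do follow the pattern from those that don't is to match names against a _Mean/_SD/_SE suffix. Then keep only the stubs that have all three statistics. This is a sketch that assumes the column names are exactly those in the sample input.

```r
# Column names from the sample input in the question.
cols <- c("Label", "trainingSize", "Accuracy_Mean", "Accuracy_SD", "Accuracy_SE",
          "Precision_Mean", "Recall_Mean", "F1", "Accuracy_Median",
          "PriorClass0_Mean", "PriorClass0_SD", "PriorClass0_SE",
          "ProbabilityEstimate_0given0_Mean", "ProbabilityEstimate_0given0_SD",
          "ProbabilityEstimate_0given0_SE",
          "ProbabilityEstimate_0given1_Mean", "ProbabilityEstimate_0given1_SD",
          "ProbabilityEstimate_0given1_SE")

# Names ending in _Mean, _SD or _SE; split them into stub and statistic.
patterned <- grep("_(Mean|SD|SE)$", cols, value = TRUE)
stub <- sub("_(Mean|SD|SE)$", "", patterned)
stat <- sub("^.*_", "", patterned)   # text after the last underscore

# A stub is "complete" only if it has all three statistics.
has_all <- tapply(stat, stub, function(s) all(c("Mean", "SD", "SE") %in% s))
complete_stubs <- names(has_all)[has_all]

# Everything that is neither an ID column nor part of a complete triplet.
triplet_cols <- paste0(rep(complete_stubs, each = 3), c("_Mean", "_SD", "_SE"))
leftover <- setdiff(cols, c("Label", "trainingSize", triplet_cols))
```

Here complete_stubs ends up as Accuracy, PriorClass0 and the two ProbabilityEstimate stubs. leftover holds exactly the four columns listed above, because Precision_Mean and Recall_Mean have no matching _SD/_SE columns.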
Edit 1:
1) dput(droplevels(head(data, 3))) gives:
structure(list(Label = structure(c(1L, 1L, 1L), .Label = "0perc_0repeat", class = "factor"),
    trainingSize = c(0.4, 0.4, 0.4), Accuracy_Mean = c(0.5506, 0.5456, 0.5574),
    Accuracy_SD = c(0.0531, 0.0482, 0.0555), Accuracy_SE = c(0.0038, 0.0034, 0.0039),
    Precision_Mean = c(0.6374, 0.6646, 0.6604), Recall_Mean = c(0.2336, 0.2142, 0.2197),
    F1 = c(0.3419, 0.3218, 0.3297), Accuracy_Median = c(0.5372, 0.5333, 0.5404),
    PriorClass0_Mean = c(0.5278, 0.5304, 0.529), PriorClass0_SD = c(0.0254, 0.0248, 0.0233),
    PriorClass0_SE = c(0.0018, 0.0018, 0.0016),
    ProbabilityEstimate_0given0_Mean = c(0.6433, 0.6414, 0.6646),
    ProbabilityEstimate_0given0_SD = c(0.0028, 0.0028, 0.003),
    ProbabilityEstimate_0given0_SE = c(0, 0, 0),
    ProbabilityEstimate_0given1_Mean = c(0.4169, 0.4193, 0.4163),
    ProbabilityEstimate_0given1_SD = c(0.003, 0.0027, 0.0029),
    ProbabilityEstimate_0given1_SE = c(0, 0, 0)),
    .Names = c("Label", "trainingSize", "Accuracy_Mean", "Accuracy_SD", "Accuracy_SE",
    "Precision_Mean", "Recall_Mean", "F1", "Accuracy_Median", "PriorClass0_Mean",
    "PriorClass0_SD", "PriorClass0_SE", "ProbabilityEstimate_0given0_Mean",
    "ProbabilityEstimate_0given0_SD", "ProbabilityEstimate_0given0_SE",
    "ProbabilityEstimate_0given1_Mean", "ProbabilityEstimate_0given1_SD",
    "ProbabilityEstimate_0given1_SE"), row.names = c(NA, 3L), class = "data.frame")
2) The expected output is something like:
Vars Label trainingSize Mean SD SE
where Vars is one of Mean, PriorClass0, ProbabilityEstimate_0given0, ProbabilityEstimate_0given1. (Median, Precision, Recall and F1 are not required, or they could fit into the table above with SD and SE as NA or 0.)
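A minimal base-R sketch of that reshape is shown below. It builds one block of rows per stub by pulling the matching _Mean, _SD and _SE columns, then binds the blocks together. It assumes the four complete stubs are known in advance and adds a hypothetical iteration column (the row number) for the x axis. The values are the three sample rows typed in from the table. The optional ggplot part draws SD error bars around each mean and is only built if ggplot2 is installed.

```r
# Three sample rows (ID columns plus the Mean/SD/SE triplets) from the question.
data <- data.frame(
  Label = factor(rep("0perc_0repeat", 3)),
  trainingSize = c(0.4, 0.4, 0.4),
  Accuracy_Mean = c(0.5506, 0.5456, 0.5574),
  Accuracy_SD = c(0.0531, 0.0482, 0.0555),
  Accuracy_SE = c(0.0038, 0.0034, 0.0039),
  PriorClass0_Mean = c(0.5278, 0.5304, 0.529),
  PriorClass0_SD = c(0.0254, 0.0248, 0.0233),
  PriorClass0_SE = c(0.0018, 0.0018, 0.0016),
  ProbabilityEstimate_0given0_Mean = c(0.6433, 0.6414, 0.6436),
  ProbabilityEstimate_0given0_SD = c(0.0028, 0.0028, 0.003),
  ProbabilityEstimate_0given0_SE = c(0, 0, 0),
  ProbabilityEstimate_0given1_Mean = c(0.4169, 0.4193, 0.4163),
  ProbabilityEstimate_0given1_SD = c(0.003, 0.0027, 0.0029),
  ProbabilityEstimate_0given1_SE = c(0, 0, 0)
)

stubs <- c("Accuracy", "PriorClass0",
           "ProbabilityEstimate_0given0", "ProbabilityEstimate_0given1")
data$iteration <- seq_len(nrow(data))   # hypothetical x-axis index

# One block of rows per stub, then stack the blocks.
long <- do.call(rbind, lapply(stubs, function(s) {
  data.frame(Vars = s,
             iteration = data$iteration,
             Label = data$Label,
             trainingSize = data$trainingSize,
             Mean = data[[paste0(s, "_Mean")]],
             SD   = data[[paste0(s, "_SD")]],
             SE   = data[[paste0(s, "_SE")]])
}))

# Optional: points with mean +/- SD error bars (swap SD for SE if preferred).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = iteration, y = Mean, colour = Vars)) +
    ggplot2::geom_point() +
    ggplot2::geom_errorbar(ggplot2::aes(ymin = Mean - SD, ymax = Mean + SD),
                           width = 0.2)
}
```

The columns that don't fit the pattern could be bound on in the same way, with SD and SE set to NA.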
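merged.stack from my "splitstackshape" package handles this to some extent, but it recycles the values in your "SdVar" columns (so you don't get the NA values shown in your desired output).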
Nonetheless, it might be a start toward solving the problem, so here is the approach:
library(splitstackshape)
merged.stack(mydf, var.stubs = c("MeanVar|MedianVar", "SdVar"), sep = "var.stubs")
# time .time_1 MeanVar|MedianVar SdVar
# 1: 1 1 -0.8453978 -1.6369850
# 2: 1 1 -0.6239832 -1.6369850
# 3: 1 2 -0.4366982 -1.7037374
# 4: 2 1 -0.3000778 -1.0341990
# 5: 2 1 0.3292459 -1.0341990
# 6: 2 2 -0.6606399 -0.1525361
If you really want those NA values, perhaps this will do it:
merged.stack(
mydf, var.stubs = c("MeanVar|MedianVar", "SdVar"),
sep = "var.stubs")[, SdVar := ifelse(
duplicated(SdVar), NA, SdVar), by = time][]
# time .time_1 MeanVar|MedianVar SdVar
# 1: 1 1 -0.8453978 -1.6369850
# 2: 1 1 -0.6239832 NA
# 3: 1 2 -0.4366982 -1.7037374
# 4: 2 1 -0.3000778 -1.0341990
# 5: 2 1 0.3292459 NA
# 6: 2 2 -0.6606399 -0.1525361
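If you would rather avoid the splitstackshape/data.table dependency, you can also get the NA values directly in base R. Pair each Mean column with its SdVar column and give Median columns NA instead of a recycled SD. This is a sketch that assumes every value column is named MeanVar<n>, MedianVar<n> or SdVar<n>, as in the example. long_block is a hypothetical helper. A separate Stat column replaces the combined "Mean/Median" header.

```r
# The two example rows from the question.
mydf <- data.frame(
  time       = c(1, 2),
  MeanVar1   = c(-0.8453978, -0.3000778),
  SdVar1     = c(-1.636985, -1.034199),
  MedianVar1 = c(-0.6239832, 0.3292459),
  MeanVar2   = c(-0.4366982, -0.6606399),
  SdVar2     = c(-1.7037374, -0.1525361)
)

# Hypothetical helper: rows for one (statistic, variable) pair. Only means
# get the matching SdVar values; medians get NA rather than a recycled SD.
long_block <- function(df, stat, var) {
  value_col <- paste0(stat, "Var", var)
  sd_col    <- paste0("SdVar", var)
  sd_vals <- if (stat == "Mean" && sd_col %in% names(df)) df[[sd_col]] else NA_real_
  data.frame(Var = var, time = df$time, Stat = stat,
             Value = df[[value_col]], SD = sd_vals)
}

# Recover the statistic and variable number from each Mean/Median column name.
value_cols <- grep("^(Mean|Median)Var[0-9]+$", names(mydf), value = TRUE)
stats <- sub("Var[0-9]+$", "", value_cols)
vars  <- as.integer(sub("^(Mean|Median)Var", "", value_cols))

# Map() recycles list(mydf) so each call gets the whole data frame.
out <- do.call(rbind, Map(long_block, list(mydf), stats, vars))
out <- out[order(out$Var), ]
rownames(out) <- NULL
out
```

This gives six rows in the same order as the first layout in the question, with NA in SD for the two median rows.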