
Visualize critical values / pairwise comparisons from posthoc Tukey in R

I'm trying to get a fine-grained visualisation of the critical values I got from a post hoc Tukey test. There are some good guidelines out there for visualizing pairwise comparisons, but I need something more refined. The idea is a plot in which each small square represents one critical value from the matrix below, coded as follows:

  • if the value is higher than or equal to 5.45, it's a black square;
  • if the value is lower than or equal to -5.45, it's a gray square;
  • if the value is between -5.45 and 5.45, it's a white square.

The data matrix is here.

Or maybe you have a better suggestion for how to visualize those critical values?

EDIT: Following comments from @Aaron and @DWin, I want to provide a bit more context for the above data and some justification for my question. I am looking at the mean acceptability ratings for seven virtual characters, each of which is animated at 5 different levels. So I have two factors: character (7 levels) and motion (5 levels). Because I found an interaction between those two factors, I decided to look at the differences between the means for all characters at all levels of motion, which resulted in this massive matrix as the output of the post hoc Tukey test. It's probably too much detail now, but please don't throw me out to Cross Validated, they will eat me alive...

This is fairly straightforward with image:

d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
# Three colour classes: below -5.45, between the cut-offs, above 5.45.
image(x=1:35, y=1:35, z=d, breaks=c(min(d), -5.45, 5.45, max(d)),
      col=c("grey", "white", "black"))

To show just half of the matrix, set the other half to missing with d[upper.tri(d)] <- NA and add na.rm=TRUE to the min and max calls.
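A minimal sketch of that half-matrix variant. Because the data file may not be reachable, it uses a synthetic antisymmetric 35 x 35 matrix in place of postH.dat; the simulated values, the seed, and the name brks are assumptions made only so the example runs offline.

```r
# Synthetic stand-in for the real data (assumption): an antisymmetric
# 35 x 35 matrix, so d[i, j] == -d[j, i] as with pairwise differences.
set.seed(1)
m <- matrix(rnorm(35 * 35, sd = 6), nrow = 35)
d <- m - t(m)

# Blank out the upper triangle so each pair is drawn only once.
d[upper.tri(d)] <- NA

# The outer breaks must now skip the NAs.
brks <- c(min(d, na.rm = TRUE), -5.45, 5.45, max(d, na.rm = TRUE))

# NA cells are simply left undrawn by image().
image(x = 1:35, y = 1:35, z = d, breaks = brks,
      col = c("grey", "white", "black"))
```

With the real data, only the first three lines change: read the matrix with read.table and as.matrix as above.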


Here is a ggplot2 solution. I'm sure there are simpler ways to accomplish this. I guess I got carried away!

library(ggplot2)
library(reshape2) # provides melt()

# Load data.
postH = read.table("~/Downloads/postH.dat")
item_names = paste("item", 1:35, sep="")
names(postH) = item_names     # add column names.
postH$item_id_x = item_names  # add id column.

# Convert data.frame to long form.
data_long = melt(postH, id.vars="item_id_x", variable.name="item_id_y")

# Convert to factor, controlling the order of the factor levels.
data_long$item_id_y = factor(as.character(data_long$item_id_y),
                        levels=item_names)
data_long$item_id_x = factor(as.character(data_long$item_id_x),
                        levels=item_names)

# Create critical value labels in a new column.
data_long$critical_level = ifelse(data_long$value >= 5.45, "high",
                             ifelse(data_long$value <= -5.45, "low", "middle"))

# Convert labels to factor, controlling the order of the factor levels.
data_long$critical_level = factor(data_long$critical_level,
                                  levels=c("high", "middle", "low"))

# Named vector for ggplot's scale_fill_manual
# (black / white / grey, as requested in the question).
critical_level_colors = c(high="black", middle="white", low="grey")

# Calculate grid line positions manually: one line on each cell edge,
# from 0.5 up to n + 0.5.
n_items = length(item_names)
x_grid_lines = seq(0.5, n_items + 0.5, 1)
y_grid_lines = seq(0.5, n_items + 0.5, 1)

# Create plot. theme(), element_*() and ggtitle() replace the old opts(),
# theme_*() calls; linewidth needs ggplot2 >= 3.4 (use size on older versions).
plot_1 = ggplot(data_long, aes(xmin=as.integer(item_id_x) - 0.5,
                               xmax=as.integer(item_id_x) + 0.5,
                               ymin=as.integer(item_id_y) - 0.5,
                               ymax=as.integer(item_id_y) + 0.5,
                               fill=critical_level)) +
     theme_bw() +
     theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank(),
           axis.text.y=element_text(size=9),
           axis.text.x=element_text(size=9, angle=90)) +
     coord_cartesian(xlim=range(x_grid_lines), ylim=range(y_grid_lines)) +
     scale_x_continuous(breaks=seq_len(n_items), labels=item_names) +
     scale_y_continuous(breaks=seq_len(n_items), labels=item_names) +
     scale_fill_manual(name="Critical Values", values=critical_level_colors) +
     geom_rect() +
     geom_hline(yintercept=y_grid_lines, colour="grey40", linewidth=0.15) +
     geom_vline(xintercept=x_grid_lines, colour="grey40", linewidth=0.15) +
     ggtitle("Critical Values Matrix")

# Save to pdf file.
pdf("plot_1.pdf", height=8.5, width=8.5)
print(plot_1)
dev.off()
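For comparison, one of the simpler routes might look like the sketch below, which swaps geom_rect for geom_tile and builds the long form with expand.grid instead of melt. It is only an illustration: the synthetic matrix stands in for postH.dat, and the names cells, item_x, item_y and plot_2 are made up for this example. The cut-offs at ±5.45 and the black / white / grey colours follow the question.

```r
library(ggplot2)

# Synthetic stand-in for postH.dat (assumption): antisymmetric 35 x 35 matrix.
set.seed(1)
m <- matrix(rnorm(35 * 35, sd = 6), nrow = 35)
d <- m - t(m)

# Long form: one row per cell. as.vector(d) runs down the columns, which
# matches expand.grid's order (the first variable varies fastest).
cells <- expand.grid(item_x = 1:35, item_y = 1:35)
cells$value <- as.vector(d)

# Three classes with cut-offs at -5.45 and 5.45.
cells$critical_level <- factor(
  ifelse(cells$value >= 5.45, "high",
         ifelse(cells$value <= -5.45, "low", "middle")),
  levels = c("high", "middle", "low"))

# geom_tile draws one square per cell; its border doubles as the grid.
plot_2 <- ggplot(cells, aes(x = item_x, y = item_y, fill = critical_level)) +
  geom_tile(colour = "grey40") +
  scale_fill_manual(name = "Critical Values",
                    values = c(high = "black", middle = "white", low = "grey")) +
  coord_equal() +
  theme_bw()
print(plot_2)
```

Here item_x is the row index of the matrix and item_y the column index, so cell (3, 7) of d is drawn at x = 3, y = 7.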


If you set this up with findInterval as an index into the bg, col, and/or pch arguments (although they are all squares at the moment), you should find the code fairly compact and understandable.

You'll need to get the data in long format first; here's one way:

d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
dat <- within(as.data.frame(as.table(d)), 
              { Var1 <- as.numeric(Var1)  
                Var2 <- as.numeric(Var2) })

Then the code is as follows; pch=22 uses filled squares, bg sets the fill color of the square, col sets the border color, and cex=1.5 just makes them a little bigger than the default.

plot(dat$Var1, dat$Var2, 
     bg = c("grey", "white", "black")[1+findInterval(dat$Freq, c(-5.45,5.45))],
     col="white", cex=1.5, pch = 22)

You need the 1+ in there because findInterval returns 0, 1, or 2 here, and your indices need to start at 1.
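A small illustration of that indexing, using a handful of made-up test values (cuts, vals and idx are names chosen for this example only):

```r
cuts <- c(-5.45, 5.45)
vals <- c(-7, -5.45, 0, 5.45, 9)

# findInterval() counts how many cut points are <= each value,
# so with two cut points it returns 0, 1 or 2.
idx <- findInterval(vals, cuts)
idx
# [1] 0 1 1 2 2

# Shift by one to index the three-colour palette.
c("grey", "white", "black")[1 + idx]
# [1] "grey"  "white" "white" "black" "black"
```

Note that a value exactly equal to -5.45 lands in the middle class, because the intervals are closed on the left. If that boundary case matters, findInterval's left.open = TRUE argument makes the intervals closed on the right instead, which in turn moves an exact 5.45 into the middle class.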


To close this out, I used most of the suggestions from @DWin and @Aaron to create the plot below. The lightest shade of gray stands for non-significant values. I also used rect to draw lines above the axis labels to better differentiate between conditions:

d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
# remove the upper half of the values (they mirror the lower half)
d[upper.tri(d)] <- NA
dat <- within(as.data.frame(as.table(d)), {
  Var1 <- as.numeric(Var1)
  Var2 <- as.numeric(Var2)
})
par(mar = c(6, 3, 3, 6))
colPh <- c("gray50", "gray90", "black")
plot(dat$Var1, dat$Var2,
     bg = colPh[1 + findInterval(dat$Freq, c(-5.45, 5.45))],
     col = "white", cex = 1.2, pch = 21, axes = FALSE, xlab = "", ylab = "")
labDis <- rep(c("A", "B", "C", "D", "E"), times = 7)
labChar <- 1:7
starts <- seq(1, 31, 5)  # first column of each character's block of five
axis(1, at = 1:35, labels = labDis, cex.axis = 0.5, tick = FALSE, line = -1.4)
axis(1, at = seq(3, 33, 5), labels = labChar, tick = FALSE)
# draw lines above the bottom axis labels for better identification
# (rect is vectorised; zero-height rectangles show up as line segments)
rect(starts, 0, starts + 4, 0)
axis(4, at = 1:35, labels = labDis, cex.axis = 0.5, tick = FALSE, line = -1.4)
axis(4, at = seq(3, 33, 5), labels = labChar, tick = FALSE)
# draw lines beside the right axis labels for better identification
rect(36, starts, 36, starts + 4)
legend("topleft", legend = c("not significant", "p<0.01", "p<0.05"), pch = 16,
       col = c("gray90", "gray50", "black"), cex = 0.7, bty = "n")

