
How to create stacked bar charts for multiple variables with percentages

I am trying to create a stacked bar chart with multiple variables, but I am stuck on two issues:

1) I can't get the rotated y-axis to display percentages instead of counts.

2) I would like to sort the variables in descending order by their percentage of "strongly agree" responses.

Here is an example of what I have so far:

require(scales)
require(ggplot2)
require(reshape2)

# create data frame: 200 respondents x 10 items, each answered on a 1-4 scale
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))
my.df$id <- seq(1, 200, by = 1)

# melt to long format: one row per (id, item) answer
melted <- melt(my.df, id.vars = "id")

# factors
melted$value <- factor(melted$value,
                       levels = c(1, 2, 3, 4),
                       labels = c("strongly disagree",
                                  "disagree",
                                  "agree",
                                  "strongly agree"))
# plot
ggplot(melted) +
  geom_bar(aes(variable, fill = value, position = "fill")) +
  scale_fill_manual(name = "Responses",
                    values = c("#EFF3FF", "#BDD7E7", "#6BAED6",
                               "#2171B5"),
                    breaks = c("strongly disagree",
                               "disagree",
                               "agree",
                               "strongly agree"),
                    labels = c("strongly disagree",
                               "disagree",
                               "agree",
                               "strongly agree")) +
  labs(x = "Items", y = "Percentage (%)", title = "my title") +
  coord_flip()

I owe thanks to several folks for helping me get this far. Here are a few of the many pages that Google turned up:

http://www.r-bloggers.com/fumblings-with-ranked-likert-scale-data-in-r/

Create stacked barplot where each stack is scaled to sum to 100%

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.2.2  ggplot2_0.9.2.1 scales_0.2.2   

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0    dichromat_1.2-4     digest_0.6.0        grid_2.15.0         gtable_0.1.1        HH_2.3-23          
 [7] labeling_0.1        lattice_0.20-10     latticeExtra_0.6-24 MASS_7.3-22         memoise_0.1         munsell_0.4        
[13] plyr_1.7.1          proto_0.3-9.2       RColorBrewer_1.0-5  rstudio_0.97.237    stringr_0.6.1       tools_2.15.0       

Since you are working with Likert data, you might want to consider the likert() function in package HH. (Hopefully it is OK to point you in another direction, given that there is already a nice answer addressing your original ggplot2 approach.)

As one might hope, likert() produces Likert-appropriate plots with minimal struggle. positive.order = TRUE sorts the items by how far they extend in the positive direction. The ReferenceZero argument lets you center zero on the middle of a neutral item (not needed below, but shown here). And as.percent = TRUE converts the counts into percentages and lists the actual counts in the margin (unless you turn that off).

library(reshape2)
library(HH)

# create data as before
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))
my.df$id <- seq(1, 200, by = 1)

# melt() and dcast() with reshape2 package
melted <- melt(my.df, id.vars = "id", na.rm = TRUE)
summd <- dcast(data = melted, variable ~ value, length) # note: length()
                                                        # not robust if NAs present

# give names to cols and rows for likert() to use
names(summd) <- c("Question", "strongly disagree", 
                              "disagree", 
                              "agree", 
                              "strongly agree")
rownames(summd) <- summd[,1]  # question number as rowname
summd[,1] <- NULL             

# plot
likert(summd,
       as.percent=TRUE,       # automatically scales
       main = NULL,           # or give "title",
       xlab = "Percent",      # label axis
       positive.order = TRUE, # orders by furthest right
       ReferenceZero = 2.5,   # zero point btwn levels 2&3
       ylab = "Question",     # label for left side
       auto.key = list(space = "right", columns = 1,
                     reverse = TRUE)) # make positive items on top of legend
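For readers who want to see roughly what as.percent and positive.order compute, here is a base-R sketch that does not need HH. It is an approximation under stated assumptions, not HH's implementation: it converts the counts to row percentages (what as.percent = TRUE displays) and orders the items by their share on the positive side of ReferenceZero = 2.5, i.e. "agree" plus "strongly agree", as one reading of "how far they extend in the positive direction". The seed and the names counts, pct, positive and item.order are illustrative only.

```r
# Base-R approximation of as.percent = TRUE and positive.order = TRUE
# (an illustrative sketch, not HH's own code)
set.seed(1)  # hypothetical seed, only for reproducibility
resp.levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# counts per item (rows) and response (columns), like the dcast() table
counts <- t(sapply(my.df, function(col) tabulate(col, nbins = 4)))
colnames(counts) <- resp.levels

# row percentages: each item's four responses sum to 100
pct <- 100 * counts / rowSums(counts)

# assumed ordering rule: share of responses above ReferenceZero = 2.5,
# i.e. "agree" + "strongly agree", largest first
positive <- rowSums(pct[, c("agree", "strongly agree")])
item.order <- names(sort(positive, decreasing = TRUE))
print(round(pct[item.order, ], 1))
```

HH may break ties or weight categories differently, so treat this as a way to sanity-check the plot rather than a reimplementation.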


For (1):
To get percentages, you'll have to create a data.frame from melted. At least, that's the only way I could think of.

require(plyr)
# every item has 200 responses, so dividing each count by 200 gives a percentage
df <- ddply(melted, .(variable, value), function(x) length(x$value)/200 * 100)
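If you would rather not hard-code 200 (for example, if some items have missing answers), the same percentages can be computed from each item's own total. The sketch below swaps ddply() for base R's table() and prop.table(); that substitution, the seed, and the long data frame built with rep()/unlist() in place of melt() are assumptions for illustration, not the answer's method.

```r
set.seed(1)  # hypothetical seed for a reproducible example
resp.levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# long format, like melt(): one row per (item, answer)
long <- data.frame(
  variable = factor(rep(names(my.df), each = nrow(my.df)), levels = names(my.df)),
  value = factor(unlist(my.df, use.names = FALSE), levels = 1:4, labels = resp.levels)
)

# count each (variable, value) pair, then scale each item's row to 100
counts <- table(variable = long$variable, value = long$value)
pct <- prop.table(counts, margin = 1) * 100

# back to a data frame with a V1 column, the shape ddply() returns above
df <- as.data.frame(pct, responseName = "V1")
print(head(df))
```

Unlike ddply(), table() keeps zero-count combinations, so every item gets all four response rows even if nobody chose one of them.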

Then supply the calculated percentages as weights in geom_bar, as follows:

ggplot(df) +
  # weight = V1: each segment is as long as its percentage, so every bar sums to 100
  geom_bar(aes(variable, fill = value, weight = V1)) +
  scale_fill_manual(name = "Responses",
                    values = c("#EFF3FF", "#BDD7E7", "#6BAED6",
                               "#2171B5"),
                    breaks = c("strongly disagree",
                               "disagree",
                               "agree",
                               "strongly agree"),
                    labels = c("strongly disagree",
                               "disagree",
                               "agree",
                               "strongly agree")) +
  labs(x = "Items", y = "Percentage (%)", title = "my title") +
  coord_flip()
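Why the weights produce a percentage axis: with weight = V1, each stacked segment is as long as its weight, so a bar's total length is the sum of that item's percentages, i.e. 100. The base-R sketch below shows only that arithmetic for one simulated item; it does not call ggplot2, and the order in which ggplot2 actually stacks the segments may differ between versions. The seed and variable names are illustrative.

```r
set.seed(1)  # hypothetical seed for a reproducible example
resp.levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
item <- factor(sample(1:4, 200, replace = TRUE), levels = 1:4, labels = resp.levels)

# percentage of each response level for this one item (the V1 weights)
V1 <- as.vector(table(item)) / length(item) * 100

# stacked segments sit end to end: each starts where the previous one ends
ymax <- cumsum(V1)
ymin <- ymax - V1
print(data.frame(value = resp.levels, V1, ymin, ymax))
```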

I don't quite understand (2). Do you want to (a) calculate relative percentages (with "strongly agree" as the reference)? Or (b) do you always want the plot to display "strongly agree", then "agree", and so on? You can accomplish (b) just by reordering the factor levels in df:

df$value <- factor(df$value, levels=c("strongly agree", "agree", "disagree", 
                 "strongly disagree"), ordered = TRUE)
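Re-leveling a factor changes only the order of its levels, which is what ggplot2 uses when stacking; the response stored in each row stays the same. A small base-R check with a few made-up responses:

```r
resp.levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
# a few example responses (made up for illustration)
x <- factor(c("agree", "strongly agree", "disagree"), levels = resp.levels)

# same re-leveling as above: most positive response first
x.rev <- factor(x, levels = rev(resp.levels), ordered = TRUE)

print(levels(x.rev))        # levels: "strongly agree", "agree", "disagree", "strongly disagree"
print(as.character(x.rev))  # values unchanged: "agree", "strongly agree", "disagree"
```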

Edit: You can reorder the levels of variable and value to the order you require, as follows:

# item names sorted by their "strongly agree" percentage, highest first
variable.order <- names(sort(daply(df, .(variable), 
                     function(x) x$V1[x$value == "strongly agree"] ), 
                     decreasing = TRUE))
# response levels from most to least positive
value.order <- c("strongly agree", "agree", "disagree", "strongly disagree")
df$variable <- factor(df$variable, levels = variable.order, ordered = TRUE)
df$value <- factor(df$value, levels = value.order, ordered = TRUE)
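The same ordering can be sketched without plyr, assuming the percentages are held in an items-by-responses matrix rather than in the long df; the matrix layout, the seed and the name pct are illustrative choices. One practical note: with coord_flip(), the first level of variable is drawn nearest the origin, i.e. at the bottom, so rev(variable.order) would put the item with the highest "strongly agree" share on top.

```r
set.seed(1)  # hypothetical seed for a reproducible example
resp.levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# items x responses matrix of row percentages
counts <- t(sapply(my.df, function(col) tabulate(col, nbins = 4)))
colnames(counts) <- resp.levels
pct <- 100 * counts / rowSums(counts)

# item names sorted by "strongly agree" percentage, highest first
variable.order <- names(sort(pct[, "strongly agree"], decreasing = TRUE))
print(variable.order)
```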

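Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.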

 