How can I better create stacked bar charts from multiple variables in ggplot2?

Question

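I often need to make stacked bar charts to compare variables, and since I do all my statistics in R, I prefer to do all my graphics in R with ggplot2 as well. I would like to learn how to do two things:

First, I want to be able to add proper percentage tick marks for each variable instead of tick marks by count. Counts would be confusing, which is why I removed the axis labels entirely.

Second, there must be a simpler way to restructure my data to achieve this. It seems I should be able to do this natively in ggplot2 with plyr, but the plyr documentation is not very clear (I have read both the ggplot2 book and the online plyr documentation).

My best chart so far looks like this:

[Image: sample plot]

The R code I used to produce it is as follows: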

library(epicalc)  

### recode the variables to factors ###
recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA), 
c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA))

### Combine recoded variables to a common vector
Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ)


### Create a second vector to label the first vector by original variable ###  
a1<-rep("News about Bangladesh", length(int_newcoun))
a2<-rep("Neighboring Countries", length(int_newneigh))
[...]
a17<-rep("Education", length(int_educ))


Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17)

### Create a Weighting vector of the proper length ###
Interest.weight<-rep(weight, 17)

### Make and save a new data frame from the three vectors ###
Interest.df<-cbind(Interest1, Interest2, Interest.weight)
Interest.df<-as.data.frame(Interest.df)

write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv')

### Sort the factor levels to display properly ###

Interest.df$Interest1<-relevel(Interest$Interest1, ref='Not Very Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Somewhat Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Very Interested')

Interest.df$Interest2<-relevel(Interest$Interest2, ref='News about Bangladesh')
Interest.df$Interest2<-relevel(Interest$Interest2, ref='Education')
[...]
Interest.df$Interest2<-relevel(Interest$Interest2, ref='European Politics')

detach(Interest)
attach(Interest)

### Finally create the graph in ggplot2 ###

library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
p<-ggplot(Interest, aes(Interest2, ..count..))
p<-p+geom_bar(aes(weight=Interest.weight, fill=Interest1))
p<-p+coord_flip()
p<-p+scale_y_continuous("", breaks=NULL)
p<-p+scale_fill_manual(values = rev(brewer.pal(5, "Purples")))
p
update_labels(p, list(fill='', x='', y=''))

I would greatly appreciate any tips, tricks, or hints.

Answer 1

You don't need prop.tables or counts etc. to do 100% stacked bars. You just need +geom_bar(position="fill")
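As a minimal sketch of the 100% idea, assuming the built-in mtcars data stands in for the survey (cyl and gear are only placeholders for Interest2 and Interest1): position = "fill" rescales every bar to a total of 1, and scales::percent labels the axis as percentages.

```r
library(ggplot2)

# mtcars is only a stand-in for the survey data (assumption for illustration)
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +                       # each bar is rescaled to sum to 1
  scale_y_continuous("", labels = scales::percent) +  # axis reads 0% to 100%
  coord_flip()

# Build the plot and pull out the computed bar segments for inspection
fill_data <- layer_data(p_fill)
```

Calling print(p_fill) draws the chart; fill_data holds the ymin/ymax of each segment if you want to check the proportions.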

Answer 2

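Your second problem can be solved with melt and cast from the reshape package.

Once you have set up the factors in your data.frame, you can use something like the following: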

install.packages("reshape")
library(reshape)

x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations

x <- cast(x, variable + value ~., length)
colnames(x) <- c("variable","value","freq")
## Presto!
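## freq is precomputed, so use stat = "identity"; labels = scales::percent replaces the old formatter argument
ggplot(x, aes(variable, freq, fill = value)) + geom_bar(stat = "identity", position = "fill") + coord_flip() + scale_y_continuous("", labels = scales::percent)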
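If the reshape package is not available, the same long-format frequency table can be sketched in base R. This is only an illustration under stated assumptions: the survey data frame and its columns int_a and int_b are made up, and table() stands in for the melt/cast pair.

```r
library(ggplot2)

# Made-up survey-style data: two items coded 1-4, one missing answer
survey <- data.frame(
  int_a = factor(c(1, 2, 2, 4), levels = 1:4),
  int_b = factor(c(1, 1, 3, NA), levels = 1:4)
)

# "Melt" by hand: one row per (variable, value) pair
long <- data.frame(
  variable = rep(names(survey), each = nrow(survey)),
  value    = unlist(lapply(survey, as.character), use.names = FALSE)
)
long <- na.omit(long)  # dropping NA changes the denominators, as noted above

# "Cast" by hand: count each value within each variable
freq <- as.data.frame(table(variable = long$variable, value = long$value),
                      responseName = "freq")

p_freq <- ggplot(freq, aes(variable, freq, fill = value)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  scale_y_continuous("", labels = scales::percent)
```

The freq data frame has the same three columns (variable, value, freq) that the cast() call produces, so the plotting line is unchanged.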

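By the way, I like to use grep to pull columns out of messy imports. For example:

x <- your.df[, grep("^int_", names(your.df))] ## pulls all columns starting with "int_"

Factoring is easier when you don't have to type c('', ...) a million times.

labs <- strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All interested
', '\n')[[1]][-1]

for (i in seq_len(ncol(x))) {
  ## codes outside 1:4 (5-9 in the question) become NA
  x[, i] <- factor(x[, i], levels = 1:4, labels = labs)
}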

Answer 3

Regarding percentages from ..count.., try:

ggplot(mtcars, aes(factor(cyl), prop.table(..count..) * 100)) + geom_bar()

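But since pushing functions into aes() is not a good idea, you can write a custom function that creates the ..count.. percentages, rounds them to n decimals, and so on.

You tagged this post with plyr, but I don't see any plyr in action here, and I bet a single ddply() could do the job. The online plyr documentation should be enough.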
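One way to keep the arithmetic out of aes() is to compute the percentages first and plot the result. The sketch below uses base R rather than ddply(), and pct_within is a hypothetical helper name, not something from plyr or ggplot2; mtcars again stands in for the survey data.

```r
library(ggplot2)

# Hypothetical helper: percentage of each `fill` level within each `group` level,
# rounded to `digits` decimals.
pct_within <- function(data, group, fill, digits = 1) {
  tab <- prop.table(table(data[[group]], data[[fill]]), margin = 1) * 100
  out <- as.data.frame(tab)
  names(out) <- c(group, fill, "pct")
  out$pct <- round(out$pct, digits)
  out
}

cyl_gear <- pct_within(mtcars, "cyl", "gear")

# The percentages are already computed, so the bars use stat = "identity"
p_pct <- ggplot(cyl_gear, aes(x = cyl, y = pct, fill = gear)) +
  geom_bar(stat = "identity") +
  coord_flip()
```

Because the rounding happens before plotting, the same cyl_gear table can also be printed or exported alongside the chart.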

Answer 4

If I understand you correctly, to fix the axis-label problem, make the following change:

# p<-ggplot(Interest, aes(Interest2, ..count..))
p<-ggplot(Interest, aes(Interest2, ..density..))

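As for the second, I think you would be better off using the reshape package. You can use it to aggregate data into groups very easily.

In reference to aL3xa's comment below...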

library(ggplot2)
r<-rnorm(1000)
d<-data.frame(r=r, index=1:1000)
# ..density.. is computed by the binning stat, so use geom_histogram() for continuous x
ggplot(d,aes(r,..density..))+geom_histogram()

This returns...

[Image: http://www.drewconway.com/zia/wp-content/uploads/2010/04/density.png]

The bins are now densities...

Answer 5

For your first question: would this help?

geom_bar(aes(y=..count../sum(..count..)))
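A small runnable sketch of that idea, assuming ggplot2 3.3.0 or later, where after_stat(count) is the current spelling of ..count..; mtcars is only a stand-in dataset:

```r
library(ggplot2)

# Each bar's height is its share of all rows, not a raw count
p_share <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous("", labels = scales::percent)

# The computed heights are proportions that add up to 1
share_data <- layer_data(p_share)
```

Note that sum(count) here runs over all bars in the panel, so the shares are of the whole sample rather than within each category.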

For your second question: could you use reorder to sort the bars? Something like

aes(reorder(Interest, Value, mean), Value)

(Just got back from a seven-hour drive and I'm tired, but I think it should work)
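To show what reorder() does to the bar order, here is a minimal sketch with made-up data; the scores data frame and its topic and value columns are hypothetical and not taken from the question.

```r
library(ggplot2)

# Made-up illustration data: three topics with a few scores each
scores <- data.frame(
  topic = rep(c("Sports", "Health", "Education"), each = 3),
  value = c(2, 3, 4,  5, 6, 7,  1, 1, 2)
)

# reorder() returns a factor whose levels are sorted by the mean of value
ordered_topic <- reorder(scores$topic, scores$value, mean)
levels(ordered_topic)  # "Education" "Sports" "Health"

# Plot the per-topic means, with bars sorted by those means
topic_means <- aggregate(value ~ topic, data = scores, FUN = mean)
p_sorted <- ggplot(topic_means, aes(reorder(topic, value), value)) +
  geom_col() +
  coord_flip()
```

The sorting lives entirely in the factor levels, so the same reorder() call works inside aes() or as a preprocessing step on the data frame.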

Answer 1: 2 2010-09-03 12:28:36

Answer 2: 2 2010-09-24 07:01:25

Answer 3: 1 2010-04-05 19:32:59

Answer 4: 1 2010-04-05 19:37:44

Answer 5: 1 2010-04-06 19:31:18
