
R Grouped Bar Plots with Conditions

I am trying to compare two variables and create a grouped bar graph based on how they relate to each other. The Churn column is either "Yes" or "No". The Contract column can be "Month-to-Month", "One Year", or "Two Years". What I ultimately want is a grouped bar graph showing the total number of Yeses and Nos for each Contract type. For example, the Month-to-Month contract type has 2220 Nos and 1655 Yeses in the Churn column.
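A minimal sketch of the counting step, assuming `cd` is a data frame whose Churn and Contract columns are factors. The small `cd` built here is made-up toy data, not the real data set. Base R's `table()` cross-tabulates the two columns in a single call:

```r
# Made-up toy data standing in for the real `cd` data frame
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "Month-to-Month",
                      "One Year", "One Year", "Two Years")),
  Churn = factor(c("No", "Yes", "Yes", "No", "No", "No"))
)

# Rows are Churn values, columns are Contract levels, cells are counts
counts <- table(Churn = cd$Churn, Contract = cd$Contract)
print(counts)

# A single cell: the number of "No" values for Month-to-Month
counts["No", "Month-to-Month"]
```

On the real data, that last cell should match the 2220 Nos quoted above.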

I have to compare Churn to two other columns of a similar nature, so at first I tried to write a function that looped through the levels of each column, pulled out the counts, and appended them to a vector. Then I read that growing a vector inside a loop is not good practice in R.
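One way to repeat the count for several columns without growing a vector in a loop is to let `table()` do the counting and `lapply()` collect the results. This is a sketch under assumptions: `churn_counts` is a hypothetical helper, `Billing` is a made-up stand-in for one of the other columns, and the data is toy data.

```r
# Made-up toy data; "Billing" is a hypothetical stand-in for another column
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "One Year", "Two Years")),
  Billing  = factor(c("Paper", "Electronic", "Electronic", "Paper")),
  Churn    = factor(c("Yes", "No", "No", "No"), levels = c("No", "Yes"))
)

# Hypothetical helper: Churn counts for each level of any factor column.
# table() is vectorized, so there is no loop and no vector grown element by element.
churn_counts <- function(df, col) {
  as.data.frame(table(Churn = df$Churn, Level = df[[col]]),
                responseName = "Count")
}

# lapply() returns the whole list of results in one step
cols <- c("Contract", "Billing")
all_counts <- lapply(cols, function(col) churn_counts(cd, col))
names(all_counts) <- cols
print(all_counts$Billing)
```

Each result has one row per Churn/level pair, which is also the shape a grouped bar chart needs.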

So I did it the long way:

# Count the "No" and "Yes" Churn values for each of the three Contract levels
contractLevels = levels(cd$Contract)
c1n = length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[1])])
c1y = length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[1])])
c2n = length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[2])])
c2y = length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[2])])
c3n = length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[3])])
c3y = length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[3])])
# Counts in bar order: level 1 No, level 1 Yes, level 2 No, and so on
cv <- c(c1n, c1y, c2n, c2y, c3n, c3y)
# Matching bar labels built from the level names, such as "Month-to-Month No"
cn <- c(paste(contractLevels[1], "No"), paste(contractLevels[1], "Yes"),
        paste(contractLevels[2], "No"), paste(contractLevels[2], "Yes"),
        paste(contractLevels[3], "No"), paste(contractLevels[3], "Yes"))
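The six `length(which(...))` lines above can likely be collapsed into one `table()` call. A sketch, assuming Churn's levels sort as "No" then "Yes" (the default alphabetical order) and using made-up toy data for `cd`. `as.data.frame()` on a table varies the first factor fastest, so the counts come out in the same order as `cv` and the labels in the same order as `cn`.

```r
# Made-up toy data standing in for the real `cd` data frame
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "One Year",
                      "One Year", "Two Years", "Two Years")),
  Churn = factor(c("No", "Yes", "No", "No", "Yes", "No"))
)

# Long format: one row per Churn/Contract combination, Churn varying fastest
counts <- as.data.frame(table(Churn = cd$Churn, Contract = cd$Contract))

# Same order as the hand-built vectors: level 1 No, level 1 Yes, level 2 No, ...
cv <- counts$Freq
cn <- paste(counts$Contract, counts$Churn)
print(data.frame(cn, cv))
```

Because the level names come from the data, this works unchanged for a column with a different number of levels.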

I still wanted it to be as easy as possible to reuse, so I built the bar labels (cn) from the factor levels instead of typing them out. First of all, there has to be an easier way to do the above, and I'm just too much of an R newbie to figure it out. Secondly, I can't turn this data into a grouped bar graph. I was trying to follow this: http://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ but since the cn and cv vectors do not have 7032 "rows" (like my data does), it doesn't work.

Is it possible to say: for each level of column X, graph the total number of "Yes" values in column Y beside the total number of "No" values in column Y? I have been experimenting with rpart, plot, and ggplot trying to figure this out.
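ggplot2 can plausibly do this straight from the raw rows: `geom_bar()` counts the rows for each x value itself, and mapping Churn to `fill` with a dodged position puts the Yes and No bars side by side. A sketch, assuming the ggplot2 package is installed and using made-up toy data in place of `cd`. No summary vectors are needed, so their row count never matters.

```r
library(ggplot2)

# Made-up toy data: one row per customer, like the real `cd`
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "Month-to-Month",
                      "One Year", "One Year", "Two Years")),
  Churn = factor(c("No", "Yes", "Yes", "No", "No", "No"))
)

# geom_bar() counts rows per Contract/Churn pair; position_dodge() places the
# Churn bars next to each other, and preserve = "single" keeps bar widths equal
# when a group has only one bar (Two Years has no "Yes" rows here)
p <- ggplot(cd, aes(x = Contract, fill = Churn)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  labs(y = "Number of customers")
print(p)
```

To compare Churn with one of the other columns, only the `x` mapping needs to change.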

Just doing plot(cd$Contract, cd$Churn) gives me a stacked graph that is kind of what I want, except it is hard to read. Doing barplot(cv, ylab="Churn", names.arg=cn, cex.names=0.5, las=2) gives me a bar chart that isn't grouped and is also a bit hard to read.

[Image: stacked graph]
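Staying in base graphics, `barplot()` draws grouped bars when it is given a matrix and `beside = TRUE`, and a two-way `table()` of Churn by Contract is such a matrix. A sketch with made-up toy data in place of `cd`:

```r
# Made-up toy data standing in for the real `cd` data frame
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "Month-to-Month",
                      "One Year", "One Year", "Two Years")),
  Churn = factor(c("No", "Yes", "Yes", "No", "No", "No"))
)

# 2 x 3 matrix of counts: rows are Churn values, columns are Contract levels
counts <- table(Churn = cd$Churn, Contract = cd$Contract)

# beside = TRUE puts each column's bars side by side instead of stacking them;
# legend.text = TRUE builds the legend from the row names ("No", "Yes")
mids <- barplot(counts, beside = TRUE, legend.text = TRUE,
                xlab = "Contract", ylab = "Number of customers")
```

The Contract levels become the group labels automatically, so hand-built names like `cn` are not needed.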


I think the best course of action is to create a new vector with just the counts you want to display. Create another vector with the bar names in the correct order and combine the two into a data frame. Then use the grouped method from the source you provided. Mapped onto the example there: Condition becomes the Churn value of each count, in the same order as the counts (for the cv vector above, "No", "Yes", "No", "Yes", "No", "Yes"); Species becomes the contract type; and value is the count you want to display. This new data frame will work with the given example.
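A sketch of this suggestion, assuming ggplot2 is installed. The numbers in `cv` are made-up toy counts in the question's order (level 1 No, level 1 Yes, level 2 No, ...). Each data frame row is one bar, so six rows are what the grouped example needs.

```r
library(ggplot2)

# Made-up counts in the question's order: level 1 No, level 1 Yes, level 2 No, ...
cv <- c(5, 3, 4, 1, 6, 0)
contractLevels <- c("Month-to-Month", "One Year", "Two Years")

# One row per bar; rep() lines the labels up with the order of cv
churn_df <- data.frame(
  Contract = rep(contractLevels, each = 2),  # plays the role of Species
  Churn = rep(c("No", "Yes"), times = 3),    # plays the role of Condition
  Count = cv                                 # plays the role of value
)

# stat = "identity" uses Count as the bar height instead of counting rows
p <- ggplot(churn_df, aes(x = Contract, y = Count, fill = Churn)) +
  geom_bar(position = "dodge", stat = "identity")
print(p)
```

`geom_col(position = "dodge")` is ggplot2's shorthand for the same `geom_bar(stat = "identity")` call.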

