
Percentage of bin total on y-axis with facet_wrap and time series on x-axis

I am investigating a dataset with loan information from Prosper, specifically investor behavior.

The plot I would like to create shows investors on the y-axis and time on the x-axis, binned into average-length months. It would also be faceted by Credit Grade. Ultimately, I would like each bin to show what percentage of that month's total investors was allocated to each Credit Grade (the facet variable), using either a calculated month or the actual calendar month (a calculated month seems easier for binning).

I have tried ..density.., ..count.. / sum(..count..), geom_density, etc., and have seen plenty of posts that make each facet sum to 1 or the entire plot sum to 1. To reiterate, I am trying to make each bin, summed across all the facets, equal 1. I was also hoping to do this directly in ggplot rather than alter the data frame, but I'll take what I can get.
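To make the three normalisations concrete, here is a minimal base R sketch on made-up data. The `toy` data frame and its `month` and `grade` columns are illustrative only and are not taken from the Prosper dataset. Dividing by the grand total makes the whole plot sum to 1, normalising within each grade makes each facet sum to 1, and normalising within each month gives the per-bin shares being asked for.

```r
# Made-up data: one row per loan, with a month label and a credit grade.
toy <- data.frame(
  month = c("2010-01", "2010-01", "2010-01", "2010-02", "2010-02"),
  grade = c("A", "A", "B", "A", "B")
)

# Cross-tabulate: rows are months (bins), columns are grades (facets).
counts <- table(month = toy$month, grade = toy$grade)

whole_plot <- counts / sum(counts)            # entire table sums to 1
per_facet  <- prop.table(counts, margin = 2)  # each grade (column) sums to 1
per_bin    <- prop.table(counts, margin = 1)  # each month (row) sums to 1

print(per_bin)
```

Only `per_bin` answers "what share of this month went to each grade"; the other two correspond to the whole-plot and per-facet normalisations described above.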

The following code shows two ways to display the investor counts (count per bin and percentage of entire plot per bin):

library(ggplot2)
library(gridExtra)  # provides grid.arrange()

# Attempt 1: bins one average month wide (60*60*24*30.4375 = 2629800 seconds)
t1 <- ggplot(data = loans, aes(x = as.POSIXct(strptime(LoanOriginationDate, '%Y-%m-%d %H:%M:%S')))) +
  geom_histogram(binwidth = 60*60*24*30.4375, aes(y = ..count../sum(..count..), group = Investors)) +
  facet_wrap(~ProsperCreditGrade) +
  scale_y_continuous()

# Attempt 2: same bin width, filled by credit grade, with a density layer on top
t2 <- ggplot(loans, aes(x = as.POSIXct(strptime(LoanOriginationDate, '%Y-%m-%d %H:%M:%S')), fill = ProsperCreditGrade)) +
  geom_histogram(aes(y = 2629800 * ..count../sum(..count..)),
                 alpha = 1, position = 'identity', binwidth = 2629800) +
  facet_wrap(~ProsperCreditGrade) +
  stat_bin(aes(y = ..density..))

# Stack the two plots vertically
grid.arrange(t1, t2, ncol = 1)

Investors per bin

As you can see in the plot, total investors went up quite a bit toward the end of the time covered in the dataset. This does not show relative investment behavior over a given time, which is what I am trying to investigate.

What else can I try?
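One option that stays inside ggplot, if a single stacked panel is acceptable in place of facets, is `position = "fill"`, which rescales the bars at each x position so that they sum to 1. As far as I know, ggplot2 computes statistics separately for each panel, so a facet cannot see the totals of the other facets; that is why this sketch drops `facet_wrap`. The `toy` data frame below is made up, and it assumes a month index column has already been computed.

```r
library(ggplot2)

# Made-up data: one row per loan, with a month index and a credit grade.
toy <- data.frame(
  calculatedMonth    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  ProsperCreditGrade = c("A", "B", "B", "A", "C", "A", "A", "B", "C")
)

# geom_bar() counts rows per month and grade; position = "fill" then
# rescales the stacked bars so every month adds up to 1.
p <- ggplot(toy, aes(x = calculatedMonth, fill = ProsperCreditGrade)) +
  geom_bar(position = "fill") +
  labs(y = "share of loans in month")

print(p)
```

The trade-off is that the grades share one panel, so the trend of an individual grade is harder to read than in a faceted plot.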

With help from Stephen of Udacity.com and dplyr, the final code is as follows:

library(dplyr)
library(ggplot2)
library(lubridate)  # provides month() and year()

# Round each origination time to the nearest average month (2629800 seconds)
roundedDate <- as.POSIXct(round(as.numeric(as.POSIXct(loans$LoanOriginationDate)) / 2629800) * 2629800, origin = "1969-12-31 19:00:00")

loans$month <- month(roundedDate)
loans$year <- year(roundedDate)

# Months elapsed since the start of 2005
loans$calculatedMonth <- ((loans$year - 2005) * 12) + loans$month

# summarise() drops the last grouping level, so mutate() normalises within each calculatedMonth
loanInvestors <- loans %>% group_by(calculatedMonth, ProsperCreditGrade) %>% summarise(n = n()) %>% mutate(proportion = n / sum(n))

ggplot(data = loanInvestors, aes(x = calculatedMonth, y = proportion, fill = proportion, width = 3)) +
  geom_bar(stat = "identity") + facet_wrap(~ProsperCreditGrade) +
  scale_y_sqrt() + geom_smooth(color = "red") +
  scale_fill_gradient()

Investors per quarter by Credit Grade
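The normalisation step can be checked on a small made-up example. The sketch below assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses a hypothetical `toy` data frame with the same column names as `loans`. Note that `n()` counts loans (rows); if the goal is the share of investors, summing the `Investors` column is one possible alternative, shown here as `investorShare`. That alternative is an assumption about the intent, not part of the final code above.

```r
library(dplyr)

# Made-up data: one row per loan.
toy <- data.frame(
  calculatedMonth    = c(1, 1, 1, 2, 2),
  ProsperCreditGrade = c("A", "B", "B", "A", "B"),
  Investors          = c(10, 30, 60, 50, 50)
)

shares <- toy %>%
  group_by(calculatedMonth, ProsperCreditGrade) %>%
  # "drop_last" keeps the result grouped by calculatedMonth only
  summarise(loans = n(), investors = sum(Investors), .groups = "drop_last") %>%
  # sum() therefore runs within each month, across all credit grades
  mutate(loanShare     = loans / sum(loans),
         investorShare = investors / sum(investors)) %>%
  ungroup()

print(shares)
```

In month 1 of this toy data, grade B holds two of the three loans but 90 of the 100 investors, so the two shares can differ noticeably.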

