I'm new to R, so maybe I'm overlooking something basic here. I'm having an issue where my box/whisker plot with jitter is duplicating certain data points.
First, I create two subsets of the data pulled from a database:
postProgram <- subset(sData, PROGRAM_STATUS == "POST-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")
preProgram <- subset(sData, PROGRAM_STATUS == "PRE-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")
Next, I drop the extra levels left over by subset()
postProgram<-droplevels(postProgram)
preProgram<-droplevels(preProgram)
Now I just want to plot the preProgram values as a box-and-whisker plot, so I aggregate all of the stores in each market into a single weekly total:
preAgg<-setNames(aggregate(preProgram$DST_UNITS,by=list(preProgram$MARKET_NAME,preProgram$WEEK_DATE),"sum"),c("MARKET_NAME","WEEK_DATE","DST_UNITS"))
Finally, I go to plot it using ggplot
p<-ggplot(preAgg,aes(factor(MARKET_NAME),DST_UNITS))
p + geom_boxplot(aes(fill=factor(MARKET_NAME))) + geom_jitter(position=position_jitter(width=.2)) + xlab("Market Name") + ylab("Units Sold") +
ggtitle("Pre-Program Weekly Units Sold By Region") + theme(legend.position="none")
My issue is that three of the markets show 8 data points in the plot. That should be impossible, because there are only 7 weeks of data x 5 markets = 35 data points.
What confuses me further is that RStudio shows only 7 levels for the date information.
Just for my own sanity I also checked the NYC / Long Island data with subset() and there are clearly only 7 data points. What is ggplot doing? This only affects 3 of the regions.
> subset(preAgg,preAgg$MARKET_NAME=='NYC / Long Island')
         MARKET_NAME  WEEK_DATE DST_UNITS
5  NYC / Long Island 2014-03-03        69
10 NYC / Long Island 2014-03-10        88
15 NYC / Long Island 2014-03-17        80
20 NYC / Long Island 2014-03-24        74
25 NYC / Long Island 2014-03-31        64
30 NYC / Long Island 2014-04-07        81
35 NYC / Long Island 2014-04-14       179
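Answered by Ben Bolker in the comments: the plot was showing the box plot's outliers in addition to the jittered data points. geom_boxplot() draws each outlier as its own point, and geom_jitter() then draws every observation again, so an outlying week (such as the 179 for NYC / Long Island) appears twice and the market seems to have 8 points.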
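A minimal sketch of one way to avoid the double-drawn point, assuming the goal is to keep the jittered points and hide only the box plot's own outlier markers. It uses just the seven NYC / Long Island values printed above; the data frame name nyc and the outliers variable are made up for illustration, and boxplot.stats() is used only as a rough cross-check because it applies a similar 1.5 * IQR rule.

```r
library(ggplot2)

# The seven NYC / Long Island weekly totals from the question
# (rebuilt by hand here so the example is self-contained).
nyc <- data.frame(
  MARKET_NAME = "NYC / Long Island",
  WEEK_DATE   = as.Date("2014-03-03") + 7 * 0:6,
  DST_UNITS   = c(69, 88, 80, 74, 64, 81, 179)
)

# Cross-check: which values fall outside the whiskers?
outliers <- boxplot.stats(nyc$DST_UNITS)$out

# outlier.shape = NA tells geom_boxplot() not to draw outlier points,
# so each observation is drawn once, by geom_jitter() only.
p <- ggplot(nyc, aes(factor(MARKET_NAME), DST_UNITS)) +
  geom_boxplot(aes(fill = factor(MARKET_NAME)), outlier.shape = NA) +
  geom_jitter(position = position_jitter(width = 0.2)) +
  xlab("Market Name") +
  ylab("Units Sold") +
  ggtitle("Pre-Program Weekly Units Sold By Region") +
  theme(legend.position = "none")

print(p)
```

The same outlier.shape = NA argument should work unchanged on the full preAgg data frame. An alternative, if the outlier markers are wanted, would be to drop geom_jitter() instead.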