
R ggplot adding extra data points in plot?

I'm new to R, so maybe I'm overlooking something basic here. I'm having an issue where my box/whisker plot with jitter is duplicating certain data points.

First, I create two subsets of the data pulled from a database:

postProgram <- subset(sData, PROGRAM_STATUS == "POST-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")
preProgram <- subset(sData, PROGRAM_STATUS == "PRE-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")

Next, I drop the unused factor levels left over by subset():

postProgram<-droplevels(postProgram)
preProgram<-droplevels(preProgram)
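
For readers unfamiliar with why this step matters, here is a minimal sketch on made-up data (the `markets` data frame and its values are illustrative assumptions, not the poster's data). It shows that subset() keeps a factor level even when no rows use it, and that droplevels() removes it.

```r
# Toy data; the values are illustrative only.
markets <- data.frame(
  MARKET_NAME = factor(c("Boston", "N/A", "NYC / Long Island")),
  DST_UNITS   = c(10, 0, 20)
)

kept <- subset(markets, MARKET_NAME != "N/A")
levels(kept$MARKET_NAME)   # the unused "N/A" level is still listed

kept <- droplevels(kept)
levels(kept$MARKET_NAME)   # only levels that actually occur remain
```

Without this step, ggplot would normally still hide the empty category on a discrete axis, but other functions such as table() would report it with a count of zero.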

Now I just want to plot the preProgram values as a box-and-whisker plot, so I aggregate all of the stores in each market into a single weekly total:

preAgg <- setNames(
  aggregate(preProgram$DST_UNITS,
            by = list(preProgram$MARKET_NAME, preProgram$WEEK_DATE),
            FUN = sum),
  c("MARKET_NAME", "WEEK_DATE", "DST_UNITS")
)
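
The same aggregation can be sketched with the formula interface of aggregate(), which names the output columns itself so setNames() is not needed. This is an alternative, not what the poster ran; the `toy` data frame below is a made-up stand-in for preProgram. One difference to be aware of: the formula interface drops rows containing NA by default.

```r
# Toy stand-in for preProgram: two stores per market per week (made-up numbers).
toy <- data.frame(
  MARKET_NAME = rep(c("A", "B"), each = 4),
  WEEK_DATE   = rep(as.Date(c("2014-03-03", "2014-03-03",
                              "2014-03-10", "2014-03-10")), times = 2),
  DST_UNITS   = c(5, 7, 3, 4, 10, 20, 1, 2)
)

# One row per market/week, with DST_UNITS summed across stores.
toyAgg <- aggregate(DST_UNITS ~ MARKET_NAME + WEEK_DATE, data = toy, FUN = sum)
toyAgg

# Sanity check: each market should appear exactly once per week.
table(toyAgg$MARKET_NAME)
```

Running table() on the aggregated market column is a quick way to confirm the expected number of rows per market before plotting.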

Finally, I plot it using ggplot:

library(ggplot2)

p <- ggplot(preAgg, aes(factor(MARKET_NAME), DST_UNITS))
p + geom_boxplot(aes(fill = factor(MARKET_NAME))) +
  geom_jitter(position = position_jitter(width = 0.2)) +
  xlab("Market Name") + ylab("Units Sold") +
  ggtitle("Pre-Program Weekly Units Sold By Region") +
  theme(legend.position = "none")

My issue is that three of the markets show 8 data points in the plot. That should be impossible, because there are only 7 weeks of data x 5 markets = 35 data points.

[Image: ggplot output]

What confuses me even more is that RStudio shows only 7 levels for the date column.

[Image: RStudio screenshot]

Just for my own sanity I also checked the NYC / Long Island data with subset() and there are clearly only 7 data points. What is ggplot doing? This only affects 3 of the regions.

> subset(preAgg,preAgg$MARKET_NAME=='NYC / Long Island')
         MARKET_NAME  WEEK_DATE DST_UNITS
5  NYC / Long Island 2014-03-03        69
10 NYC / Long Island 2014-03-10        88
15 NYC / Long Island 2014-03-17        80
20 NYC / Long Island 2014-03-24        74
25 NYC / Long Island 2014-03-31        64
30 NYC / Long Island 2014-04-07        81
35 NYC / Long Island 2014-04-14       179

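Answered by Ben Bolker in the comments: the plot was showing the box plot's outlier points in addition to the jittered data points. geom_boxplot() draws any outlier as its own point by default, and geom_jitter() then draws every observation again, so each outlier appears twice.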
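
As a hedged illustration of that explanation, the sketch below rebuilds the NYC / Long Island rows from the output in the question (the `nyc` data frame is a reconstruction for demonstration, not the poster's full preAgg). It first uses base R's boxplot.stats(), which applies a similar 1.5 x IQR rule, to show which value is treated as an outlier, and then, if ggplot2 is installed, draws the plot with the box plot's own outlier marks hidden via outlier.shape = NA.

```r
# NYC / Long Island weekly totals copied from the output in the question.
nyc <- data.frame(
  MARKET_NAME = "NYC / Long Island",
  DST_UNITS   = c(69, 88, 80, 74, 64, 81, 179)
)

# Values beyond 1.5 x IQR from the hinges; for these seven values this is 179.
outliers <- boxplot.stats(nyc$DST_UNITS)$out
outliers

# Hiding the box plot's outlier marks leaves one jittered point per observation.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(nyc, aes(factor(MARKET_NAME), DST_UNITS)) +
    geom_boxplot(aes(fill = factor(MARKET_NAME)), outlier.shape = NA) +
    geom_jitter(position = position_jitter(width = 0.2)) +
    xlab("Market Name") + ylab("Units Sold") +
    theme(legend.position = "none")
  print(p)
}
```

The week of 2014-04-14 (179 units) is the extra point: 7 jittered observations plus 1 outlier mark gives the 8 points seen for this market. The other affected markets presumably each contain an outlying week as well.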
