R ggplot adding extra data points in plot?

I'm new to R, so maybe I'm overlooking something basic here. I'm having an issue where my box-and-whisker plot with jitter is duplicating certain data points.

First, I create two subsets of the data pulled from a database:

postProgram <- subset(sData, PROGRAM_STATUS == "POST-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")
preProgram <- subset(sData, PROGRAM_STATUS == "PRE-PROGRAM" & CLASS_DESC == "Widget1" & MARKET_NAME != "N/A")

Next, I drop the extra factor levels left over by subset():

postProgram<-droplevels(postProgram)
preProgram<-droplevels(preProgram)

Now I just want to plot the preProgram values as a box-and-whisker plot, so I aggregate all of the stores in each market into a single weekly total:

preAgg<-setNames(aggregate(preProgram$DST_UNITS,by=list(preProgram$MARKET_NAME,preProgram$WEEK_DATE),"sum"),c("MARKET_NAME","WEEK_DATE","DST_UNITS"))

Finally, I plot it using ggplot:

p<-ggplot(preAgg,aes(factor(MARKET_NAME),DST_UNITS))
p + geom_boxplot(aes(fill=factor(MARKET_NAME))) + geom_jitter(position=position_jitter(width=.2)) + xlab("Market Name") + ylab("Units Sold") +
  ggtitle("Pre-Program Weekly Units Sold By Region") + theme(legend.position="none")

My issue is that three of the markets show 8 data points. That's impossible, because there are only 7 weeks of data x 5 markets = 35 data points.

[Image: ggplot]

What confuses me more is that RStudio shows only 7 levels for the date information:

[Image: RStudio]

Just for my own sanity, I also checked the NYC / Long Island data with subset(), and there are clearly only 7 data points. What is ggplot doing? This only affects 3 of the regions.

> subset(preAgg,preAgg$MARKET_NAME=='NYC / Long Island')
         MARKET_NAME  WEEK_DATE DST_UNITS
5  NYC / Long Island 2014-03-03        69
10 NYC / Long Island 2014-03-10        88
15 NYC / Long Island 2014-03-17        80
20 NYC / Long Island 2014-03-24        74
25 NYC / Long Island 2014-03-31        64
30 NYC / Long Island 2014-04-07        81
35 NYC / Long Island 2014-04-14       179

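Answered by Ben Bolker in the comments: the plot was showing the box plot's outliers in addition to the data points. geom_boxplot() draws each outlier as its own point, and geom_jitter() then draws every observation again, so a market with one outlier appears to have 8 points instead of 7.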
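To make that concrete, here is a minimal sketch using only the NYC / Long Island rows printed in the question. It checks which value a box plot would flag as an outlier, then draws the plot with the box plot's own outlier points suppressed. The data frame name `nyc` is made up for this illustration, and using `outlier.shape = NA` is one common way to hide the duplicates, not necessarily what the asker ended up doing.

```r
library(ggplot2)

# Weekly totals for one market, copied from the subset() output in the question.
# The name `nyc` is hypothetical, used only for this sketch.
nyc <- data.frame(
  MARKET_NAME = "NYC / Long Island",
  WEEK_DATE = as.Date(c("2014-03-03", "2014-03-10", "2014-03-17", "2014-03-24",
                        "2014-03-31", "2014-04-07", "2014-04-14")),
  DST_UNITS = c(69, 88, 80, 74, 64, 81, 179)
)

# Values more than 1.5 times the box height beyond the box are treated as outliers.
outliers <- boxplot.stats(nyc$DST_UNITS)$out
print(outliers)

# outlier.shape = NA stops geom_boxplot() from drawing its own outlier points,
# so each observation is drawn once, by geom_jitter() only.
p <- ggplot(nyc, aes(factor(MARKET_NAME), DST_UNITS)) +
  geom_boxplot(aes(fill = factor(MARKET_NAME)), outlier.shape = NA) +
  geom_jitter(position = position_jitter(width = 0.2)) +
  xlab("Market Name") + ylab("Units Sold") +
  theme(legend.position = "none")
p
```

For this market the week of 2014-04-14 (179 units) is the flagged value, which accounts for the apparent eighth point. The same `outlier.shape = NA` argument can be added to the geom_boxplot() call in the original plot of preAgg.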
