
geom_dotplot with decimal values

I am having trouble with geom_dotplot on my data, and I can reproduce the problem with the diamonds dataset. Some distinct values are grouped together when they should not be. For example, the two yellow dots ("J") are aligned on the same line although one is 64 and the other is 63.8, while another 63.8 value, colored as "E", sits just below them. I would like the dots to be positioned more accurately according to their values. It looks as if the values are being rounded, and equal values are sometimes split across several lines. I don't see the problem with other data (see Example 2).

Example 1

library(ggplot2)
data("diamonds")
dia <- diamonds[1:30, ]
dia[order(dia$depth, decreasing = TRUE), ]

   carat       cut color clarity depth table price    x    y    z
9   0.22      Fair     E     VS2  65.1    61   337 3.87 3.78 2.49
11  0.30      Good     J     SI1  64.0    55   339 4.25 4.28 2.73
19  0.30      Good     J     SI1  63.8    56   351 4.23 4.26 2.71
22  0.23 Very Good     E     VS2  63.8    55   352 3.85 3.92 2.48
18  0.30      Good     J     SI1  63.4    54   351 4.23 4.29 2.70
5   0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
21  0.30      Good     I     SI2  63.3    56   351 4.26 4.30 2.71
6   0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
12  0.23     Ideal     J     VS1  62.8    56   340 3.93 3.90 2.46
20  0.30 Very Good     J     SI1  62.7    59   351 4.21 4.27 2.66
27  0.24   Premium     I     VS1  62.5    57   355 3.97 3.94 2.47
4   0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
7   0.24 Very Good     I    VVS1  62.3    57   336 3.95 3.98 2.47
14  0.31     Ideal     J     SI2  62.2    54   344 4.35 4.37 2.71
28  0.30 Very Good     J     VS2  62.2    57   357 4.28 4.30 2.67
17  0.30     Ideal     I     SI2  62.0    54   348 4.31 4.34 2.68
8   0.26 Very Good     H     SI1  61.9    55   337 4.07 4.11 2.53
1   0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
23  0.23 Very Good     H     VS1  61.0    57   353 3.94 3.96 2.41
16  0.32   Premium     E      I1  60.9    58   345 4.38 4.42 2.68
30  0.23 Very Good     F     VS1  60.9    57   357 3.96 3.99 2.42
29  0.23 Very Good     D     VS2  60.5    61   357 3.96 3.97 2.40
13  0.22   Premium     F     SI1  60.4    61   342 3.88 3.84 2.33
26  0.23 Very Good     G    VVS2  60.4    58   354 3.97 4.01 2.41
15  0.20   Premium     E     SI2  60.2    62   345 3.79 3.75 2.27
2   0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
10  0.23 Very Good     H     VS1  59.4    61   338 4.00 4.05 2.39
24  0.31 Very Good     J     SI1  59.4    62   353 4.39 4.43 2.62
25  0.31 Very Good     J     SI1  58.1    62   353 4.44 4.47 2.59
3   0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
The plot is produced with:
ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE) 


With other data (see below), which I created to get a clearer view of what was happening, the problem no longer occurs.

Example 2

abb=c(1,1.5,1.5,1.5,2,2,2,2.5,3.5,5,5,5.5,5.5)
bcc=c("Lyon", "Lyon", "Bordeaux", "Bordeaux", "Chambéry", "Lyon", "Lyon", "Nantes", "Nantes", "Lyon", "Lyon", "Rennes", "Lyon")
Fil=data.frame(abb,bcc)

> Fil
   abb      bcc
1  1.0     Lyon
2  1.5     Lyon
3  1.5 Bordeaux
4  1.5 Bordeaux
5  2.0 Chambéry
6  2.0     Lyon
7  2.0     Lyon
8  2.5   Nantes
9  3.5   Nantes
10 5.0     Lyon
11 5.0     Lyon
12 5.5   Rennes
13 5.5     Lyon

With the dotplot, the values 5 and 5.5 are grouped together by value. I would like the same result, with a separate "line" for each distinct value.

ggplot(Fil, aes(y=abb, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(bcc)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE) + 
  scale_fill_manual(values = c("#FF8000", "#FF0033","#80FF00","#FFFF00", "#000000"))


What is the solution to this problem?

You just need to change the binwidth. Before I changed it, ggplot2 printed a message about selecting the binwidth. The message said:

Bin width defaults to 1/30 of the range of the data. Pick better value with binwidth.

ggplot2 was telling you that binwidth was the issue:

ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE,binwidth = 0.2)
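To see why the default groups 63.8 and 64.0, it may help to work out the default bin width by hand. The sketch below uses only base R and the depth values from the question. The helper `greedy_bins` is a hypothetical, simplified stand-in for dot-density binning (start a new bin whenever a value lies at least one bin width above the current bin's start); it is not ggplot2's own code, and it assumes bins are formed within each color group.

```r
# Depth values of the color "J" diamonds among the first 30 rows
# (copied from the table in the question).
depth_j <- c(64.0, 63.8, 63.4, 63.3, 62.8, 62.8, 62.7, 62.2, 62.2, 59.4, 58.1)

# Default bin width: 1/30 of the full depth range in the subset (56.9 to 65.1).
default_binwidth <- (65.1 - 56.9) / 30   # about 0.273, larger than 64.0 - 63.8

# Hypothetical helper: simplified greedy binning, NOT ggplot2 internals.
greedy_bins <- function(x, binwidth) {
  x <- sort(x)
  bin <- integer(length(x))
  start <- x[1]
  id <- 1L
  for (i in seq_along(x)) {
    # Open a new bin once the value is a full bin width above the bin start.
    if (x[i] >= start + binwidth) {
      start <- x[i]
      id <- id + 1L
    }
    bin[i] <- id
  }
  split(x, bin)
}

bins <- greedy_bins(depth_j, default_binwidth)
print(bins)   # the last bin holds both 63.8 and 64.0

# Example 2: default bin width versus the smallest gap between distinct values.
abb <- c(1, 1.5, 1.5, 1.5, 2, 2, 2, 2.5, 3.5, 5, 5, 5.5, 5.5)
binwidth_2 <- diff(range(abb)) / 30          # 4.5 / 30
smallest_gap_2 <- min(diff(sort(unique(abb))))
```

Under these assumptions, 63.8 and 64.0 fall into the same bin in Example 1 because their gap is smaller than the default bin width, whereas in Example 2 the smallest gap between distinct values is well above the default bin width, so each value keeps its own line.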


Fixing the bin width is just a band-aid. I think using geom_jitter might be a good alternative.
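A minimal sketch of what the geom_jitter alternative could look like, reusing the question's 30-row subset. The `width = 0.2` value and the choice to hide the boxplot's own outlier points are assumptions made for illustration, not part of the original answer. Setting `height = 0` keeps every point at its exact depth, so only the horizontal position is randomized.

```r
library(ggplot2)

# Same subset as in the question.
dia <- diamonds[1:30, ]

p <- ggplot(dia, aes(x = "", y = depth)) +
  # Hide the boxplot's outlier points so they are not drawn twice.
  geom_boxplot(outlier.shape = NA) +
  # Jitter horizontally only; height = 0 leaves the depth values untouched.
  geom_jitter(aes(colour = color), width = 0.2, height = 0)

print(p)
```

Because no binning is involved, 63.8 and 64.0 appear at different heights. The trade-off is that points with equal values are spread out at random rather than stacked in neat rows.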
