Warnings when transforming to logarithmic scale, a lot of NaNs produced
For a few weeks, I have used the following script to produce a scatterplot with approximately 10,000 (non-zero, positive) data points. Only a few (<20) data points were left out because of warnings from the transformation.
visual <- ggplot(data=dots, aes(GRNHLin, REDHLin)) +
  geom_point(colour=rgb(0.17, 0.44, 0.71), size=0.500, alpha=0.250) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e4)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e3))
visual
This week, I started doing some model-based clustering. The script I wrote (see below) uses the same dataset (10,000 non-zero, positive data points) but leaves out more than 9,000 of them because of:
Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous x-axis
3: In self$trans$transform(x) : NaNs produced
4: Transformation introduced infinite values in continuous y-axis
5: Removed 9692 rows containing missing values (geom_point).
This is the second script:
dots.Mclust <- Mclust(dots, modelNames="VVV", G=8)
visual <- fviz_cluster(dots.Mclust,
                       ellipse=FALSE,
                       shape=20,
                       geom = c("point")) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e3)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e4))
visual
EDIT
Some additional information:
The dataset contains only values that are larger than 0. head(dots.Mclust) provides the following:
$data
GRNHLin RED2HLin
[1,] 81.50364 176.379654
[2,] 57.94751 116.310577
[3,] 42.89310 119.758621
[4,] 41.82213 275.607971
[5,] 437.14648 141.309647
[6,] 15.20952 177.128616
[7,] 18.88731 257.249207
[8,] 768.64935 172.374069
[9,] 24.66220 118.283150
[10,] 17.12160 68.955154
[11,] 73.00019 71.517052
[12,] 1182.08911 180.694122
[13,] 320.09827 224.808563
[14,] 268.42401 235.375259
[15,] 149.05655 205.708282
[16,] 98.43160 152.093704
[17,] 25.10120 177.061386
[18,] 293.87103 239.007050
[19,] 118.42249 295.722168
[20,] 724.16718 243.950455
[21,] 255.26083 128.209717
[22,] 105.15983 247.946701
[23,] 86.25691 220.004745
[24,] 122.01743 32.232780
[25,] 50.42104 9.923141
After removing the log scaling from the x-axis and y-axis, the graph looks as follows. Apparently, something goes wrong with the data points. There are no negative values in the dataset, but (a lot of) points are still plotted below 0. Furthermore, the x-axis and y-axis do not cover the values found in entry [12,]. This is probably the underlying cause of the problem. But how do these wrong values arise?
What is the underlying issue here?
As mentioned in the comments, the sample data are indeed centered and rescaled before plotting. This standardization can be turned off by including
stand=FALSE,
in the options of fviz_cluster.
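To see why standardization removes most points under a log scale, here is a minimal base-R sketch. It assumes the centering and rescaling behaves like base R's scale(); the matrix `raw` is made-up log-normal data standing in for `dots`, not the original measurements.

```r
# Hypothetical stand-in for the strictly positive measurements in `dots`
set.seed(1)
raw <- cbind(GRNHLin = rlnorm(1000, meanlog = 4, sdlog = 1),
             REDHLin = rlnorm(1000, meanlog = 5, sdlog = 0.5))

# Center each column to mean 0 and rescale to standard deviation 1
# (assumed here to mirror what stand = TRUE does)
standardized <- scale(raw)

all(raw > 0)              # every raw value is positive
mean(standardized < 0)    # share of standardized values that are negative

# log10 of a negative number is NaN (R warns "NaNs produced"),
# so a log scale has nothing to plot for those values
logged <- suppressWarnings(log10(standardized))
sum(is.nan(logged))       # one NaN per negative standardized value
```

Standardized values cluster around 0, so in addition to the NaNs, most of the remaining values would likely fall below the lower axis limit of 1 and be dropped as well. With stand=FALSE the original positive values are plotted, and the log scales should behave as in the first script.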