Warnings when transforming to logarithmic scale, a lot of NaNs produced

For a few weeks, I have used the following script to produce a scatterplot with approximately 10,000 (non-zero, positive) datapoints. Only a few (<20) datapoints were left out because of warnings from the transformation.

library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format(), math_format()

visual <- ggplot(data=dots, aes(GRNHLin, REDHLin)) +
    geom_point(colour=rgb(0.17, 0.44, 0.71), size=0.500, alpha=0.250) +
    scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                  labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e4)) +
    scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                  labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e3))
visual

Since this week, I have wanted to do some model-based clustering. The script I wrote (see below) uses the same dataset (10,000 non-zero, positive datapoints) but leaves out more than 9,000 datapoints because of:

Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous x-axis 
3: In self$trans$transform(x) : NaNs produced
4: Transformation introduced infinite values in continuous y-axis 
5: Removed 9692 rows containing missing values (geom_point). 

This is the second script:

library(mclust)      # provides Mclust()
library(factoextra)  # provides fviz_cluster()

dots.Mclust <- Mclust(dots, modelNames="VVV", G=8)

visual <- fviz_cluster(dots.Mclust, 
             ellipse=FALSE, 
             shape=20, 
             geom = c("point")) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e3)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1,1e4))
visual

EDIT

Some additional information:

The dataset contains only values larger than 0. head(dots.Mclust) gives the following:

$data
           GRNHLin    RED2HLin
   [1,]   81.50364  176.379654
   [2,]   57.94751  116.310577
   [3,]   42.89310  119.758621
   [4,]   41.82213  275.607971
   [5,]  437.14648  141.309647
   [6,]   15.20952  177.128616
   [7,]   18.88731  257.249207
   [8,]  768.64935  172.374069
   [9,]   24.66220  118.283150
  [10,]   17.12160   68.955154
  [11,]   73.00019   71.517052
  [12,] 1182.08911  180.694122
  [13,]  320.09827  224.808563
  [14,]  268.42401  235.375259
  [15,]  149.05655  205.708282
  [16,]   98.43160  152.093704
  [17,]   25.10120  177.061386
  [18,]  293.87103  239.007050
  [19,]  118.42249  295.722168
  [20,]  724.16718  243.950455
  [21,]  255.26083  128.209717
  [22,]  105.15983  247.946701
  [23,]   86.25691  220.004745
  [24,]  122.01743   32.232780
  [25,]   50.42104    9.923141

With the scaling on the x-axis and y-axis removed, the graph looks as follows. Apparently, something goes wrong with the datapoints. There are no negative values in the dataset, but (a lot of) points are still plotted below 0. Furthermore, the x-axis and y-axis do not cover the values found in entry [12,]. This is probably the underlying cause of the problem. But how do these wrong values arise?

[Figure: the graph as plotted, without scaling the x-axis and y-axis.]

What is the underlying issue here?

As mentioned in the comments, fviz_cluster centers and rescales the data before plotting. Centered values fall on both sides of 0, so many of them are negative, and the log10 transformation then turns those into NaNs. This standardization can be turned off by including

stand=FALSE,

in the options of fviz_cluster.
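To see why standardization and a log axis do not mix, here is a minimal base-R sketch. It assumes that the default standardization behaves like scale() (centering and rescaling each column). The vector x is hypothetical sample data, loosely modelled on the first few GRNHLin values shown in the question; it is not the real dataset.

```r
# Hypothetical strictly positive values (illustrative only, not the real data).
x <- c(81.5, 57.9, 42.9, 41.8, 437.1, 15.2)

# Center and rescale, as standardization is assumed to do.
z <- as.numeric(scale(x))

# The raw values are all positive, but the standardized ones are not:
# every value below the mean becomes negative.
all_raw_positive <- all(x > 0)
any_std_negative <- any(z < 0)

# log10() of a negative number is NaN (R also emits a "NaNs produced" warning,
# which is suppressed here to keep the output clean).
log_raw <- log10(x)
log_std <- suppressWarnings(log10(z))

n_nan_raw <- sum(is.nan(log_raw))  # no NaNs for the raw data
n_nan_std <- sum(is.nan(log_std))  # one NaN per negative standardized value

print(c(raw = n_nan_raw, standardized = n_nan_std))
```

In the question's second script, the corresponding change would be to add stand=FALSE next to ellipse=FALSE in the fviz_cluster call, so that the log10 scales receive the original positive values.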
