
R: errors in cor() and corrplot()

Another stumbling block. I have a large data set (called "brightly") with about 180k rows and 165 columns, and I am trying to create a correlation matrix of these columns in R.

Several problems have come up, and I can't resolve any of them with the suggestions on this site or elsewhere.

First, how I created the data set: I saved it from Excel as a CSV file. My understanding is that CSV strips all formatting, so anything that is a number should be read by R as a number. I loaded it with

brightly <- read.csv("brightly.csv", header = TRUE)
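A CSV file does drop Excel's formatting, but it keeps whatever text Excel shows in a cell. If even one cell in a column holds something like `#N/A` or `#DIV/0!`, `read.csv()` reads the whole column as text. The sketch below uses a small made-up inline table (the column `score` and the error values in it are assumptions for illustration) to show this, and uses the `na.strings` argument of `read.csv()` to treat those markers as missing values instead.

```r
# Hypothetical CSV contents: one Excel error value sneaks into the 'score' column
csv_text <- "PPV,score
0.5,10
0.7,#N/A
0.9,12"

# Default read: the error text forces 'score' to be read as text, not numbers
raw <- read.csv(text = csv_text, header = TRUE)
print(sapply(raw, class))

# Treat Excel error markers (and empty cells) as NA so the column stays numeric
clean <- read.csv(text = csv_text, header = TRUE,
                  na.strings = c("NA", "", "#N/A", "#DIV/0!", "#VALUE!"))
print(sapply(clean, class))
```

The exact list of markers to pass to `na.strings` depends on what actually appears in the exported file.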

But every time I ran cor(brightly) I got the error "'x' must be numeric", so I replaced all the NAs with 0s. (This may alter my data, but I think it is all right: anything that is "NA" is effectively 0, for both the continuous and the dummy variables.)
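The "'x' must be numeric" error comes from `cor()` itself: it converts a data frame to a matrix, and a single non-numeric column turns the whole matrix into text. On its own, replacing NA values with 0 inside R does not convert a text column into numbers, so a hedged alternative is to correlate only the numeric columns. The data frame below is a made-up stand-in for `brightly`.

```r
# Made-up stand-in for 'brightly': two numeric columns and one text column
df <- data.frame(PPV = c(0.5, 0.7, 0.9, 0.4),
                 visits = c(3, 5, 8, 2),
                 region = c("a", "b", "a", "c"))

# cor() on the whole data frame fails because 'region' is not numeric
err <- tryCatch(cor(df), error = function(e) conditionMessage(e))
print(err)  # 'x' must be numeric

# Keep only the numeric columns before computing the correlation matrix
numeric_cols <- vapply(df, is.numeric, logical(1))
cor_matrix <- cor(df[, numeric_cols, drop = FALSE])
print(round(cor_matrix, 3))
```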

Now I no longer get the error message about text. But whenever I run cor(), either on all of the variables at once or on combinations of them, I get:

Warning message:
In cor(brightly$PPV, brightly, use = "complete") : the standard deviation is zero

Some of the correlations between that one variable and the others also show up as NA. I have made sure that no cell in the data is NA, so I don't know why I am getting NA values for the correlations.
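Both symptoms point to columns with no variation. A Pearson correlation divides the covariance by the product of the two standard deviations, so a column in which every value is identical (for example, a dummy variable that ends up all 0 after NAs are replaced with 0) has a standard deviation of zero. `cor()` then warns and returns NA for every pair involving that column, even when no cell is NA. The sketch below, on made-up data with a hypothetical constant column `dummy`, reproduces this and flags such columns.

```r
# Made-up data: 'dummy' is constant (all 0), e.g. after NAs were replaced with 0
df <- data.frame(PPV = c(0.5, 0.7, 0.9, 0.4),
                 visits = c(3, 5, 8, 2),
                 dummy = c(0, 0, 0, 0))

# Capture the warning message instead of letting it print
warn_msg <- NULL
cm <- withCallingHandlers(
  cor(df),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(warn_msg)       # the standard deviation is zero
print(cm["dummy", ])  # NA against every other column

# Flag columns that have at most one distinct non-missing value
is_constant <- vapply(df, function(col) length(unique(col[!is.na(col)])) <= 1,
                      logical(1))
print(names(df)[is_constant])  # "dummy"
```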

I also tried both of the following to make REALLY sure I wasn't including any NA values:

cor(brightly$PPV, brightly, use = "pairwise.complete.obs")

and

cor(brightly$PPV, brightly, use = "complete")

But I still get warnings that the SD is zero, and I still get the NAs.
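The `use` argument of `cor()` only controls how missing values are handled, so `"complete"` and `"pairwise.complete.obs"` cannot help when the NA comes from a zero standard deviation. A hedged fix, assuming the constant columns carry no information for this analysis, is to drop them before calling `cor()`. The data and column names below are made up.

```r
# Made-up data with one constant column and no missing values at all
df <- data.frame(PPV = c(0.5, 0.7, 0.9, 0.4),
                 visits = c(3, 5, 8, 2),
                 dummy = c(0, 0, 0, 0))

# Changing 'use' does not help: the NA is not caused by missing data
pairwise <- suppressWarnings(cor(df$PPV, df, use = "pairwise.complete.obs"))
print(pairwise)  # the 'dummy' entry is still NA

# Drop columns with zero variance, then correlate what is left
has_variance <- vapply(df, function(col) isTRUE(sd(col, na.rm = TRUE) > 0),
                       logical(1))
reduced <- df[, has_variance, drop = FALSE]
print(cor(df$PPV, reduced, use = "pairwise.complete.obs"))
```

Dropping columns changes which variables appear in the matrix, so it is worth listing the removed names to confirm they really are constant.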

Any insights into why this might be happening?

Finally, when I try to use corrplot to display the correlations, I do the following:

brightly2 <- cor(brightly)
Warning message:
In cor(brightly) : the standard deviation is zero
corrplot(brightly2, method = "number")
Error in if (min(corr) < -1 - .Machine$double.eps || max(corr) > 1 + .Machine$double.eps) { :
  missing value where TRUE/FALSE needed

And instead of my nice color-coded correlation matrix, I get that error. I have yet to find an explanation of what it means.
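The corrplot error appears to be a downstream effect of the NAs in the matrix. In the corrplot version that produced this message, the function checks `min(corr) < -1 - .Machine$double.eps || max(corr) > 1 + .Machine$double.eps`. With any NA in `corr`, `min()` and `max()` return NA, the whole condition is NA, and `if` stops with "missing value where TRUE/FALSE needed". The sketch reproduces that check on a made-up matrix and only plots after the NA-producing columns are removed. It does not assume anything about how newer corrplot releases treat NA.

```r
# Made-up data with a constant column that yields NA correlations
df <- data.frame(PPV = c(0.5, 0.7, 0.9, 0.4),
                 visits = c(3, 5, 8, 2),
                 dummy = c(0, 0, 0, 0))
corr <- suppressWarnings(cor(df))

# Re-create the range check from the error message: NA propagates through min/max
cond <- min(corr) < -1 - .Machine$double.eps || max(corr) > 1 + .Machine$double.eps
print(cond)  # NA
msg <- tryCatch(if (cond) "out of range" else "ok",
                error = function(e) conditionMessage(e))
print(msg)   # missing value where TRUE/FALSE needed

# Remove zero-variance columns so the matrix has no NA, then plot if corrplot is installed
keep <- vapply(df, function(col) isTRUE(sd(col) > 0), logical(1))
corr_clean <- cor(df[, keep, drop = FALSE])
stopifnot(!anyNA(corr_clean))
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corr_clean, method = "number")
}
```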

Any help would be HUGELY appreciated! Thanks very much!!

Please check whether you replaced your NAs with 0 or '0', since one is a character string and the other is a number. You can also try the as.numeric(column_name) function to convert character 0s into numeric 0s. This error also occurs if your data set contains factors: they are not numeric values, so corrplot throws this error. It would help if you put a sample of your data in the question using

str(head(your_dataset))

That will help you check the data types of the columns. Let me know if I am wrong. Cheerio.
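With 165 columns, `str()` output gets long, so a compact alternative is to tabulate the column classes and list the columns that are not numeric. The same sketch shows the conversion described above: `as.numeric()` works directly on character values such as `'0'`, but on a factor it returns the internal level codes, so factors need `as.character()` first. The data frame and its column names are made up for illustration.

```r
# Made-up data: a numeric column, a character column of digits, and a factor
df <- data.frame(PPV = c(0.5, 0.7, 0.9),
                 visits = c("0", "5", "8"),
                 score = factor(c("10", "20", "10")))

# Summarise column types instead of printing str() for every column
print(table(vapply(df, function(col) class(col)[1], character(1))))
non_numeric <- names(df)[!vapply(df, is.numeric, logical(1))]
print(non_numeric)  # "visits" "score"

# Pitfall: as.numeric() on a factor returns level codes, not the values
print(as.numeric(df$score))                # 1 2 1
print(as.numeric(as.character(df$score)))  # 10 20 10

# Convert character and factor columns via as.character() first;
# any text that is not a number becomes NA (with a coercion warning)
df[non_numeric] <- lapply(df[non_numeric],
                          function(col) as.numeric(as.character(col)))
print(sapply(df, class))
```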

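Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this page, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.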

© 2020-2024 STACKOOM.COM