
Correlation for data in matrix format in R

I have created a matrix in R and I want to investigate the correlation between two columns. My_matrix is:

         speed motor rpm acceleration age
cadillac     3        42           67  22
porche       5        40           68  21
ferrari      7        37           69  20
peugeot     10        32           70  19
kia         12        28           71  18

When I try cor(speed~age, data=My_matrix), I get the following error:

Error in cor(speed ~ age, data = My_matrix) : unused argument (data = My_matrix)

Any idea how I can address this? Thanks.

cor has no formula method, which is why the data argument is rejected. Its usage is

cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

so we can subset the columns and apply cor to them directly:

cor(My_matrix[,c("speed", "age")])
#          speed        age
#speed  1.0000000 -0.9971765
#age   -0.9971765  1.0000000
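For anyone who wants to reproduce this, here is a minimal sketch. It assumes the matrix can be rebuilt from the values printed in the question (the construction with cbind is my own, not the asker's code). It also shows that passing the two columns as x and y returns a single number rather than a 2 x 2 matrix.

```r
# Rebuild the matrix from the values printed in the question (assumed construction)
My_matrix <- cbind(
  speed        = c(3, 5, 7, 10, 12),
  "motor rpm"  = c(42, 40, 37, 32, 28),
  acceleration = c(67, 68, 69, 70, 71),
  age          = c(22, 21, 20, 19, 18)
)
rownames(My_matrix) <- c("cadillac", "porche", "ferrari", "peugeot", "kia")

# x and y given separately: the result is one correlation coefficient
r_speed_age <- cor(My_matrix[, "speed"], My_matrix[, "age"])
print(r_speed_age)
```

The value should match the off-diagonal entry of the matrix shown above.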

I also tried this and it worked. I created a data frame "b" from the matrix:

b <- as.data.frame(My_matrix)

then I used

cor(b$speed, b$age)

and got the correlation.
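If the formula interface was the attraction in the first place, note that cor.test() from base R's stats package does accept a one-sided formula of the form ~ u + v together with a data argument. The sketch below assumes the matrix is rebuilt from the values in the question and converted to a data frame; it returns a test object whose estimate element holds the correlation.

```r
# Rebuild the question's matrix (assumed construction) and convert it
My_matrix <- cbind(
  speed        = c(3, 5, 7, 10, 12),
  "motor rpm"  = c(42, 40, 37, 32, 28),
  acceleration = c(67, 68, 69, 70, 71),
  age          = c(22, 21, 20, 19, 18)
)
b <- as.data.frame(My_matrix)

# cor.test has a formula method; both variables go on the right-hand side
res <- cor.test(~ speed + age, data = b)

# The Pearson correlation is stored in the "estimate" element
r_from_test <- unname(res$estimate)
print(r_from_test)
```

This also gives a p-value and confidence interval, which plain cor() does not.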

There are some great base R solutions on here already (hats off to @akrun & @Debutant, base R is great!). I would like to add alternative solutions for future viewers, depending on coding preference.

If you don't like typing quotation marks and the dataset is small enough, column numbers can be faster to type, although quoted variable names are safer (especially if the columns are ever reordered).

@mikey offered a column number solution in the comments; here is an alternative version:

cor(My_matrix[,c(1,4)])
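To illustrate why names are safer than positions, here is a small sketch. The reordered matrix is a hypothetical example of mine, not something from the question: once the columns move, c(1, 4) silently picks a different pair, while the named version still returns speed and age.

```r
# Rebuild the question's matrix (assumed construction)
My_matrix <- cbind(
  speed        = c(3, 5, 7, 10, 12),
  "motor rpm"  = c(42, 40, 37, 32, 28),
  acceleration = c(67, 68, 69, 70, 71),
  age          = c(22, 21, 20, 19, 18)
)

# Hypothetical reordering of the columns
reordered <- My_matrix[, c("age", "speed", "acceleration", "motor rpm")]

# Selecting by name still gives speed vs. age
by_name <- cor(reordered[, c("speed", "age")])

# Selecting by position now gives age vs. motor rpm instead
by_position <- cor(reordered[, c(1, 4)])

print(colnames(by_name))
print(colnames(by_position))
```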

If your data is a data frame instead of a matrix, you might enjoy a tidyverse approach, which also does not require quotation marks (although variables with spaces in their names, such as motor rpm, need to be wrapped in backticks):

library(dplyr)
My_dataframe %>% select(speed, age) %>% cor()

@Debutant only asked for the correlation between 2 variables, but if we want to go all out and get the full correlation matrix, here are additional options:

# assuming all your columns are numeric as they are here
cor(My_matrix)
# if you have a dataframe with different data types, select only the numeric ones
library(dplyr)
My_dataframe %>% select_if(is.numeric) %>% cor()
# if you don't like the long decimals, toss in a round() for good measure
My_dataframe %>% select_if(is.numeric) %>% cor() %>% round(3)
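If you would rather not load dplyr, the same "numeric columns only, then round" idea can be sketched in base R. The data frame below is a hypothetical one built for illustration (the question only has a numeric matrix); the car column stands in for any non-numeric column that cor() would otherwise choke on.

```r
# Hypothetical data frame with one non-numeric column
My_dataframe <- data.frame(
  car          = c("cadillac", "porche", "ferrari", "peugeot", "kia"),
  speed        = c(3, 5, 7, 10, 12),
  acceleration = c(67, 68, 69, 70, 71),
  age          = c(22, 21, 20, 19, 18)
)

# Logical vector marking which columns are numeric
numeric_cols <- vapply(My_dataframe, is.numeric, logical(1))

# Keep only those columns, compute the full correlation matrix, round to 3 places
cor_matrix <- round(cor(My_dataframe[numeric_cols]), 3)
print(cor_matrix)
```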

Hope you find this useful. :)
