
How to apply a function for Spearman's rank correlation coefficient in R?

I want to write code that applies a function calculating Spearman's rank correlation to every combination of columns in a dataset. I have the following dataset:

library(openxlsx)
data <- read.xlsx("e:/LINGUISTICS/mydata.xlsx", 1)

A    B    C    D
go   see  get  eat
see  get  eat  go
get  go   go   get
eat  eat  see  see

The call cor(rank(x), rank(y), method = "spearman") measures the correlation between only two columns, e.g. between A and B:

cor(rank(data$A), rank(data$B), method = "spearman")

But I need to calculate the correlation between all possible pairs of columns (AB, AC, AD, BC, BD, CD). I wrote the following function for that:

wert <- function(x, y) { cor(rank(x), rank(y), method = "spearman") }

I do not know how to apply my function to all possible pairs of columns (AB, AC, AD, BC, BD, CD) so that all results are computed automatically, since my real data has many more columns. I would also like the output as a matrix of correlation scores, e.g. like the following table:

    A     B     C     D
A   1     0.3   0.4   0.8
B   0.3   1     0.6   0.5
C   0.4   0.6   1     0.1
D   0.8   0.5   0.1   1

Can somebody help me?

You do not need rank. cor already calculates the Spearman rank correlation with method = "spearman". If you want the correlation between all columns of a data.frame, just pass the data.frame to cor, i.e. cor(data, method = "spearman"). You should study help("cor").
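To illustrate why the explicit rank call is redundant, here is a minimal sketch with two hypothetical numeric vectors (x and y are made up for the example, not taken from the question). It computes the Spearman coefficient directly and then again as the Pearson correlation of the ranks; the two values should agree.

```r
# Hypothetical numeric vectors, used only for illustration (note the tie in x)
x <- c(2.1, 3.5, 1.0, 4.8, 3.5)
y <- c(10, 12, 9, 20, 15)

# Spearman correlation computed directly by cor()
rho_direct <- cor(x, y, method = "spearman")

# The same quantity obtained by ranking first and taking the Pearson correlation
rho_manual <- cor(rank(x), rank(y), method = "pearson")

# Both routes give the same coefficient
print(all.equal(rho_direct, rho_manual))
```

Ranking a vector that is already ranked changes nothing, which is why cor(rank(x), rank(y), method = "spearman") returns the same value as well.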

If you want to do this manually, use the combn function.
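A minimal sketch of the manual route, assuming the columns are already numeric (df below is a hypothetical stand-in for the real data, and col_pairs is a name chosen for this example). combn enumerates each unordered pair of column names once, and the asker's wert function, simplified to drop the redundant rank calls, is applied to each pair.

```r
# The asker's helper, without the redundant rank() calls
wert <- function(x, y) cor(x, y, method = "spearman")

# Hypothetical numeric data standing in for the real columns
df <- data.frame(
  A = c(3, 4, 2, 1),
  B = c(4, 2, 3, 1),
  C = c(2, 1, 3, 4),
  D = c(1, 3, 2, 4)
)

# One column per unordered pair of column names: AB, AC, AD, BC, BD, CD
col_pairs <- combn(names(df), 2)

# Apply wert to every pair and label each result with the pair's names
rho <- apply(col_pairs, 2, function(p) wert(df[[p[1]]], df[[p[2]]]))
names(rho) <- apply(col_pairs, 2, paste, collapse = "")

print(rho)
```

This returns a named vector with one entry per pair rather than a full matrix, so it avoids computing each pair twice, at the cost of a less tabular result.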

PS: Your additional challenge is that you actually have factor variables. A rank for an unordered factor is a strange concept, but R just uses collation order here. Since cor rightly expects numeric input, you should do data[] <- lapply(data, as.integer) first. (If the columns were read as character vectors rather than factors, as.integer returns NA for them; use as.integer(factor(col)) on each column in that case.)
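As a minimal sketch of the whole-data-frame approach, assume the example table from the question is rebuilt by hand (the data.frame call below is a reconstruction for illustration, not the asker's Excel file). Each column is turned into integer codes in collation order and the result is passed to cor. Wrapping each column in factor() first is an assumption added here so that the conversion also works when the columns are plain character vectors.

```r
# Example table from the question, rebuilt by hand as character columns
data <- data.frame(
  A = c("go", "see", "get", "eat"),
  B = c("see", "get", "go", "eat"),
  C = c("get", "eat", "go", "see"),
  D = c("eat", "go", "get", "see"),
  stringsAsFactors = FALSE
)

# Replace every column by its integer codes (levels sorted in collation order);
# factor() makes this work for character columns as well as factors
data[] <- lapply(data, function(col) as.integer(factor(col)))

# Spearman correlation between all pairs of columns, returned as a matrix
rho <- cor(data, method = "spearman")
print(rho)
```

The result is a symmetric matrix with ones on the diagonal and the column names as row and column labels, which is the layout asked for in the question.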

I think you can just write a function (pairedcolumns) that applies your function (spearman) to every pair of columns in the data frame you feed it.

# Apply a two-argument function (fun) to every pair of columns of a data frame (x)
# and return the results as a square matrix labelled with the column names.
pairedcolumns <- function(x, fun)
{
  n <- ncol(x)  # number of columns in the data frame

  foo <- matrix(0, n, n)
  for (i in seq_len(n))
  {
    for (j in seq_len(n))
    {
      foo[i, j] <- fun(x[, i], x[, j])
    }
  }
  colnames(foo) <- rownames(foo) <- colnames(x)
  return(foo)
}

# Pass the function object itself as the second argument, e.g. wert from the question
results <- pairedcolumns(yourdataframe[, 2:8], wert)
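One way to sanity-check a pairwise helper of this kind is to compare its output with what cor returns for the whole data frame. The sketch below does that with a compact stand-in, pairwise_matrix (a hypothetical name, built with outer over column indices rather than explicit loops), and a small made-up numeric data frame; neither comes from the original answer.

```r
# Compact stand-in for a pairwise helper: apply fun to every pair of columns
pairwise_matrix <- function(x, fun) {
  idx <- seq_len(ncol(x))
  # Vectorize lets outer() call the scalar function once per (i, j) pair
  m <- outer(idx, idx, Vectorize(function(i, j) fun(x[[i]], x[[j]])))
  dimnames(m) <- list(colnames(x), colnames(x))
  m
}

# Spearman correlation of two numeric vectors
spearman <- function(x, y) cor(x, y, method = "spearman")

# Hypothetical numeric data, used only for the check
df <- data.frame(
  A = c(3, 4, 2, 1),
  B = c(4, 2, 3, 1),
  C = c(2, 1, 3, 4),
  D = c(1, 3, 2, 4)
)

results <- pairwise_matrix(df, spearman)

# The pairwise result should match the built-in matrix
print(all.equal(results, cor(df, method = "spearman")))
```

For Spearman specifically, the built-in call is simpler; a generic pairwise helper earns its keep when the function applied to each pair is something cor does not provide.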
