
Spearman rank correlation between factors in R

I have data like the following:

directions <- c("North", "East", "South", "South")
x <- factor(directions, levels = c("North", "East", "South", "West"))

cities <- c("New York", "Rome", "Paris", "London")
y <- factor(cities, levels = c("New York", "Rome", "Paris", "London"))

How can I calculate the Spearman rank correlation between x and y?

EDIT

As suggested in the comments by @user20650 and @dcarlson, the variables must be ranked, so that one value is greater or less than another. That is the case here, because North, East, etc. are keywords sorted according to their presence in a document.
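Since the ranking is carried entirely by the order of the factor levels, it can help to check the integer codes before correlating. The sketch below re-creates the question's data and, for contrast, a hypothetical `y_alpha` built with R's default (alphabetical) levels. The same cities then receive different codes and a very different rho.

```r
directions <- c("North", "East", "South", "South")
x <- factor(directions, levels = c("North", "East", "South", "West"))
cities <- c("New York", "Rome", "Paris", "London")
y <- factor(cities, levels = c("New York", "Rome", "Paris", "London"))

# Integer codes follow the order given in `levels`, not the order of the data
print(as.integer(x))        # 1 2 3 3
print(as.integer(y))        # 1 2 3 4

# Hypothetical contrast: without `levels`, factor() sorts levels alphabetically
y_alpha <- factor(cities)
print(levels(y_alpha))      # "London" "New York" "Paris" "Rome"
print(as.integer(y_alpha))  # 2 4 3 1

rho_intended <- cor(as.integer(x), as.integer(y), method = "spearman")
rho_alpha <- cor(as.integer(x), as.integer(y_alpha), method = "spearman")
print(c(rho_intended, rho_alpha))  # about 0.949 and -0.211
```

In short, declaring the levels explicitly in rank order is what makes the numeric codes meaningful for a rank correlation.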

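To get Spearman's correlation with factors, you have to convert them to their underlying integer codes. as.numeric() returns the position of each value in the factor's levels, so the levels must be listed in the intended rank order: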

cor(as.numeric(x), as.numeric(y), method="spearman")
# [1] 0.9486833
cor.test(as.numeric(x), as.numeric(y), method="spearman")
# 
#   Spearman's rank correlation rho
# 
# data:  as.numeric(x) and as.numeric(y)
# S = 0.51317, p-value = 0.05132
# alternative hypothesis: true rho is not equal to 0
# sample estimates:
#       rho 
# 0.9486833 
# 
# Warning message:
# In cor.test.default(as.numeric(x), as.numeric(y), method = "spearman") :
#   Cannot compute exact p-value with ties
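To see where these numbers come from, the sketch below recomputes them by hand. It assumes that, with ties present, cor.test() reports S = (n^3 - n)(1 - rho)/6 and takes its p-value from a t approximation with n - 2 degrees of freedom; the values it produces match the output above.

```r
x <- factor(c("North", "East", "South", "South"),
            levels = c("North", "East", "South", "West"))
y <- factor(c("New York", "Rome", "Paris", "London"),
            levels = c("New York", "Rome", "Paris", "London"))
xn <- as.numeric(x)
yn <- as.numeric(y)
n <- length(xn)

# Tied values share the average of the ranks they span (midranks)
rx <- rank(xn)  # 1.0 2.0 3.5 3.5
ry <- rank(yn)  # 1 2 3 4

# Spearman's rho is Pearson's correlation computed on the ranks
rho <- cor(rx, ry)

# S statistic as printed by cor.test()
S <- (n^3 - n) * (1 - rho) / 6

# Two-sided p-value from the t approximation with n - 2 df
t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
p_t <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(c(rho = rho, S = S, p = p_t))  # about 0.9487, 0.5132, 0.05132
```

The midranks 3.5, 3.5 are what "ties" means here: the two "South" values cannot be ordered against each other.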

Note the warning: with ties, cor.test() cannot compute an exact p-value and falls back to an approximation. For data with ties you can instead use spearman_test() from the coin package:

library(coin)
spearman_test(as.numeric(x) ~ as.numeric(y))
# 
#   Asymptotic Spearman Correlation Test
# 
# data:  as.numeric(x) by as.numeric(y)
# Z = 1.6432, p-value = 0.1003
# alternative hypothesis: true rho is not equal to 0
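Both p-values above are approximations, and with only four observations they disagree (0.05132 vs 0.1003). As an illustration only, here is a different technique swapped in: a plain exact permutation test in base R, which enumerates all 4! = 24 pairings of y with x and keeps the tie in x as observed. This is not what spearman_test() computes by default, and perms() is a hypothetical helper defined in the sketch; enumeration is only practical for very small n.

```r
# Hypothetical helper: all permutations of a vector (recursive, small n only)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

x <- factor(c("North", "East", "South", "South"),
            levels = c("North", "East", "South", "West"))
y <- factor(c("New York", "Rome", "Paris", "London"),
            levels = c("New York", "Rome", "Paris", "London"))
xn <- as.numeric(x)
yn <- as.numeric(y)
rho_obs <- cor(xn, yn, method = "spearman")

# Under the null hypothesis every pairing of yn with xn is equally likely
all_y <- perms(yn)
rho_perm <- vapply(all_y, function(p) cor(xn, p, method = "spearman"),
                   numeric(1))

# Two-sided exact p-value; the small tolerance guards against rounding noise
p_exact <- mean(abs(rho_perm) >= abs(rho_obs) - 1e-12)
print(p_exact)  # 0.1666667, i.e. 4 of 24 orderings
```

Only 4 of the 24 orderings reach |rho| of about 0.949, which is the largest value attainable with this x, so with four observations and one tie no ordering of y could give a p-value below 4/24.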
