I have defined a distance function as follows:
jaccard.rules.dist <- function(x, y) {
  # Implements the feature distance. The "Aerolinea" (airline) feature gets a
  # different treatment; the rest are booleans coded as 1/0.
  # Airline column: distance 0 if same airline, 1 otherwise.
  # Other attributes: a match iff both are 1; pairs where both are 0 are
  # ignored, as in the Jaccard distance.
  airline.column <- which(colnames(x) == "Aerolinea")
  xmod <- x
  ymod <- y
  # x gets 1 for the airline iff the airlines agree ...
  xmod[airline.column] <- ifelse(x[airline.column] == y[airline.column], 1, 0)
  # ... and y always gets 1, so the airline is a match iff they are the same
  ymod[airline.column] <- 1
  andval <- sum(xmod & ymod)
  orval <- sum(xmod | ymod)
  return(1 - andval / orval)
}
which slightly modifies the Jaccard distance, for data frames of the form
t <- data.frame(Aerolinea=c("A","B","C","A"),atr2=c(1,1,0,0),atr3=c(0,0,0,1))
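Written out, my reading of the function is the following, where x' and y' stand for xmod and ymod and k runs over all columns (a restatement of the code above, not a standard named distance):

```latex
x'_{\mathrm{air}} = \mathbf{1}[x_{\mathrm{air}} = y_{\mathrm{air}}], \qquad
y'_{\mathrm{air}} = 1, \qquad
x'_k = x_k,\; y'_k = y_k \ \text{for the 1/0 attributes}

d(x, y) = 1 - \frac{\sum_k (x'_k \wedge y'_k)}{\sum_k (x'_k \vee y'_k)}

\text{Rows 1 } (A,1,0) \text{ and 4 } (A,0,1):\quad
x' = (1,1,0),\; y' = (1,0,1),\quad
d = 1 - \tfrac{1}{3} = \tfrac{2}{3}
```

Because y'_air is always 1, the denominator is at least 1, so the function never divides by zero.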
Now I would like to perform k-means clustering on my dataset using the distance just defined. The kmeans function offers no way to specify a custom distance function, so I tried hclust instead, which accepts a distance matrix. I calculated it as follows:
distmat <- matrix(nrow = nrow(t), ncol = nrow(t))
# fill the lower triangle (and diagonal); as.dist() only reads the lower triangle
for (i in 1:nrow(t))
  for (j in i:nrow(t))
    distmat[j, i] <- jaccard.rules.dist(t[j, ], t[i, ])
distmat <- as.dist(distmat)
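The loop above calls jaccard.rules.dist on one-row data frames. A minimal sketch of the same modified Jaccard distance computed on plain vectors, assuming the airline column is named "Aerolinea" and every other column is coded 1/0; the helper name jaccard_rules_matrix is hypothetical:

```r
# Example data from the question
t <- data.frame(Aerolinea = c("A", "B", "C", "A"),
                atr2 = c(1, 1, 0, 0), atr3 = c(0, 0, 0, 1))

# Hypothetical helper: the modified Jaccard distance for every pair of rows,
# computed on plain vectors instead of one-row data frames.
jaccard_rules_matrix <- function(df, airline = "Aerolinea") {
  air <- as.character(df[[airline]])                      # airline labels
  bin <- as.matrix(df[setdiff(names(df), airline)]) == 1  # 1/0 -> TRUE/FALSE
  n <- nrow(df)
  m <- matrix(0, n, n, dimnames = list(rownames(df), rownames(df)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # The airline counts as one extra feature: always "on" in the OR,
      # and "on" in the AND only when both rows share the airline.
      and_val <- (air[i] == air[j]) + sum(bin[i, ] & bin[j, ])
      or_val  <- 1 + sum(bin[i, ] | bin[j, ])
      m[i, j] <- 1 - and_val / or_val
    }
  }
  as.dist(m)  # lower triangle as an object of class "dist"
}

d <- jaccard_rules_matrix(t)
hc <- hclust(d)
```

The result can be passed to hclust() directly, just like distmat.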
and then invoked hclust
hclust(distmat)
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
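Judging from the message, hclust() takes the number of objects n from the "Size" attribute of its argument, so the check fails when that attribute is absent, as it is for a plain matrix. A quick way to see the difference, using toy values and a hypothetical helper name looks_like_dist:

```r
# Toy 3 x 3 symmetric distance matrix (illustrative values only)
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3)

# Hypothetical helper: TRUE when x has the class and attribute hclust() reads
looks_like_dist <- function(x) {
  inherits(x, "dist") && length(attr(x, "Size")) == 1
}

looks_like_dist(m)  # FALSE: a plain matrix has no "Size" attribute
d <- as.dist(m)
looks_like_dist(d)  # TRUE
attr(d, "Size")     # 3
hc <- hclust(d)     # works on the dist object
```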
What am I doing wrong? Is there another way to do clustering that accepts an arbitrary distance function as its input?
Thanks in advance.
I think distmat (from your code) has to be a distance structure (which is different from a matrix). Try this instead:
library(proxy)  # proxy's dist() accepts a user-supplied distance function
d <- dist(t, method = jaccard.rules.dist)
clust <- hclust(d = d)
cutree(clust, k = 2)  # cluster membership when cutting into 2 groups
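On the second question: kmeans() needs to average rows in feature space, which an arbitrary distance does not provide. A common substitute is k-medoids, where each cluster centre is an actual row, so only the dissimilarities are needed. A sketch using pam() from the cluster package (a different algorithm from k-means, swapped in here), recomputing the distance from the question on plain vectors so the block runs on its own; pair_dist is a hypothetical name:

```r
library(cluster)  # recommended package shipped with R; provides pam()

t <- data.frame(Aerolinea = c("A", "B", "C", "A"),
                atr2 = c(1, 1, 0, 0), atr3 = c(0, 0, 0, 1))

# Same modified Jaccard distance as in the question, on plain vectors
air <- as.character(t$Aerolinea)
bin <- as.matrix(t[c("atr2", "atr3")]) == 1
pair_dist <- function(i, j) {
  1 - ((air[i] == air[j]) + sum(bin[i, ] & bin[j, ])) /
      (1 + sum(bin[i, ] | bin[j, ]))
}
n <- nrow(t)
d <- as.dist(outer(seq_len(n), seq_len(n), Vectorize(pair_dist)))

# k-medoids: cluster centres are actual rows, so only d is needed
fit <- pam(d, k = 2)
fit$clustering  # cluster number for each row
fit$id.med      # row indices of the medoids
```

With this toy data, rows 1, 2 and 4 share a medoid and row 3 forms its own cluster.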