Calculate nearest neighbor distance between data points within a previously identified Kmeans cluster in R

I would like to use nndist.ppx() to calculate the distance to the nearest neighbor within a given k-means cluster (df$cluster is a factor). The clusters are first identified using kmeans(df,2). I then cbind the cluster vector to the original df and convert the result to class ppx using ppx(df, simplify=F), because df is 3D (xyz) and so cannot be a two-dimensional ppp pattern.

The problem is that I can only get nndist.ppx to calculate the distance to all the points in the df, regardless of cluster. A related question is close to what I'm looking for, in that the distance there is calculated subject to a constraint.
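One way to see the behaviour described above: ppx() keeps coordinates and marks apart, and nndist() measures distance from the coordinates only. The minimal sketch below assumes spatstat is installed and uses the question's df_a1 with hand-assigned cluster labels (a hypothetical stand-in for kmeans() output, so the example is deterministic). It declares the columns explicitly through ppx()'s coord.type argument and checks that carrying cluster as a mark leaves the distances unchanged.

```r
library(spatstat)  # ppx() and nndist() come from spatstat.geom

# df_a1 from the question, plus hypothetical hand-assigned cluster labels
df <- data.frame(
  X = c(9, 9, 10, 10, 17, 20, 22, 25, 40, 40, 42),
  Y = c(10, 10, 11, 11, 105, 106, 108, 109, 112, 113, 114),
  Z = c(1, 1, 1, 1, 3, 4, 4, 6, 8, 8, 8),
  cluster = factor(rep(1:3, times = c(4, 4, 3)))
)

# X, Y, Z declared as spatial coordinates, 'cluster' declared as a mark
X_marked <- ppx(df, coord.type = c("s", "s", "s", "m"))
# The same points with no cluster information at all
X_plain <- ppx(df[, c("X", "Y", "Z")], coord.type = c("s", "s", "s"))

# nndist() uses only the coordinates, so the mark does not restrict it
d_marked <- nndist(X_marked, k = 2)
d_plain  <- nndist(X_plain,  k = 2)
all.equal(d_marked, d_plain)
```

Declaring the label as a mark is still the right bookkeeping; it just has to be combined with a per-cluster split before nndist() is called.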

Start with practice data, which is a list of two data frames:

library(spatstat)
library(stats)

df_a1 <- data.frame(X = c(9,9,10,10,17,20,22,25,40,40,42), 
Y=c(10,10,11,11,105,106,108,109,112,113,114), Z=c(1,1,1,1,3,4,4,6,8,8,8))

df_a2 <- data.frame(X = c(9,9,10,10,15,22,26,30,40,40,42), 
Y=c(10,10,11,11,105,106,108,109,112,113,114), Z=c(1,1,1,1,5,5,4,5,7,7,8))

list_a <- list(df_a1,df_a2)
df_a_list_names<-c("control", "variable")

Run k-means clustering. Here is my k-means function, which also adds the cluster vector to the original df as a factor column. I then lapply kmeans_fxn over the list of data frames and store the output in a new list.

kmeans_fxn <- function(x){
  results <- kmeans(x, 3)                  # k-means on the X, Y, Z columns with 3 centers
  x$cluster <- as.factor(results$cluster)  # append the cluster labels as a factor column
  return(x)
}

kmean_results_list <- lapply(list_a, kmeans_fxn)

Calculate the nearest neighbor distance:
Here is the function I wrote to calculate, for each data point, the distance to its 2nd nearest neighbor (k = 2). I then lapply it over the list created above.

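distance_fxn <- function(x){
  df.ppx <- ppx(x, simplify = FALSE)  # convert the data frame to a multidimensional point pattern
  x <- nndist.ppx(df.ppx, k = 2)      # distance from each point to its 2nd nearest neighbor
  as.data.frame(x)
}

nearest_list <- lapply(kmean_results_list, distance_fxn)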

The output is the distance to the nearest neighbor across the entire df, regardless of cluster. (I repeated this without the cluster column and got the same output; not shown.)
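To make the difference between "whole df" and "within cluster" concrete, here is a package-free sketch that uses only base R's dist(), not spatstat. It again uses df_a1 with hypothetical hand-assigned labels in place of kmeans() output. kth_nn_dist() and within_cluster_nn() are illustrative helpers, not functions from any package.

```r
# df_a1 from the question, plus hypothetical hand-assigned cluster labels
df <- data.frame(
  X = c(9, 9, 10, 10, 17, 20, 22, 25, 40, 40, 42),
  Y = c(10, 10, 11, 11, 105, 106, 108, 109, 112, 113, 114),
  Z = c(1, 1, 1, 1, 3, 4, 4, 6, 8, 8, 8)
)
df$cluster <- factor(rep(1:3, times = c(4, 4, 3)))

# Distance from each row to its k-th nearest other row (Euclidean)
kth_nn_dist <- function(coords, k = 1) {
  coords <- as.matrix(coords)
  n <- nrow(coords)
  if (n <= k) return(rep(Inf, n))  # fewer than k other points: no k-th neighbour
  d <- as.matrix(dist(coords))     # all pairwise distances
  diag(d) <- Inf                   # a point is not its own neighbour
  unname(apply(d, 1, function(row) sort(row)[k]))
}

# Same distance, but only among rows that share a cluster label
within_cluster_nn <- function(df, k = 1, coord_cols = c("X", "Y", "Z")) {
  out <- numeric(nrow(df))
  for (idx in split(seq_len(nrow(df)), df$cluster)) {
    out[idx] <- kth_nn_dist(df[idx, coord_cols, drop = FALSE], k)
  }
  out
}

whole  <- kth_nn_dist(df[, c("X", "Y", "Z")], k = 3)  # ignores the clusters
within <- within_cluster_nn(df, k = 3)                # respects the clusters
cbind(cluster = df$cluster, whole, within)
```

With these well-separated labels the two columns agree for the first eight rows. The three-point cluster has no third neighbour of its own, so its within-cluster values are Inf, while the whole-df values reach into the neighbouring cluster.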

I also tried this:

fob <- kmean_results_list[[1]]
fob.ppx <- ppx(fob, simplify = FALSE)
by(fob.ppx[[1]], cluster, function(x) nndist.ppx(fob.ppx, k = 2))

and this, but neither worked:

by(fob.ppx, fob.ppx[[1]], function(x) nndist.ppx(fob.ppx, k=2))

Instead of treating the cluster label as a coordinate, treat it as a mark. Use as.ppp to convert your data frame to a two-dimensional point pattern (class ppp) with categorical marks. Then divide this pattern X into a list of patterns using Y <- split(X), and compute the nearest neighbour distances within each cluster with D <- lapply(Y, nndist). If you want the distances in their original order, use unsplit(D, marks(X)).
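A minimal sketch of these steps, assuming spatstat is installed and using df_a1 with hypothetical hand-assigned cluster labels in place of kmeans() output. The 2D part follows the steps above and drops Z. The bounding-box window W is an assumption, since the data frame carries no window. The 3D part is an adaptation that the steps above do not spell out: it splits the rows with base R's split() and builds one ppx per cluster, so that xyz distances with k = 2 stay inside each cluster.

```r
library(spatstat)  # as.ppp, owin, ppx, nndist and split() for ppp objects

# df_a1 from the question, plus hypothetical hand-assigned cluster labels
df <- data.frame(
  X = c(9, 9, 10, 10, 17, 20, 22, 25, 40, 40, 42),
  Y = c(10, 10, 11, 11, 105, 106, 108, 109, 112, 113, 114),
  Z = c(1, 1, 1, 1, 3, 4, 4, 6, 8, 8, 8),
  cluster = factor(rep(1:3, times = c(4, 4, 3)))
)

## 2D: cluster label as a categorical mark, Z ignored
W <- owin(range(df$X), range(df$Y))               # assumed bounding-box window
X <- as.ppp(df[, c("X", "Y", "cluster")], W = W)  # columns 1-2 = x, y; column 3 = marks
Y <- split(X)                                     # one pattern per cluster
D <- lapply(Y, nndist)                            # nearest neighbour distances within each
d2 <- unsplit(D, marks(X))                        # back in the original row order

## 3D adaptation: split the rows by cluster, then one ppx per cluster
D3 <- lapply(
  split(df[, c("X", "Y", "Z")], df$cluster),
  function(g) nndist(ppx(g, coord.type = c("s", "s", "s")), k = 2)
)
d3 <- unsplit(D3, df$cluster)                     # original row order again
```

Plain nndist() uses only the point coordinates, so the choice of window does not change d2. In the question's setting, the 3D part would run once per element of kmean_results_list, for example inside a function passed to lapply().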
