How can I edit my own k-means function in R so that it takes the number of clusters as input instead of the centers?

Question

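How can I edit this function so that it takes "k" (the number of clusters) as input instead of the centers, as it does now? The code is as follows: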

# Calculates Euclidean distance
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA, nrow=dim(points1)[1], ncol=dim(points2)[1])
  for(i in 1:nrow(points2)) {
    distanceMatrix[,i] <- sqrt(rowSums(t(t(points1)-points2[i,])^2))
  }
  distanceMatrix
}

# k-means algorithm
k_means = function(x, centers, distFun) {
  prevClusters = NULL
  prevCenters = NULL
  
  repeat {
    distsToCenters = distFun(x, centers)
    clusters = apply(distsToCenters, 1L, which.min)
    centers = apply(x, 2L, tapply, clusters, mean) # If I replace 'mean' here with 'centroid', error comes
    if (identical(prevClusters, clusters)) break
    
    prevClusters = clusters
    prevCenters = centers
  }
  
  list(clusters = clusters, centers = centers)
}

test=data # A data.frame
ktest=as.matrix(test) # Turn into a matrix
centers <- ktest[sample(nrow(ktest), 5),] # Sample some centers, 5 for example

res <- k_means(ktest, centers, euclid) 
print(res)

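With a data matrix as input, the result is the cluster assignment of each row, followed by the cluster centers. Can the function be edited so that I pass the desired number of clusters instead of the desired centers? In other words, how should I define "clusters" so that it can be used as an input?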

Answer 1

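First of all, I would advise you not to reinvent the wheel, since R provides a kmeans implementation out of the box. However, if your function is given only the number of clusters, you can pick points at random within the range of the data. Something like: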

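if (length(centers) == 1) {
    k <- as.integer(centers)
    # Per-column minimum and maximum of the data
    extrema <- apply(x, 2, range)
    # Draw k uniform random values per column (a k-by-ncol(x) matrix when k > 1)
    centers <- apply(extrema, 2, function(.x) runif(k, .x[1], .x[2]))
}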

placed right at the beginning of the function.
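To make the suggestion concrete, here is a minimal sketch of the asker's function with that check added at the top. It is one possible arrangement, not a definitive implementation. The assumptions are mine: x is a numeric matrix, a length-one centers argument is always read as k (so a 1-by-1 centers matrix for one-column data would be misread), and the two matrix() wrappers are additions that keep centers a matrix when only one cluster is in play. Random starting points can leave a cluster empty, in which case this sketch returns fewer than k centers.

```r
# Euclidean distance between every row of points1 and every row of points2
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA_real_, nrow = nrow(points1), ncol = nrow(points2))
  for (i in seq_len(nrow(points2))) {
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

# k-means where `centers` is either a matrix of starting centers
# or a single number k (assumption: length one always means k)
k_means <- function(x, centers, distFun = euclid) {
  if (length(centers) == 1L) {
    k <- as.integer(centers)
    extrema <- apply(x, 2L, range)  # per-column min and max
    centers <- apply(extrema, 2L, function(.x) runif(k, .x[1], .x[2]))
    centers <- matrix(centers, nrow = k)  # added: stays a matrix when k is 1
  }

  prevClusters <- NULL
  repeat {
    distsToCenters <- distFun(x, centers)
    clusters <- apply(distsToCenters, 1L, which.min)
    if (identical(prevClusters, clusters)) break

    centers <- apply(x, 2L, tapply, clusters, mean)
    # added: a single remaining cluster yields a plain vector, so restore the shape
    if (!is.matrix(centers)) centers <- matrix(centers, nrow = 1L)
    prevClusters <- clusters
  }

  list(clusters = clusters, centers = centers)
}
```

With this in place, a call such as k_means(ktest, 5) starts from five random points, while k_means(ktest, centers) with a matrix of centers behaves as before. Calling set.seed() beforehand makes the random start reproducible.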

Solution 1
1 2020-11-22 19:08:57