K-means algorithm in R

Hi everyone! I've been asked to create a K-means algorithm in R, but I don't really know the language, so I found some example code on the internet and decided to use it. I looked into it, learned the functions it uses, and corrected it a bit, because it didn't work very well. Here's the code:

# Creating a sample of data
y=rnorm(500,1.65)
x=rnorm(500,1.15)
x=cbind(x,y)
centers <- x[sample(nrow(x),5),]

# Euclidean distance from every point (rows of points1) to every center (rows of points2); returns an n x k matrix
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA, nrow=dim(points1)[1], ncol=dim(points2)[1])
  for(i in 1:nrow(points2)) {
    distanceMatrix[,i] <- sqrt(rowSums(t(t(points1)-points2[i,])^2))
  }
  distanceMatrix
}
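For comparison, here is a minimal sketch of the same n x k distance matrix computed without the loop, using only base R. It relies on the identity ||p - c||^2 = ||p||^2 + ||c||^2 - 2 p·c. The name euclid_vec is hypothetical, and the result is checked against base R's dist().

```r
# Hypothetical loop-free variant of euclid(): rows of points1 are points,
# rows of points2 are centers; returns an nrow(points1) x nrow(points2) matrix.
euclid_vec <- function(points1, points2) {
  points1 <- as.matrix(points1)
  points2 <- as.matrix(points2)
  # ||p - c||^2 = ||p||^2 + ||c||^2 - 2 * (p . c), for every pair at once
  sq <- outer(rowSums(points1^2), rowSums(points2^2), "+") -
    2 * points1 %*% t(points2)
  sqrt(pmax(sq, 0))  # clamp tiny negatives caused by floating-point rounding
}

set.seed(42)
p <- matrix(rnorm(20), ncol = 2)  # 10 points in 2 dimensions
ctr <- p[1:3, ]                   # 3 centers taken from the points
d <- euclid_vec(p, ctr)

# Reference: all pairwise distances from dist(), keeping the point-to-center block
ref <- as.matrix(dist(rbind(p, ctr)))[1:10, 11:13]
all.equal(d, ref, check.attributes = FALSE)
```

The loop in the original is easier to read; the vectorized form trades a little numerical precision near zero distances for speed on larger data.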


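# K-means (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points; repeat nItter times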
K_means <- function(x, centers, euclid, nItter) {
  clusterHistory <- vector(nItter, mode="list")
  centerHistory <- vector(nItter, mode="list")

  for(i in 1:nItter) {
    distsToCenters <- euclid(x, centers)
    clusters <- apply(distsToCenters, 1, which.min)
    centers <- apply(x, 2, tapply, clusters, mean)
    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
  }

  structure(list(clusters = clusterHistory, centers = centerHistory))

}
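The least obvious line in K_means() is the centroid update, apply(x, 2, tapply, clusters, mean): for each column, tapply() averages that coordinate within each cluster label. A small sketch with made-up points (pts, lab and lab_gap are illustrative names) shows what it returns, plus one caveat: tapply() only creates groups for labels that actually occur, so if a cluster ever loses all its points, the centers matrix silently gets fewer rows and later labels are renumbered.

```r
# Four made-up 2-D points and their cluster labels
pts <- cbind(x = c(0, 2, 10, 12), y = c(0, 2, 10, 14))
lab <- c(1, 1, 2, 2)

# Same expression as in K_means(): column-wise mean per cluster label
centers <- apply(pts, 2, tapply, lab, mean)
print(centers)  # one row per cluster ("1", "2"), one column per coordinate

# Caveat: only labels that occur become rows. With no point in cluster 2,
# the result has 2 rows (labels "1" and "3"), not 3.
lab_gap <- c(1, 1, 3, 3)
centers_gap <- apply(pts, 2, tapply, lab_gap, mean)
print(rownames(centers_gap))
```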


res <- K_means(x, centers, euclid, 5)
# unlist() flattens the result so it can be plotted like the flat list returned by the built-in kmeans(): my function returns a list of lists
# (the cluster and center history of every iteration), so after flattening the last iteration is available as res$clusters5 and res$centers5.
res <- unlist(res, recursive = FALSE)
plot(x, col = res$clusters5)
points(res$centers5, col = 1:5, pch = 8, cex = 2)

It works fine on this simple matrix, but I've been asked to use it on the iris dataset:

head(iris)
a <- data.frame(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
centers <- a[sample(nrow(a),3),]
iris_clusters <- K_means(a, centers, euclid, 3)
iris_clusters <- unlist(iris_clusters, recursive = FALSE)
head(iris_clusters)

The problem is that it doesn't work. The error is:

Error in distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i,  : 
  number of items to replace is not a multiple of replacement length 

I understand that the dimensions of the objects don't match, but I don't understand why. That's why I'm asking for help. I apologize in advance for any mistakes in this code; I'm not really familiar with the language yet, so please don't judge me too harshly. Thank you!

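Your implementation should work with simple typecasts. Here a and centers are data frames, so inside euclid() the expression points2[i,] is a one-row data frame instead of a numeric vector, and it is not recycled across all 150 points when subtracted from t(points1). As a result, rowSums() does not return one distance per row of distanceMatrix, which is what the error message reports. Converting both inputs with as.matrix() avoids this: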

iris_clusters <- K_means(as.matrix(a), as.matrix(centers), euclid, 3) # 3 iterations

iris_clusters <- unlist(iris_clusters, recursive = FALSE)

# plotting the clusters obtained on the first two dimensions at the end of 3rd iteration

plot(a[,1:2], col = iris_clusters$clusters3, pch=19)
points(iris_clusters$centers3, col = 1:3, pch = 8, cex = 2)  # 3 centers, so 3 colors
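To see why the typecast matters, here is a small sketch with a made-up two-column data frame (df, row_df, row_vec, deltas and dists are illustrative names). Indexing one row of a data frame keeps it a data frame, while indexing one row of a matrix gives a plain numeric vector, which is what euclid() relies on when it subtracts points2[i,] from t(points1) and lets R recycle it down every column.

```r
df <- data.frame(u = c(1, 2, 3), v = c(4, 5, 6))  # 3 points, 2 coordinates

row_df  <- df[1, ]             # still a data.frame with 1 row and 2 columns
row_vec <- as.matrix(df)[1, ]  # plain numeric vector of length 2

print(class(row_df))   # "data.frame"
print(class(row_vec))  # "numeric"

# What euclid() does with matrix input: t() puts one point per column, and the
# length-2 center vector is recycled down each column (one subtraction per point)
m <- t(as.matrix(df))                 # 2 x 3
deltas <- m - row_vec                 # column j = point j minus point 1
dists <- sqrt(rowSums(t(deltas)^2))   # one distance per point, as euclid() needs
print(dists)
```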


head(iris_clusters)

# cluster assignments and centroids computed at different iterations

$clusters1
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 3 2 3 2 3 2 3 3 3 3 2 3 3 3 3 3 3 2 3 2 2 3 3
 [77] 2 2 3 3 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 1 2 1 2 1 1 3 1 1 1 2 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 2 1 2 2 1 1 2 2 2 2 2

$clusters2
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 3 3 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2
 [77] 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 2

$clusters3
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [77] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 2 2 1 1 2 2 1 1 1 2 1 1 1 2 2 2 2

$centers1
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.150000         3.120000          6.090000        2.1350000
2          6.315909         2.915909          5.059091        1.8000000
3          5.297674         3.115116          2.550000        0.6744186

$centers2
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.122727         3.113636          6.031818        2.1318182
2          6.123529         2.852941          4.741176        1.6132353
3          5.056667         3.268333          1.810000        0.3883333

$centers3
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.014815         3.096296          5.918519         2.155556
2          6.025714         2.805714          4.588571         1.518571
3          5.005660         3.369811          1.560377         0.290566
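An alternative to casting at the call site is to coerce inside the distance function, so it accepts both matrices and data frames. A minimal sketch, assuming the rest of K_means() stays unchanged: it keeps the original loop and only adds two as.matrix() calls, after which K_means(a, centers, euclid, 3) should accept the data frames directly.

```r
# Variant of the question's euclid() that accepts matrices or data frames
euclid <- function(points1, points2) {
  points1 <- as.matrix(points1)  # data frame -> numeric matrix (no-op for matrices)
  points2 <- as.matrix(points2)
  distanceMatrix <- matrix(NA, nrow = nrow(points1), ncol = nrow(points2))
  for (i in seq_len(nrow(points2))) {
    # points2[i, ] is now a numeric vector, recycled down every column of t(points1)
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

# Data-frame input as in the question: the 4 numeric iris columns,
# with rows 1, 51 and 101 used as starting centers
a <- iris[, 1:4]
d <- euclid(a, a[c(1, 51, 101), ])
print(dim(d))  # 150 rows (points) x 3 columns (centers)
```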
