Error looping through list: “Error in `[<-.data.frame`(`*tmp*`, , i, value = c(7L, 1L, 4L, 7L, 7L, : new columns would leave holes…”

Question

I'm trying to write a function that loops through a list in order to run kmeans clustering on only specific columns of a dataset. I want the output to be a matrix/dataframe of the cluster membership of each observation when kmeans is run on each set of columns.

Here's a mock dataset and the function I came up with (I'm new to R, so sorry if it's shaky):

set.seed(123)
mydata <- data.frame(a = rnorm(100,0,1), b = rnorm(100,0,1), c = rnorm(100,0,1),
                     d = rnorm(100,0,1), e = rnorm(100,0,1))

set.seed(123)
my.kmeans <- function(data,k,...) {
    # set up dataframe for clusters
    clusters <- data.frame(matrix(nrow = nrow(data), ncol = length(list(...))))
    for(i in list(...)) {
        kmeans <- kmeans(data[,i],centers = k)
        clusters[,i] <- kmeans$cluster
    }
    colnames(clusters) <- list(...)
    clusters
}

My question is: this seems to work when I only ask it to use consecutive columns, but not when I ask it to skip around some. For instance, the first of the following works, but the second does not. Any idea how I can fix this?

# works how I want 
head(my.kmeans(data = mydata, k = 8, c(1,2), c(2,3), c(1,2,3)))

# doesn't work 
head(my.kmeans(data = mydata, k = 8, c(1,2), c(2,3), c(1,2,5)))

Also, I know people recommend using apply functions and staying away from for loops, but I don't know how to do this with an apply function. Any advice on that would be much appreciated as well.

Thanks so much!

Danny

Answer 1

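Building on @SatZ's comments: in the original function, `for(i in list(...))` sets `i` to each column vector itself rather than to its position in the list, so `clusters[,i] <- kmeans$cluster` writes into columns `i` of `clusters`. With `c(1,2,5)` that means writing to column 5 of a three-column data frame while column 4 does not exist, which is the "holes" error. The consecutive case runs without an error, but each result is recycled into several columns and later iterations overwrite earlier ones, so that output is worth checking too. The version below loops over positions in the list and uses `list[[i]]` only to select the input columns.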
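
As a minimal sketch of why the original loop fails, the snippet below indexes a three-column data frame with the column vectors themselves. The small `clusters` data frame and the values `1:4` are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical three-column data frame standing in for `clusters`
clusters <- data.frame(matrix(nrow = 4, ncol = 3))

# With i = c(1, 2) the value is recycled into BOTH columns 1 and 2,
# so no error is raised but two columns receive the same result
clusters[, c(1, 2)] <- 1:4

# With i = c(1, 2, 5) column 5 would be created while column 4 does not exist;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(
  {
    clusters[, c(1, 2, 5)] <- 1:4
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

Indexing by position in the list, so that each iteration writes to exactly one existing column, avoids both effects.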

set.seed(123)
mydata <- data.frame(a = rnorm(100,0,1), b = rnorm(100,0,1), c = rnorm(100,0,1),
                     d = rnorm(100,0,1), e = rnorm(100,0,1))
mylist <- list(c(1,2), c(2,3), c(1,2,5))

set.seed(123)
my.kmeans <- function(data,k,list) {
  # set up dataframe for clusters: one column per set of input columns
  clusters <- data.frame(matrix(nrow = nrow(data), ncol = length(list)))
  # i is the position in the list, not the column indices themselves
  for(i in seq_along(list)) {
      # drop = FALSE keeps a data frame even for a single-column set
      fit <- kmeans(data[,list[[i]], drop = FALSE],centers = k)
      clusters[,i] <- fit$cluster
  }
  colnames(clusters) <- list
  clusters
}

head(my.kmeans(data = mydata, k = 8, list = mylist))
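
On the apply part of the question, here is one possible sketch using `sapply`. The names `kmeans_by_cols` and `col_sets` are made up for illustration, and the function returns a matrix rather than a data frame, which the question allows for.

```r
set.seed(123)
mydata <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100),
                     d = rnorm(100), e = rnorm(100))
col_sets <- list(c(1, 2), c(2, 3), c(1, 2, 5))

# Hypothetical helper: one kmeans fit per column set.
# sapply binds the equal-length cluster vectors into a matrix,
# one column per element of col_sets.
kmeans_by_cols <- function(data, k, col_sets) {
  memberships <- sapply(col_sets, function(cols) {
    kmeans(data[, cols, drop = FALSE], centers = k)$cluster
  })
  # Label each column by the indices it was fitted on, e.g. "1,2,5"
  colnames(memberships) <- sapply(col_sets, paste, collapse = ",")
  memberships
}

head(kmeans_by_cols(mydata, k = 8, col_sets))
```

Because each call to the anonymous function returns its own vector, there is no output column to index by hand, so the mismatch between column indices and list positions cannot occur. Wrapping the result in `as.data.frame()` is an option if a data frame is preferred.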

solution1
1 2018-07-11 19:31:31
