
Avoiding nested loops in R

I have a set of sequences with two variables, grouped by a third variable (device). I want to break the sequence for each device into sets of 300. dsl is a data frame that holds, for each device, the device id d and the number s of complete sequences of length 300.

First, I label all the sequences (in column Sid): rep(1,300) followed by rep(2,300), and so on up to rep(s,300). Whatever remains unlabelled, i.e. still has the initial label (= 0), needs to be ignored. The actual labelling happens with the seqid vector, though.

I had to do this because I want to stack each set of 300 data points and then transpose it. This forms one row of my predata data frame. For each predata data frame I run k-means to generate 5 clusters, which I store in finaldata.

Essentially, for every device I will have 5 clusters, which I can then pull by referencing the row numbers in finaldata (mapped to device id).

#subset processed data by device

for (ds in 1:387){
  d <- dsl[ds,1]
  s <- dsl[ds,3]

  temp.data <- subset(data,data$Device==d)
  temp.data$Sid <- 0
  temp.data[1:(s*300),4] <- rep(1:300,s)
  temp.data <- subset(temp.data,temp.data$Sid!="0")

  seqid <- NA

  for (j in 1:s){ seqid[(300*(j-1)+1):(300*j)] <- j }

  temp.data$Sid <- seqid

  predata <- as.data.frame(matrix(numeric(0),s,600))


  for(k in 1:s){
    temp.data2 <- subset(temp.data[,c(1,2)], temp.data$Sid==k)
    predata[k,] <- t(stack(temp.data2)[,1])
  }

  ob <- kmeans(predata,5,iter.max=10,algorithm="Hartigan-Wong")
  finaldata <- rbind(finaldata,(unique(fitted(ob,method="centers"))))
}

Being a noob to R, I ended up with 3 nested loops (the code did work when I ran the outermost loop for a single value). It has been running for 5 hours and still has not finished. I need a faster way to go about this.

Any help will be appreciated.

Thanks

OK, I am going to suggest a radical simplification of the code inside your loop. However, without sample data it is hard to verify that my assumptions are right, so please check that my predata in fact equals yours.

First the code:

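for (ds in 1:387){
  d <- dsl[ds,1]   # device id
  s <- dsl[ds,3]   # number of complete sequences of length 300

  # keep only the first s*300 rows of this device
  temp.data <- subset(data,data$Device==d)
  temp.data <- temp.data[1:(s*300),]

  # row k holds the k-th block of 300 values of column 1, followed by that of column 2
  predata <- cbind(matrix(temp.data[,1], byrow=TRUE, ncol=300), matrix(temp.data[,2], byrow=TRUE, ncol=300))

  ob <- kmeans(predata,5,iter.max=10,algorithm="Hartigan-Wong")
  finaldata <- rbind(finaldata,(unique(fitted(ob,method="centers"))))
}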

What I understand you are doing: take the first 300*s rows from your subset(data, data$Device == d). This can easily be done using the command

temp.data <- temp.data[1:(s*300),]

Afterwards, you build a matrix whose first row is c(temp.data[1:300, 1], temp.data[1:300, 2]), whose second row is c(temp.data[301:600, 1], temp.data[301:600, 2]), and so on for all further rows. I do this using the matrix command as above.
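
Since I cannot test against your data, here is a small check on made-up data that the matrix reshape gives the same rows as the loop with stack. It is only a sketch: the chunk length is 3 instead of 300, and the names n, x, y and predata.loop are invented for the example.

```r
# Toy check with a chunk length of 3 instead of 300 (all data here is made up)
n <- 3                                        # chunk length (300 in the question)
s <- 2                                        # number of complete chunks
temp.data <- data.frame(x = 1:7, y = 11:17)   # 7 rows: the last one is left over

# keep only the rows that fill complete chunks
temp.data <- temp.data[1:(s * n), ]

# vectorised reshape: one row per chunk, column 1 values then column 2 values
predata <- cbind(matrix(temp.data[, 1], byrow = TRUE, ncol = n),
                 matrix(temp.data[, 2], byrow = TRUE, ncol = n))

# loop-based construction, mirroring the code in the question
seqid <- rep(1:s, each = n)
predata.loop <- matrix(NA_real_, s, 2 * n)
for (k in 1:s) {
  chunk <- temp.data[seqid == k, c(1, 2)]
  predata.loop[k, ] <- stack(chunk)[, 1]
}

print(predata)
print(all(predata == predata.loop))
```

If the last line does not print TRUE on a slice of your real data, my assumption about the layout is wrong.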

I assume that your outer loop could be turned into a call to tapply or something similar, but for that we would need more context.
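
As a rough idea of what that could look like, here is a sketch using split and lapply instead of tapply. It runs on made-up data and rests on several assumptions: that the two variables are columns 1 and 2 of data, that s equals the number of complete chunks per device (computed here from the row count rather than read from dsl), and that taking the centres directly from the kmeans result is acceptable in place of unique(fitted(...)). The helper cluster_device is hypothetical.

```r
set.seed(1)
n <- 4   # chunk length (300 in the question)
k <- 5   # number of clusters

# made-up data: 3 devices, 102 rows each (25 full chunks + 2 leftover rows)
data <- data.frame(V1 = rnorm(306), V2 = rnorm(306),
                   Device = rep(c("a", "b", "c"), each = 102))

# hypothetical helper: reshape one device's rows and return its k centres
cluster_device <- function(df) {
  s <- nrow(df) %/% n                  # assumed to match dsl[ds, 3]
  df <- df[seq_len(s * n), ]           # drop the incomplete last chunk
  predata <- cbind(matrix(df[, 1], byrow = TRUE, ncol = n),
                   matrix(df[, 2], byrow = TRUE, ncol = n))
  kmeans(predata, k, iter.max = 10, algorithm = "Hartigan-Wong")$centers
}

# split the data once by device, then cluster each piece
centers <- lapply(split(data, data$Device), cluster_device)
finaldata <- do.call(rbind, centers)   # k rows per device, in device order
print(dim(finaldata))
```

split partitions data by device in a single pass, whereas subset inside the loop scans the whole data frame on every iteration, and do.call(rbind, ...) binds all results at once instead of growing finaldata step by step.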


 