I have a set of sequences with 2 variables for a 3rd variable (device). Now I want to break the sequence for each device into sets of 300. dsl is a data frame that contains d, the device id, and s, the number of sequences of length 300.
First, I am labelling (column Sid) all the sequences with rep(1,300) followed by rep(2,300) and so on till rep(s,300). Whatever remains unlabelled, i.e. still has the initialized label (=0), needs to be ignored. The actual labelling happens with the seqid vector though.
I had to do this because I want to stack each set of 300 data points and then transpose it. This forms one row of my predata data frame. For each predata data frame I am running k-means to generate 5 clusters, which I store in finaldata.
Essentially, for every device I will have 5 clusters that I can then pull by referencing the row number in finaldata (mapped to device id).
#subset processed data by device
for (ds in 1:387){
  d <- dsl[ds,1]
  s <- dsl[ds,3]
  temp.data <- subset(data,data$Device==d)
  temp.data$Sid <- 0
  temp.data[1:(s*300),4] <- rep(1:300,s)
  temp.data <- subset(temp.data,temp.data$Sid!="0")
  seqid <- NA
  for (j in 1:s){ seqid[(300*(j-1)+1):(300*j)] <- j }
  temp.data$Sid <- seqid
  predata <- as.data.frame(matrix(numeric(0),s,600))
  for(k in 1:s){
    temp.data2 <- subset(temp.data[,c(1,2)], temp.data$Sid==k)
    predata[k,] <- t(stack(temp.data2)[,1])
  }
  ob <- kmeans(predata,5,iter.max=10,algorithm="Hartigan-Wong")
  finaldata <- rbind(finaldata,(unique(fitted(ob,method="centers"))))
}
Being a noob to R, I ended up with 3 nested loops (the code did work when the outermost loop ran for a single value). It has now been running for 5 hours and is still going. I need a faster way to go about this.
Any help will be appreciated.
Thanks
Ok, I am going to suggest a radical simplification of the code within your loop. However, without sample data it is hard to verify that my assumptions are right, so please check that my predata in fact equals yours.
First the code:
for (ds in 1:387){
  d <- dsl[ds,1]
  s <- dsl[ds,3]
  temp.data <- subset(data,data$Device==d)
  temp.data <- temp.data[1:(s*300),]
  predata <- cbind(matrix(temp.data[,1], byrow=TRUE, ncol=300), matrix(temp.data[,2], byrow=TRUE, ncol=300))
  ob <- kmeans(predata,5,iter.max=10,algorithm="Hartigan-Wong")
  finaldata <- rbind(finaldata,(unique(fitted(ob,method="centers"))))
}
What I understand you are doing: you take the first 300*s rows of subset(data, data$Device == d). This can be done with the command
temp.data <- temp.data[1:(s*300),]
Afterwards, you build a matrix whose first row is c(temp.data[1:300, 1], temp.data[1:300, 2]), whose second row is c(temp.data[301:600, 1], temp.data[301:600, 2]), and so on for all further rows. I do this using the matrix command as above.
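To check the reshaping on something small, here is a sketch on toy data. The column names x and y, the chunk length of 3 (instead of 300) and s = 2 are made up for illustration; the loop part only mimics the approach from the question so the two results can be compared.

```r
# Toy data: one device, two variables, chunk length 3 instead of 300 (assumed values).
n <- 3                                        # chunk length (300 in the question)
s <- 2                                        # number of complete chunks
temp.data <- data.frame(x = 1:7, y = 11:17)   # 7 rows: the last one is left over

temp.data <- temp.data[1:(s * n), ]           # keep only the complete chunks

# Reshaping approach: one row per chunk, x values first, then y values.
predata.fast <- cbind(matrix(temp.data[, 1], byrow = TRUE, ncol = n),
                      matrix(temp.data[, 2], byrow = TRUE, ncol = n))

# Loop approach modelled on the question, for comparison.
Sid <- rep(1:s, each = n)
predata.loop <- matrix(NA_real_, s, 2 * n)
for (k in 1:s) {
  chunk <- temp.data[Sid == k, c(1, 2)]
  predata.loop[k, ] <- stack(chunk)[, 1]      # stack() puts column 1 before column 2
}

identical_result <- all(predata.fast == predata.loop)
print(predata.fast)
print(identical_result)
```

If the comparison also holds for one of your real devices (with n = 300), the two inner loops can be dropped.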
I assume that your outer loop could be turned into a call to tapply or something similar, but for that we would need more context.
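As a rough idea of what that could look like, here is a sketch that uses split and lapply rather than tapply. It rests on several assumptions that I cannot check without your data: the toy data frame, the helper name cluster_device, the chunk length of 4 and the 2 clusters are all made up, and the number of chunks is computed from the row count instead of being read from dsl.

```r
set.seed(1)                 # only so the toy example is reproducible
n <- 4                      # chunk length (300 in the question)
centers <- 2                # number of clusters (5 in the question)

# Toy stand-in for `data`: two variables plus a Device column.
data <- data.frame(x = rnorm(50), y = rnorm(50),
                   Device = rep(c("a", "b"), times = c(26, 24)))

# Hypothetical helper: reshape one device's rows and cluster them.
cluster_device <- function(dev) {
  s <- nrow(dev) %/% n                      # complete chunks; assumed to match s in dsl
  dev <- dev[seq_len(s * n), ]              # drop the incomplete tail
  predata <- cbind(matrix(dev[, 1], byrow = TRUE, ncol = n),
                   matrix(dev[, 2], byrow = TRUE, ncol = n))
  ob <- kmeans(predata, centers, iter.max = 10, algorithm = "Hartigan-Wong")
  unique(fitted(ob, method = "centers"))    # one row per cluster center
}

# One list element per device, then a single rbind at the end.
per_device <- lapply(split(data, data$Device), cluster_device)
finaldata <- do.call(rbind, per_device)
print(dim(finaldata))
```

Binding everything once with do.call(rbind, ...) also avoids growing finaldata inside the loop, which gets slower as the object gets bigger.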