
Output multiple vectors from for loop in R

As someone relatively new to R, I'm having trouble creating a for loop.

I have a very large data set with 9000 observations and 25 categorical variables, which I've transformed into binary data and used for hierarchical clustering. Now I want to try K-Modes clustering and produce an Elbow Plot using the "within-cluster simple-matching distance for each cluster", which kmodes() returns in its withindiff component. I can sum this for each k in 1:8 to get the Elbow Plot.

library(klaR)
WCSM <- numeric(8)  # create the vector first so WCSM[k] can be assigned in the loop
for(k in 1:8)
{
WCSM[k] <- sum(kmodes(data,k,iter.max=100)$withindiff)
}
plot(1:8,WCSM,type="b", xlab="Number of Clusters",
ylab="Within-Cluster Simple-Matching Distance Summed", main="K-modes Elbow Plot")

My issue is that I want further output from k-modes. For each k in 1:8, I would like the integer vector given by kmodes$cluster, which indicates the cluster each object is allocated to. I need a for loop that runs through each k in 1:8 and saves the outputs as 8 separate vectors, but I don't know how to write one. I could run the 8 lines of code separately, but each takes about 15 minutes with iter.max=10, so increasing this to iter.max=100 would mean leaving them running overnight, and a loop would be useful.

cl.kmodes2=kmodes(data, 2,iter.max=100)
cl.kmodes3=kmodes(data, 3,iter.max=100)
cl.kmodes4=kmodes(data, 4,iter.max=100)
cl.kmodes5=kmodes(data, 5,iter.max=100)
cl.kmodes6=kmodes(data, 6,iter.max=100)
cl.kmodes7=kmodes(data, 7,iter.max=100)
cl.kmodes8=kmodes(data, 8,iter.max=100)

Ultimately I want to compare the results of the hierarchical binary clustering with the k-modes clustering by computing the Adjusted Rand Index. For example, cutting the tree at k=4 for the hierarchical clustering and comparing this to a 4-cluster solution from k-modes:

dist.binary = dist(data, method="binary")
cl.binary = hclust(dist.binary, method="complete")
hcl.4 = cutree(cl.binary, k = 4)
tab = table(hcl.4, cl.kmodes4$cluster)
library(e1071)
classAgreement(tab)

The best method is to put the output from your clusters into a named list:

library(klaR)

myClusterList <- list()

for(k in 1:8) {
  myClusterList[[paste0("k.", k)]] <- kmodes(data, k, iter.max=100)
}

You can then pull out any of the contents easily:

sum(myClusterList[["k.1"]]$withindiff)

or

sum(myClusterList[[1]]$withindiff)
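To get all of the cluster vectors and all of the elbow-plot values at once, the list can be walked with lapply() and sapply(). The sketch below is only an illustration: it uses a small made-up stand-in for myClusterList (three elements with invented cluster and withindiff values) so that it runs without the real data, and the names clusterVectors and fit are assumptions, not part of the original answer. With the real list of kmodes() results, the lapply() and sapply() lines should apply unchanged.

```r
# Toy stand-in for the real list: each element mimics the two kmodes()
# components used here ($cluster and $withindiff). The values are made up.
myClusterList <- list(
  k.1 = list(cluster = c(1L, 1L, 1L, 1L), withindiff = c(6)),
  k.2 = list(cluster = c(1L, 2L, 1L, 2L), withindiff = c(2, 1)),
  k.3 = list(cluster = c(1L, 2L, 3L, 2L), withindiff = c(0, 1, 0))
)

# One cluster-assignment vector per k, kept together in a named list
clusterVectors <- lapply(myClusterList, function(fit) fit$cluster)

# One summed within-cluster distance per k, ready for the elbow plot
WCSM <- sapply(myClusterList, function(fit) sum(fit$withindiff))

clusterVectors[["k.2"]]
WCSM
```

Because every fit is stored once, the elbow plot and the cluster vectors come from the same runs, so kmodes() does not have to be called a second time.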

You can also save the list to use in future R sessions; see ?save.
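As a rough sketch of what saving and restoring might look like, the following uses save() and load() on a small stand-in list. The stand-in values and the temporary file path rdaFile are assumptions made so the example runs anywhere; for real use you would pick a permanent file name.

```r
# Stand-in object; in practice this would be the list of kmodes() results
myClusterList <- list(
  k.1 = list(cluster = c(1L, 1L)),
  k.2 = list(cluster = c(1L, 2L))
)

# Hypothetical location: a temporary file, used only so the sketch is runnable
rdaFile <- tempfile(fileext = ".RData")

# Write the object to disk under its current name
save(myClusterList, file = rdaFile)

# Simulate a fresh session by removing the object, then restore it
rm(myClusterList)
load(rdaFile)

names(myClusterList)
```

load() brings the object back under the name it was saved with, so after an overnight run the results can be reloaded without refitting.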

I agree with Imo: using a list is the best solution. If you don't want to do that, you could also use assign() to create a new object in every iteration:

library(klaR)
for(k in 1:8) {
  assign(paste("cl.kmodes", k, sep = ""), kmodes(data, k, iter.max = 100))
}
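Objects created with assign() can be fetched again by name with get(), or several at once with mget(). The sketch below is illustrative only: it creates three made-up stand-in objects (each just a list with a cluster component) instead of running kmodes(), and the names fit2 and allFits are assumptions.

```r
# Stand-in objects named the way the assign() loop names them; the values
# are made up and only mimic the $cluster component of a kmodes() result
for (k in 1:3) {
  assign(paste0("cl.kmodes", k), list(cluster = rep(seq_len(k), length.out = 6)))
}

# get() fetches one object by its constructed name
fit2 <- get(paste0("cl.kmodes", 2))

# mget() fetches several objects at once and returns them as a named list
allFits <- mget(paste0("cl.kmodes", 1:3))

fit2$cluster
names(allFits)
```

Note that mget() ends up building a list anyway, which is one reason storing the results in a list from the start tends to be simpler.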
