
Is there a way to use an unsupervised method from scikit-learn to classify lists into different groups?

I have a number of instances, and each instance has its own list representing the steps it follows. For example:

1284 -> [0, 100, 200, 100, 200, 300, 600]
1285 -> [0, 100, 200, 100, 200, 300, 500, 999]
1286 -> [0, 100, 200, 300, 600]
...
13023 -> [0, 100, 170, 100, 200]

For example, instance 1284 goes through the steps from 0 to 600 like this:

0 -> 100
100 -> 200
200 -> 100
100 -> 200
200 -> 300
300 -> 600

I have managed to get the path of each instance as a list, but I want to find the instances that contain loops and classify them. For example, instance 1284 goes through steps 100 and 200 twice.

I would like to know how to do that. I thought of unsupervised classification with scikit-learn, but I'm not familiar with it and I don't know how to classify those lists.

Some help would be really appreciated. Thanks!

I think you can use the following trick to do this without any machine learning:

  1. Convert the list of steps into a set.
  2. Compare the size of the set with the length of the original list.
  3. If the sizes are the same, all steps were distinct.
  4. Otherwise, there was a loop.

This algorithm rests on the assumption that if there are no loops, all steps are distinct.

list_1284 = [0, 100, 200, 100, 200, 300, 600]

# A set keeps only distinct steps, so a shorter set means some step was repeated.
set_1284 = set(list_1284)

if len(set_1284) != len(list_1284):
    print("There exists a loop")
else:
    print("No loop exists")
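
The same idea can be extended to sort all instances into groups rather than checking one list at a time. The following is only a sketch under assumptions: the dictionary `paths` and the helper names `repeated_steps` and `group_by_loop` are made up for illustration, and the data is copied from the examples in the question. It groups instances by which steps they visit more than once, with an empty tuple meaning "no loop".

```python
from collections import Counter, defaultdict

# Hypothetical container: instance id -> list of steps (values from the question).
paths = {
    1284: [0, 100, 200, 100, 200, 300, 600],
    1285: [0, 100, 200, 100, 200, 300, 500, 999],
    1286: [0, 100, 200, 300, 600],
    13023: [0, 100, 170, 100, 200],
}


def repeated_steps(steps):
    """Return a sorted tuple of the steps that occur more than once."""
    counts = Counter(steps)
    return tuple(sorted(step for step, n in counts.items() if n > 1))


def group_by_loop(paths):
    """Group instance ids by the steps they revisit; () means no loop."""
    groups = defaultdict(list)
    for instance_id, steps in paths.items():
        groups[repeated_steps(steps)].append(instance_id)
    return dict(groups)


groups = group_by_loop(paths)
for key, ids in groups.items():
    label = "no loop" if not key else f"revisits {key}"
    print(label, "->", ids)
```

With this grouping, instances 1284 and 1285 land in the same group because both revisit steps 100 and 200, while 1286 falls into the "no loop" group. Like the set-size check, this treats any repeated step as a loop.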

I think you can use an unsupervised machine learning algorithm such as clustering, which will put similar instances together in a group called a cluster.

scikit-learn provides several clustering algorithms; you can read about them at the link below:

http://scikit-learn.org/stable/modules/clustering.html#clustering
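
Clustering algorithms generally expect one fixed-length numeric vector per instance, while the step lists here have different lengths. One possible way to bridge that gap, shown below only as a sketch, is to count how often each transition (a step followed by the next step) occurs in each list. The names `paths`, `transitions`, and `transition_matrix` are assumptions made for this example, and counting transitions is just one of several reasonable encodings.

```python
from collections import Counter

# Hypothetical container: instance id -> list of steps (values from the question).
paths = {
    1284: [0, 100, 200, 100, 200, 300, 600],
    1285: [0, 100, 200, 100, 200, 300, 500, 999],
    1286: [0, 100, 200, 300, 600],
    13023: [0, 100, 170, 100, 200],
}


def transitions(steps):
    """Pair each step with its successor: [0, 100, 200] -> [(0, 100), (100, 200)]."""
    return list(zip(steps, steps[1:]))


def transition_matrix(paths):
    """Build one transition-count vector per instance, all of the same length."""
    counts = {i: Counter(transitions(s)) for i, s in paths.items()}
    # Every distinct transition seen in any instance becomes one column.
    vocabulary = sorted({t for c in counts.values() for t in c})
    ids = list(paths)
    X = [[counts[i][t] for t in vocabulary] for i in ids]
    return ids, vocabulary, X


ids, vocabulary, X = transition_matrix(paths)
for instance_id, row in zip(ids, X):
    print(instance_id, row)
```

The rows of `X` could then be handed to one of the clusterers described at the link above, for example `sklearn.cluster.KMeans(n_clusters=k).fit_predict(X)`, where `k` is a number of groups you would have to choose yourself. Instances with loops tend to stand out in this encoding because a backward transition such as (200, 100) gets a non-zero count.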
