简体   繁体   English

如何在python中使用机器学习对不同的字符串进行聚类

[英]How to cluster different strings using machine learning in python

我有一个由建筑物名称组成的数据集。例如 {Hill View,Hills View,Hill Apartment...}。我想使用机器学习对这些字符串进行聚类。例如,在聚类后,一个聚类应该包含相似或有些相似的字符串{Hills,Hill...}。我尝试了各种 scikit 算法,如 K-means、Affinity Propagation 等,但没有成功。请帮助。

Machine Learning isn't magic!机器学习不是魔术! It uses mathematical objects and functions.它使用数学对象和函数。

You need first steps - usually known as Data Mining - which kind of consists in:您需要第一步——通常称为数据挖掘——包括:

  • Transforming any input (string, pictures, videos, anything...) to numbers (vectors, matrices or any relevent structure).将任何输入(字符串、图片、视频、任何东西...)转换为数字(向量、矩阵或任何相关结构)。

  • Defining distance and similarity between vectors (= distance between the numerical representation of your input ~= distance between string, pictures, videos, anything).定义向量之间的距离和相似性(=​​ 输入的数字表示之间的距离 ~= 字符串、图片、视频等之间的距离)。

This is not trivial and can be done different ways depending on your data/objectives.这不是微不足道的,可以根据您的数据/目标以不同的方式完成。

Since I don't know your background in CS/ML/Maths, I could just give you a general approach which is, in a general case, quite good/easy.由于我不知道您在 CS/ML/数学方面的背景,我只能给您一个通用的方法,在一般情况下,它非常好/容易。

That is the general speach, in pratice this problematic is complex and there's a lot to learn on that.这是一般的演讲,实际上这个问题很复杂,有很多东西要学习。 You will most probably need the edit distance which is the most intuitive distance between words, you should also consider stemming which.您很可能需要编辑距离,这是单词之间最直观的距离,您还应该考虑提取哪个。

Can't give a better anwser without more information on data/context.如果没有关于数据/上下文的更多信息,就无法提供更好的 anwser。

Regards问候

Got it: Please follow this link for document clustering: http://brandonrose.org/clustering It gives an exact precise description.In order to convert it into normal string clustering where you have a list of names(strings) just pass the the list in place of the title list passed in the explanation.Also replace each occurrence of synopses list in the example with the list you want to cluster(in this case the list containing the strings to be clustered)明白了:请按照此链接进行文档聚类: http : //brandonrose.org/clustering它给出了精确的描述。为了将其转换为普通的字符串聚类,其中您有一个名称(字符串)列表,只需通过列表代替解释中传递的标题列表。还将示例中出现的每个概要列表替换为要聚类的列表(在本例中,列表包含要聚类的字符串)

You can skip few snippets since they provide extra information.Keeping them in the code will not harm you final clusters.您可以跳过一些片段,因为它们提供了额外的信息。将它们保留在代码中不会损害您的最终集群。

you can use the Naive Bayes algorithm for phrase clustering, for example in php您可以使用朴素贝叶斯算法进行短语聚类,例如在 php

$classifier = new \Niiknow\Bayes();

// teach it positive phrases

$classifier->learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$classifier->learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');

// teach it a negative phrase

$classifier->learn('terrible, shitty thing. Damn. Sucks!!', 'negative');

// now ask it to categorize a document it has never seen before

$classifier->categorize('awesome, cool, amazing!! Yay.');
// => 'positive'

relevant library at here相关图书馆在这里

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 在Python中聚类相似字符串的算法? - Algorithm to to Cluster Similar Strings in Python? 机器学习基于字符串相似性将字符串预处理到数字 - Machine learning preprocess strings to numbers based on string similarity 如何使用表中的熊猫将字符串的行合并为一个字符串,或者如何使用python连接句子中列的不同行? - How to combine rows of strings into one using pandas in a table or how to concatenate different rows of a column in a sentence using python? 如何在python中比较具有不同编码的字符串? - How compare strings with different encoding in python? 如何使用 python 中的预定义词组将字符串的单词分组为不同的字符串? - How to group words of a string into different strings using pre-defined word groups in python? 如何使用Javascript匹配不同数组中的字符串 - How to match strings in different arrays using Javascript 如何在python2中对具有不同类型的不同字符串进行操作? - How to do operations on different strings having different types in python2? 是否可以使用python3比较不同顺序的字符串 - Is is possible to compare strings with different order using python3 如何使用 Python 在另一个字符串列表中有效地搜索字符串列表? - How to efficiently search for a list of strings in another list of strings using Python? 如何使用 Python 和 Regex 搜索特定字符串 - How to search for specific strings using Python and Regex
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM