Given two blocks of text, how can I generate a coefficient to compare how similar they are?

Basically, I'm not looking for specific differences like the ones a normal diff algorithm produces. Instead, I want to generate a numeric value that represents how different two blocks of text are. That way I can take a bunch of different text blocks and extract the subset of them that are sufficiently unique from each other. Any ideas?

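You could use the Levenshtein distance, which is the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other.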
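A minimal Python sketch of the Levenshtein approach follows. It assumes that a coefficient in [0, 1] is wanted, so it divides the distance by the length of the longer text. That is one common normalization, not the only one. The function names `levenshtein`, `similarity`, and `select_unique` and the 0.8 threshold are illustrative choices, not part of the original answer.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Coefficient in [0, 1]: 1.0 for identical texts, 0.0 for maximally
    different ones (assumption: normalize by the longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


def select_unique(blocks: list[str], max_similarity: float = 0.8) -> list[str]:
    """Hypothetical helper: greedily keep each block whose similarity to
    every block kept so far is at most max_similarity (arbitrary default)."""
    kept: list[str] = []
    for block in blocks:
        if all(similarity(block, other) <= max_similarity for other in kept):
            kept.append(block)
    return kept


if __name__ == "__main__":
    blocks = ["the quick brown fox", "the quick brown fox!", "lorem ipsum dolor"]
    print(select_unique(blocks))  # the near-duplicate second block is dropped
```

The distance is computed character by character in O(m·n) time per pair, and the greedy selection compares each block with every block already kept. For many long blocks this can get slow, and a word-level or token-based measure may suit prose better. The result of the greedy pass also depends on the input order.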
