
What mechanism can be used to quantify similarity between non-numeric lists?

I have a database of recipes, each essentially structured as a list of ingredients and their associated quantities. Given a recipe, how would you identify similar recipes while allowing for variations and omissions? For example, using milk instead of water, honey instead of sugar, or omitting an ingredient that is only there for flavour.

The current strategy is to do multiple inner joins for combinations of the main ingredients, but this can be exceedingly slow with a large database. Is there another way to do this? Something equivalent to perceptual hashing would be ideal!

How about cosine similarity?

This technique is commonly used in machine learning as a similarity measure for comparing texts. It gives you a score for how close two texts (actually, any two vectors) are, based on the angle between them, and that score can be interpreted as how alike they are: the closer, the more alike.
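As a minimal sketch of how this could apply to recipes, assume each recipe is a dict mapping an ingredient name to a quantity in a common unit. The recipe data and the helper names `cosine_similarity` and `rank_similar` are made up for illustration and are not from the question.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Ingredients missing from a recipe count as quantity 0.
    """
    dot = sum(qty * b[name] for name, qty in a.items() if name in b)
    norm_a = math.sqrt(sum(q * q for q in a.values()))
    norm_b = math.sqrt(sum(q * q for q in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty recipe is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, recipes):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = ((cosine_similarity(query, r), name) for name, r in recipes.items())
    return sorted(scored, reverse=True)


# Hypothetical recipes, quantities in grams.
pancakes = {"flour": 200, "milk": 300, "egg": 100, "sugar": 20}
crepes = {"flour": 250, "milk": 500, "egg": 150}  # omits sugar
omelette = {"egg": 150, "milk": 30, "butter": 10}

for score, name in rank_similar(pancakes, {"crepes": crepes, "omelette": omelette}):
    print(f"{name}: {score:.3f}")
```

With this data the crepes rank well above the omelette even though they omit the sugar, so small omissions barely move the score. Substitutions are a different matter: milk and water are separate dimensions here, so a substituted ingredient contributes nothing to the dot product unless the ingredients are first mapped to shared categories.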

Take a look at this great question that explains cosine similarity in a simple way. In general, you could use any similarity measure to obtain a distance for comparing your recipes. This article discusses different similarity measures; check it out if you wish to know more.
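To illustrate that other measures can be swapped in, here is a sketch using Jaccard similarity on the sets of ingredient names, ignoring quantities. This measure is my own choice of example, not one the answer names. The `SUBSTITUTES` table and the `normalize` helper are hypothetical: they show one way to make substitutions such as milk for water count as a match.

```python
# Hypothetical mapping of interchangeable ingredients to a shared category.
SUBSTITUTES = {
    "milk": "liquid",
    "water": "liquid",
    "sugar": "sweetener",
    "honey": "sweetener",
}


def normalize(ingredients):
    """Replace each ingredient by its category, if it has one."""
    return {SUBSTITUTES.get(name, name) for name in ingredients}


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


original = {"flour", "milk", "egg", "sugar"}
variant = {"flour", "water", "egg", "honey"}

raw = jaccard_similarity(original, variant)
merged = jaccard_similarity(normalize(original), normalize(variant))
print(f"raw: {raw:.3f}, with substitutes merged: {merged:.3f}")
```

Without the mapping, the two recipes share only two of six distinct ingredients. Once substitutes are merged into categories, they become identical sets. The same normalization step could be applied before computing cosine similarity on quantities.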

