MySQL: what would be the best approach to ranking matches from highest to lowest?

I have a MySQL database I'm searching through; let's say it is a database of people. When querying for a specific record, it is possible to find a 100% match on each attribute. But querying the database for the closest probable match (the closest matches across the table's attributes) is more a question of strategy.

In this scenario, does it make sense to create a temporary table (much like a tally sheet) to record which attributes match and which are present? What is the typical approach to doing advanced searches on a database like this?

Below is an example of a hypothetical stored procedure:

*The parameters are just to illustrate how I would search. I'm not concerned with how to write my selects; the question is about approach, strategy, and technique.*

call FindPerson("Brown Eyes", "Brown hair", "Height:6'1", "white", "Name:Joe", "weight180", "Age 34", "sex m");

RESULT TABLE
NAME  AGE HEIGHT WEIGHT HAIR  SKIN  sex  RANK_MATCH
Joe   32  6'1    180    Brown white m    1
Mike  33  6'1    179    Brown white m    2
James 31  6'0    179    Brown black m    3 

Just off the top of my head: you can compute your own score and sort by it. Something like:

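SELECT `id`,
  -- one IF() per attribute: 1 point for a match, 0 otherwise
  (IF(`age` = 32, 1, 0) + IF(`height` = "6'1", 1, 0) + ...) AS `score`
FROM `people`
HAVING `score` > 0  -- MySQL allows HAVING on a select alias without GROUP BY
ORDER BY `score` DESC
LIMIT 10;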

With this, you can handle each field with its own comparison, and also weight individual attributes by adding 2 or more points instead of just 1. But I'm not quite sure how performant this is.
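To make the weighting idea concrete, here is a small Python sketch of the same scoring logic (not MySQL code): every attribute that matches exactly adds its weight, rows scoring 0 are dropped, and the rest are sorted highest first. The weights, the function names and the dictionary layout are hypothetical; the sample rows are taken from the question's result table.

```python
# Hypothetical per-attribute weights: a match on "name" is worth 3 points, etc.
WEIGHTS = {"name": 3, "age": 1, "height": 2, "weight": 1,
           "hair": 1, "skin": 1, "sex": 2}


def match_score(row: dict, wanted: dict, weights: dict = WEIGHTS) -> int:
    """Add the attribute's weight (default 1) for every exact match."""
    return sum(weights.get(attr, 1)
               for attr, value in wanted.items()
               if row.get(attr) == value)


def rank_matches(rows: list, wanted: dict, limit: int = 10) -> list:
    """Same shape as HAVING score > 0 ORDER BY score DESC LIMIT n."""
    scored = [(match_score(row, wanted), row) for row in rows]
    kept = [pair for pair in scored if pair[0] > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


# Rows from the question's result table; "wanted" mirrors the FindPerson call.
people = [
    {"name": "Joe", "age": 32, "height": "6'1", "weight": 180,
     "hair": "Brown", "skin": "white", "sex": "m"},
    {"name": "Mike", "age": 33, "height": "6'1", "weight": 179,
     "hair": "Brown", "skin": "white", "sex": "m"},
    {"name": "James", "age": 31, "height": "6'0", "weight": 179,
     "hair": "Brown", "skin": "black", "sex": "m"},
]
wanted = {"name": "Joe", "age": 34, "height": "6'1", "weight": 180,
          "hair": "Brown", "skin": "white", "sex": "m"}

for score, person in rank_matches(people, wanted):
    print(person["name"], score)
```

With these weights Joe scores 10, Mike 6 and James 3, which reproduces the RANK_MATCH order in the question. Changing a weight is the Python equivalent of writing `IF(..., 2, 0)` instead of `IF(..., 1, 0)`.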

The approach I would use is to create a scoring function (your stored proc) that measures distances in standard deviations from the mean, i.e. z-scores.

In the proc, you would judge each criterion in a fashion similar to:

INPUT AGE: 32
calculate MEAN of AGE WHERE (sex = m): 34.5
calculate STANDARD DEVIATION of AGE WHERE (sex = m): 2.5
calculate how many STDEVs 32 is from 34.5 (its z-score): |32 - 34.5| / 2.5 = 1
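The arithmetic in those steps can be checked with a short Python sketch. The function name and the sample ages are hypothetical (the two ages were picked only because their mean is 34.5 and their standard deviation is 2.5); in the stored procedure the values would come from the `WHERE (sex = m)` subset. `statistics.pstdev` is used because it is the population standard deviation, the same formula MySQL's `STD()` / `STDDEV_POP()` compute.

```python
from statistics import mean, pstdev


def abs_z_score(x: float, values: list) -> float:
    """How many (population) standard deviations x lies from the mean of values."""
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are equal, so the z-score is undefined")
    return abs(x - mean(values)) / sd


# Hypothetical male ages chosen so that the mean is 34.5 and the std is 2.5.
male_ages = [32, 37]
print(abs_z_score(32, male_ages))
```

This is the absolute z-score; drop the `abs()` if the direction (above or below the mean) matters.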

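Repeat this process for every numeric attribute. To rank rows, compare each row's z-score with the input's z-score on every attribute (the absolute difference simplifies to |row value - input value| / standard deviation, because the mean cancels), sum those differences per row, and ORDER BY the sum ascending so the closest match comes first.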
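A minimal Python sketch of that per-row ranking follows. It assumes height has already been converted to inches (`height_in`, a hypothetical column name) and uses the numbers from the question's result table as data. Each row's own values have to enter its score; otherwise every row would receive the same score. Comparing each row's z-score with the input's z-score gives |row - input| / std per attribute, so only the per-column standard deviations are needed.

```python
from statistics import pstdev

# Numeric attributes to compare; height is assumed to be stored in inches.
NUMERIC = ("age", "height_in", "weight")


def rank_by_z_distance(rows: list, target: dict) -> list:
    """Sort rows by the sum of |z(row) - z(target)| over NUMERIC, lowest first.

    |z(row) - z(target)| simplifies to |row - target| / std because the
    mean cancels out, so only each column's standard deviation is needed.
    """
    sds = {attr: pstdev([row[attr] for row in rows]) for attr in NUMERIC}

    def score(row: dict) -> float:
        # Columns with zero spread cannot separate rows, so they are skipped.
        return sum(abs(row[attr] - target[attr]) / sds[attr]
                   for attr in NUMERIC if sds[attr] > 0)

    return sorted(((score(row), row) for row in rows), key=lambda pair: pair[0])


people = [
    {"name": "Joe", "age": 32, "height_in": 73, "weight": 180},
    {"name": "Mike", "age": 33, "height_in": 73, "weight": 179},
    {"name": "James", "age": 31, "height_in": 72, "weight": 179},
]
target = {"age": 34, "height_in": 73, "weight": 180}

for score, person in rank_by_z_distance(people, target):
    print(f"{person['name']}: {score:.2f}")
```

For this data the scores come out as roughly 2.45 for Joe, 3.35 for Mike and 7.92 for James. Mike is closer in age, but his weight differs by a full value in a column with very little spread, which pushes him down.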

This requires one schema change: height must be stored as a plain number of inches rather than in feet/inches form (for example, 6'1 becomes 73).
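If existing rows hold heights as text, a small helper like the one below could convert them once before the column type is changed. This is a Python sketch; the accepted input formats (`6'1`, `6'1"`, `6' 1''`, `6'`) are an assumption about how the data is currently written, and the function name is hypothetical.

```python
import re

# Accepts forms like 6'1, 6'1", 6' 1'' or 6' (an assumption about how the
# existing text column is written).
_HEIGHT = re.compile(r"""^\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)?\s*(?:"|'')?\s*$""")


def height_to_inches(text: str) -> float:
    """Convert a feet/inches string into total inches (6'1 -> 73)."""
    match = _HEIGHT.match(text)
    if match is None:
        raise ValueError(f"unrecognised height: {text!r}")
    feet = int(match.group(1))
    inches = float(match.group(2) or 0)
    return feet * 12 + inches


for raw in ["6'1", "6'0", "5' 11\""]:
    print(raw, "->", height_to_inches(raw))
```

Raising on anything unrecognised, rather than guessing, makes malformed rows visible during the migration.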

Depending on your needs, you may also consider coming up with an arbitrary numeric scale for sex and skin/hair color. Of course, you may decide that measures like these should NOT be factored in, because of how drastically they would change the scoring function. If you do include them, you'd have to find some number to add to the SUM above, but that's hard, because nominal variables don't translate easily into these kinds of numbers.
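One simple way to "find some number that would be added to the SUM" is a fixed penalty for each nominal attribute that does not match. The Python sketch below illustrates that idea; the penalty values and function names are hypothetical, and choosing sensible values is exactly the hard part described above.

```python
# Hypothetical penalties, expressed in the same units as the z-score sum
# (standard deviations), added once for each nominal attribute that differs.
NOMINAL_PENALTY = {"sex": 2.0, "skin": 1.0, "hair": 1.0}


def nominal_penalty(row: dict, target: dict,
                    penalties: dict = NOMINAL_PENALTY) -> float:
    """Sum the penalty of every nominal attribute where row and target differ."""
    return sum(penalty for attr, penalty in penalties.items()
               if attr in target and row.get(attr) != target[attr])


def combined_score(numeric_score: float, row: dict, target: dict) -> float:
    """Numeric z-distance plus nominal penalties; lower still means closer."""
    return numeric_score + nominal_penalty(row, target)


target = {"hair": "Brown", "skin": "white", "sex": "m"}
james = {"name": "James", "hair": "Brown", "skin": "black", "sex": "m"}
print(nominal_penalty(james, target))  # only skin differs
```

Because the penalty is added to the same sum that is sorted ascending, a mismatch simply pushes a row further down the ranking.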

If you find that hair color/skin color can usefully be mapped onto, say, a continuous color spectrum, your scoring step would be the same: the input's color value compared against the mean and standard deviation of the stored color values.
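As an illustration, assume hair colors were mapped onto a single lightness number. The scale below is entirely made up for the example and is not a standard. Once the colors are numbers, the same standardized distance used for age and weight applies; the function name is hypothetical.

```python
from statistics import pstdev

# Hypothetical lightness scale (0 = darkest); the values are illustrative only.
HAIR_LIGHTNESS = {"black": 1.0, "brown": 3.0, "red": 5.0, "blonde": 8.0}


def hair_distance(row_hair: str, input_hair: str, all_hair: list) -> float:
    """|row - input| on the lightness scale, divided by the column's std."""
    values = [HAIR_LIGHTNESS[h.lower()] for h in all_hair]
    sd = pstdev(values)
    if sd == 0:
        # Every row has the same hair color, so it cannot separate rows.
        return 0.0
    row_value = HAIR_LIGHTNESS[row_hair.lower()]
    input_value = HAIR_LIGHTNESS[input_hair.lower()]
    return abs(row_value - input_value) / sd


table_hair = ["Black", "Brown", "Blonde", "Brown"]
print(hair_distance("Black", "Brown", table_hair))
```

The result can be added to the numeric sum like any other attribute's term.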

The query that finds your matches would look something like this:

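-- INPUT_AGE and INPUT_WT stand for the stored procedure's input parameters.
-- For each row, |z(row) - z(input)| = |row value - input value| / STD(column),
-- so only the standard deviations are needed (the means cancel out).
SELECT t.*,
       ABS(t.AGE - INPUT_AGE) / s.age_sd
     + ABS(t.WT - INPUT_WT) / s.wt_sd
     + ...
       AS score
FROM `table` AS t
CROSS JOIN (SELECT STD(AGE) AS age_sd, STD(WT) AS wt_sd FROM `table`) AS s
ORDER BY score ASC;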

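Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.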

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM