
Get the similarity of two numbers with Python

I'm studying Case-Based Reasoning algorithms, and I need to get the similarity of two numbers (integer or float).

For strings I'm using the Levenshtein lib and it handles them well, but I don't know of any Python lib that calculates the similarity of two numbers. Is there one out there? The result should be between 0 (different) and 1 (perfect match), like Levenshtein.ratio().

@update 1:

Levenshtein.ratio gives the similarity ratio of two strings: 0 means totally different, 1 means an exact match, and anything between 0 and 1 is the coefficient of similarity. Example:

>>> import Levenshtein
>>> Levenshtein.ratio("This is a test","This is a test with similarity")
0.6363636363636364
>>> Levenshtein.ratio("This is a test","This is another test")
0.8235294117647058
>>> Levenshtein.ratio("This is a test","This is a test")
1.0
>>> 

I need something like that, but with numbers. For example, 5 has n% of similarity with 6. The number 5.4 has n% of similarity with 5.8. I don't know if my example is clear.

@update 2:

Let me give a real-world example. Let's say I'm looking for similar versions of CentOS Linux distributions across 100 servers. The CentOS Linux version numbers are something like 5.6, 5.7, 6.5. So, how close is the number 5.7 to 6.5? Not very close, since there are many versions (numbers) between them. But there is a coefficient of similarity, let's say 40% (or 0.4), using some similarity algorithm like Levenshtein.

@update 3: I got the answer to this question. I'm posting it here to help others:

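>>> import math
>>> # squares are named sq_a / sq_b to avoid shadowing the built-in sum()
>>> sq_a = 2.4 * 2.4
>>> sq_b = 7.5 * 7.5
>>> sq_a / math.sqrt(sq_a * sq_b)
0.32
>>> sq_a = 7.4 * 7.4
>>> sq_a / math.sqrt(sq_a * sq_b)
0.9866666666666666
>>> sq_a = 7.5 * 7.5
>>> sq_a / math.sqrt(sq_a * sq_b)
1.0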

From the link, I see that Ian Watson's slides show three options for assessing the "similarity" of numbers. Of these, the "step function" option is readily available from numpy:

In [1]: from numpy import allclose

In [2]: a = 0.3 + 1e-9

In [3]: a == 0.3
Out[3]: False

In [4]: allclose(a, 0.3)
Out[4]: True

To get numeric output, as required for similarity, we make one change:

In [5]: int(a == 0.3)
Out[5]: 0

In [6]: int(allclose(a, 0.3))
Out[6]: 1

If preferred, float can be used in place of int :

In [8]: float(a == 0.3)
Out[8]: 0.0

In [9]: float(allclose(a, 0.3))
Out[9]: 1.0

allclose takes optional arguments rtol and atol so that you can specify, respectively, the relative and absolute tolerance to be used. See the NumPy documentation on allclose for the details.
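If numpy is not available, the same step-function idea can be sketched with the standard library's math.isclose, which takes rel_tol and abs_tol in place of rtol and atol. This is a swapped-in alternative, not what the answer above uses: the name step_similarity is hypothetical, and math.isclose has stricter defaults (rel_tol=1e-09, abs_tol=0.0) than allclose (rtol=1e-05, atol=1e-08), so the two can disagree for the same inputs.

```python
import math

def step_similarity(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Step-function similarity: 1.0 if a and b are equal within tolerance, else 0.0."""
    # math.isclose returns a bool; float() turns it into 1.0 or 0.0.
    return float(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol))
```

With the default tolerances, step_similarity(0.3 + 1e-9, 0.3) returns 0.0, whereas passing abs_tol=1e-8 makes it return 1.0. Widening abs_tol is how you decide how far apart two values may be and still count as a match.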

To calculate the similarity of two numbers (float or integer), I wrote a simple function:

def num_sim(n1, n2):
  """ calculates a similarity score between 2 numbers """
  return 1 - abs(n1 - n2) / (n1 + n2)

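It returns 1 when the two numbers are exactly equal, and the score falls toward 0 as the values move apart. Note that it assumes positive inputs: it divides by zero when n1 + n2 is 0, and it can leave the 0 to 1 range when the signs differ.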
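A minimal sketch of a variant that also tolerates zeros and negative values is shown here. The name num_sim_safe, the use of absolute values in the denominator, and the choice to treat two zeros as a perfect match are all assumptions, not part of the original answer.

```python
def num_sim_safe(n1, n2):
    """Similarity in [0, 1] between two numbers, defined for any real inputs."""
    denom = abs(n1) + abs(n2)
    if denom == 0:
        # Both numbers are zero: treated here as a perfect match (assumption).
        return 1.0
    # |n1 - n2| <= |n1| + |n2|, so the result always stays within [0, 1].
    return 1.0 - abs(n1 - n2) / denom
```

For positive inputs this gives the same scores as num_sim. Numbers with opposite signs, or a zero paired with a non-zero value, score 0.0.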
