How to calculate percentage properly

Question

I have three dataframes that each have a "City" column. Each dataframe contains a different set of city names.

I want to find the percentage of city names that match between each pair of dataframes.

For this purpose I used set() and got three sets:

set1 = set(df1['City'])
set2 = set(df2['City'])
set3 = set(df3['City'])

But how should I find the percentage? I used these expressions, but I'm not sure I did everything right:

(len(set1) - len(set2))/len(set1)*100
(len(set1) - len(set3))/len(set1)*100
(len(set2) - len(set3))/len(set2)*100

Is this calculation right?

Answer 1

You probably want this:

percentage = ( len(set1.intersection(set2)) / len(set1.union(set2)) )*100

which gives you the number of elements common to set1 and set2 as a percentage of all distinct elements in either set.

This is also known as the Jaccard index, a measure of the similarity of two sets.
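To apply this to all three pairs from the question, the same expression can be wrapped in a small function. The following is only a sketch: the function name jaccard_percentage and the sample city names are made up for illustration, and the sets stand in for set(df1['City']) and so on.

```python
from itertools import combinations


def jaccard_percentage(a, b):
    """Return the Jaccard index of two sets as a percentage."""
    union = a | b
    if not union:
        # Two empty sets have no elements to compare; report 0 by convention.
        return 0.0
    return len(a & b) / len(union) * 100


# Illustrative data only; in the question these come from set(df['City']).
city_sets = {
    "set1": {"Berlin", "Paris", "Rome", "Madrid"},
    "set2": {"Paris", "Rome", "Vienna"},
    "set3": {"Rome", "Oslo"},
}

# Compare every pair of sets once.
pairwise = {
    (name_a, name_b): jaccard_percentage(city_sets[name_a], city_sets[name_b])
    for name_a, name_b in combinations(city_sets, 2)
}

for (name_a, name_b), pct in pairwise.items():
    print(f"{name_a} vs {name_b}: {pct:.1f}%")
```

Because the union is used as the denominator, the result is symmetric: comparing set1 with set2 gives the same value as comparing set2 with set1.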

Answer 2

From the purely mathematical side of things: I assume that you want to find the percentage of cities matching between set1 & set2, set1 & set3, and set2 & set3, respectively.

To calculate this percentage, you need the number of matches and the size of the set of cities you are comparing against.

Then the percentage can be calculated as follows:

Percentage match 1 & 2 = [(number of matches between 1 & 2)/(size of the set compared against)]*100

For the code side of things: I agree with Sparkofska.
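The formula above leaves open which set goes in the denominator. One possible reading, sketched below, divides by the size of one chosen set, which answers "what share of this set's cities also appear in the other set?". The name match_percentage and the sample cities are hypothetical, not part of the original answer.

```python
def match_percentage(reference, other):
    """Percentage of elements in `reference` that also appear in `other`."""
    if not reference:
        # Avoid dividing by zero when the reference set is empty.
        return 0.0
    return len(reference & other) / len(reference) * 100


# Illustrative data only; in the question these come from set(df['City']).
set1 = {"Berlin", "Paris", "Rome", "Madrid"}
set2 = {"Paris", "Rome", "Vienna"}

share_of_set1 = match_percentage(set1, set2)  # 2 of the 4 cities in set1 match
share_of_set2 = match_percentage(set2, set1)  # 2 of the 3 cities in set2 match

print(f"set1 cities found in set2: {share_of_set1:.1f}%")
print(f"set2 cities found in set1: {share_of_set2:.1f}%")
```

Unlike the Jaccard index, this measure is not symmetric: the result depends on which set is taken as the reference.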

2 answers

solution1
1 2019-09-06 10:57:07

solution2
0 2019-09-06 11:36:40