How to correctly calculate a percentage

Question

I have three dataframes, each with a "City" column. Each dataframe contains a different set of city names.

I want to find the percentage of matching values in this column between the dataframes.

For this purpose I used set() and got three sets:

set1 = set(df1['City'])
set2 = set(df2['City'])
set3 = set(df3['City'])

But how should I find the percentage? I used these expressions, but I'm not sure I did everything right:

(len(set1) - len(set2))/len(set1)*100
(len(set1) - len(set3))/len(set1)*100
(len(set2) - len(set3))/len(set2)*100

Is this right?

Answer 1

You probably want this:

percentage = ( len(set1.intersection(set2)) / len(set1.union(set2)) )*100

This gives the number of elements common to set1 and set2 as a percentage of all distinct elements across both sets.

This is also known as the Jaccard index, a measure of similarity between sets.
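As a rough sketch of how the formula above could be applied to all three pairs at once: the helper name jaccard_percentage and the sample city names are made up for illustration, and the sets stand in for set(df1['City']) and so on. Returning 0 when both sets are empty is an assumption made here to avoid a division by zero.

```python
from itertools import combinations


def jaccard_percentage(a, b):
    """Return the Jaccard index of two sets, expressed as a percentage."""
    union = a | b
    if not union:
        # Both sets are empty; treating this as 0% is an assumption.
        return 0.0
    return len(a & b) / len(union) * 100


# Hypothetical data standing in for set(df1['City']), set(df2['City']), ...
set1 = {"Berlin", "Paris", "Rome", "Madrid"}
set2 = {"Paris", "Rome", "Vienna"}
set3 = {"Rome", "Oslo"}

named_sets = {"set1": set1, "set2": set2, "set3": set3}

# Compare every unordered pair of sets exactly once.
results = {}
for (name_a, a), (name_b, b) in combinations(named_sets.items(), 2):
    results[(name_a, name_b)] = jaccard_percentage(a, b)
    print(f"{name_a} vs {name_b}: {results[(name_a, name_b)]:.1f}%")
```

Because intersection and union are both symmetric, the result does not depend on the order of the two sets, unlike the length-difference expressions in the question, which only compare set sizes and ignore which cities actually match.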

Answer 2

From the purely mathematical side of things: I assume that you want to find the percentage of cities that match between set1 & set2, set1 & set3, and set2 & set3, respectively.

To calculate this percentage, you need the number of matches and the length of the set of cities being compared.

Then the percentage can be calculated as follows:

Percentage match 1 & 2 = [(number of matches between 1 & 2)/(length of the set)]*100
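The formula above leaves open which set's length goes in the denominator. One possible reading, sketched below, divides by the length of the first set; the function name match_percentage and the sample cities are hypothetical, and returning 0 for an empty first set is an assumption.

```python
def match_percentage(a, b):
    """Return the percentage of elements of `a` that also appear in `b`."""
    if not a:
        # Empty reference set; treating this as 0% is an assumption.
        return 0.0
    return len(a & b) / len(a) * 100


# Hypothetical data standing in for set(df1['City']) and set(df2['City']).
set1 = {"Berlin", "Paris", "Rome", "Madrid"}
set2 = {"Paris", "Rome", "Vienna"}

forward = match_percentage(set1, set2)   # share of set1's cities found in set2
backward = match_percentage(set2, set1)  # share of set2's cities found in set1

print(f"set1 matched in set2: {forward:.1f}%")
print(f"set2 matched in set1: {backward:.1f}%")
```

With this reading the result depends on which set is taken as the reference, so the two directions generally differ. If a single symmetric number is wanted, dividing by the length of the union instead yields the Jaccard index from Answer 1.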

For the code side of things: I agree with Sparkofska.

2 answers

Answer 1
1 2019-09-06 10:57:07

Answer 2
0 2019-09-06 11:36:40