
Python: how to find the intersection of two lists (actually, I need the length of the intersection) if there are duplicates?

Suppose I have these two lists:

a = ['a','b','c','a','a']
b = ['a','b','d']

I need to calculate the Jaccard distance = (union - intersect) / union, but I know there will be duplicates in each list and I want to count them, so the intersection length for this example would be 2 and the Jaccard distance = (8 - 2) / 8.

How can I do that? My first thought is to join the lists and then remove elements one by one...

UPDATE: I probably should have stressed more that I need to count duplicates.

Here is my working solution, but it is quite ugly:

a = [1,2,3,1,1]
b = [2,1,1, 6,5]

import collections
aX = collections.Counter(a)
bX = collections.Counter(b)

# elements that occur in both lists
r1 = [x for x in aX if x in bX]
print(r1)

# each shared element counts min(occurrences in a, occurrences in b) times
print(sum(min(aX[x], bX[x]) for x in r1))

>>> 3

a = ['a','b','c','a','a']
b = ['a','b','d']
c = list(set(b).intersection(a))
# c contains 'a' and 'b' (element order may vary)

Note that sets discard duplicates!
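Since sets drop duplicates, one way to keep them is to count occurrences instead. The following is a minimal sketch, not part of the original answer: it relies on collections.Counter and its & operator, which keeps each element with the smaller of its two counts. The function name multiset_intersection_size is made up for illustration.

```python
from collections import Counter

def multiset_intersection_size(a, b):
    """Count the elements common to a and b, honouring duplicates."""
    # Counter & Counter keeps every shared key with min(count in a, count in b).
    common = Counter(a) & Counter(b)
    return sum(common.values())

print(multiset_intersection_size(['a', 'b', 'c', 'a', 'a'], ['a', 'b', 'd']))  # 2
print(multiset_intersection_size([1, 2, 3, 1, 1], [2, 1, 1, 6, 5]))            # 3
```

Both results agree with the numbers given in the question (2 for the letter lists, 3 for the number lists).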

To get the Jaccard distance (using the formula from the question) between two lists a and b, ignoring duplicates:

def jaccard_distance(a,b):
    a = set(a)
    b = set(b)
    c = a.intersection(b)
    return float(len(a) + len(b) - len(c)) /(len(a) + len(b))
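The function above works on sets, so repeated elements are not counted. A possible variant that counts them, sketched under the question's own convention that union = len(a) + len(b), could look like this. The name jaccard_distance_with_duplicates and the handling of two empty lists are assumptions made for the example.

```python
from collections import Counter

def jaccard_distance_with_duplicates(a, b):
    """(union - intersect) / union, with union = len(a) + len(b) as in the question."""
    union = len(a) + len(b)
    if union == 0:
        # Assumption: two empty lists are treated as identical (distance 0).
        return 0.0
    # Multiset intersection: each shared element counts min(count in a, count in b) times.
    intersect = sum((Counter(a) & Counter(b)).values())
    return (union - intersect) / union

print(jaccard_distance_with_duplicates(['a', 'b', 'c', 'a', 'a'], ['a', 'b', 'd']))  # 0.75
```

For the example lists this gives (8 - 2) / 8 = 0.75, the value expected in the question. Note that the more common definition of Jaccard distance takes the union as |A| + |B| - |A ∩ B|, which would give a different number.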

