
Python 3 - counting matches in two lists (including duplicates)

First of all, I'm new to programming and Python. I've looked here but can't find a solution, so if this is a stupid question, please forgive me!

I have two lists and I'm trying to determine how many times items in the second list appear in the first list.

I have the following solution:

    list1 = ['black','red','yellow']
    list2 = ['the','big','black','dog']
    list3 = ['the','black','black','dog']
    p = set(list1)&set(list2)
    print(len(p))

It works fine except when the second list contains duplicates.

i.e. list1 and list2 above return 1, but so do list1 and list3, when ideally the latter should return 2.

Can anyone suggest a solution to this? Any help would be appreciated!

Thanks,

Adam

You're seeing this problem because you're using sets as your collection type. Sets have two characteristics: they're unordered (which doesn't matter here), and their elements are unique. So you lose the duplicates in the lists when you convert them to sets, before you even find their intersection:

>>> p = ['1', '2', '3', '3', '3', '3', '3']
>>> set(p)
{'1', '2', '3'}

There are several ways to do what you're looking for, but you'll want to start by looking at the list count() method. I would do something like this:

>>> list1 = ['a', 'b', 'c']
>>> list2 = ['a', 'b', 'c', 'c', 'c']
>>> results = {}
>>> for i in list1:
...     results[i] = list2.count(i)
...
>>> results
{'a': 1, 'b': 1, 'c': 3}

This approach creates a dictionary (results) and, for each element in list1, creates a key in results, counts the times that element occurs in list2, and assigns that count as the key's value.
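The same loop can be written more compactly as a dict comprehension. A minimal sketch, reusing the question's list1 and list3 (list3 is the one with the duplicated 'black'):

```python
# Lists from the question: list1 holds the words to look for,
# list3 is the "sentence" that contains a duplicate.
list1 = ['black', 'red', 'yellow']
list3 = ['the', 'black', 'black', 'dog']

# For each word in list1, count how often it occurs in list3.
results = {word: list3.count(word) for word in list1}
print(results)               # {'black': 2, 'red': 0, 'yellow': 0}
print(sum(results.values())) # 2
```

Summing the values gives the single number the question asks for.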

Edit: As Lattyware points out, that approach solves a slightly different problem from the one you asked about. A more basic solution would look like this:

>>> words = ['red', 'blue', 'yellow', 'black']
>>> list1 = ['the', 'black', 'dog']
>>> list2 = ['the', 'blue', 'blue', 'dog']
>>> results1 = 0
>>> results2 = 0
>>> for w in words:
...     results1 += list1.count(w)
...     results2 += list2.count(w)
...
>>> results1
1
>>> results2
2

This works in a similar way to my first suggestion: it iterates through each word in your main list (here I use words), adds the number of times it appears in list1 to the counter results1, and adds the number of times it appears in list2 to results2.
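The loop above can also be wrapped in a small function that sums the counts with a generator expression. This is a sketch; count_matches is a hypothetical name chosen for illustration, and the lists are the ones from the question:

```python
def count_matches(words, sentence):
    """Return how many items of `sentence` are in `words`, counting duplicates."""
    # list.count scans `sentence` once per word, so the cost grows with
    # len(words) * len(sentence).
    return sum(sentence.count(w) for w in words)


list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

print(count_matches(list1, list2))  # 1
print(count_matches(list1, list3))  # 2
```

One caveat: if `words` itself contains a repeated entry, that word's occurrences are added once per repeat.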

If you need more information than just the total count, you'll want to use a dictionary or, even better, the specialized Counter type from the collections module. Counter is built to make everything I did in the examples above easy:

>>> from collections import Counter
>>> results3 = Counter()
>>> for w in words:
...     results3[w] = list2.count(w)
...
>>> results3
Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
>>> sum(results3.values())
2
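Because Counter can count a whole list in one pass, you don't have to call list.count() for every word. A sketch using the question's list1 and list3: build the counter once, then look up each word. Looking up a missing key in a Counter returns 0 instead of raising KeyError.

```python
from collections import Counter

list1 = ['black', 'red', 'yellow']
list3 = ['the', 'black', 'black', 'dog']

# Count every word in list3 in a single pass.
counts = Counter(list3)

# Look up each word from list1; words not in list3 count as 0.
per_word = {w: counts[w] for w in list1}
total = sum(per_word.values())

print(per_word)  # {'black': 2, 'red': 0, 'yellow': 0}
print(total)     # 2
```

This does one pass over list3 plus one lookup per word, rather than one full scan of list3 per word.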

Shouldn't list1 and list2 return 0? Or did you mean

list1 = ['black', 'red', 'yellow']

What you want, I think, is:

print(len([w for w in list2 if w in list1]))

The trouble with using sets is that a set has no duplicates. In fact, the usual reason for using a set is to eliminate duplicates, which is exactly what you don't want here.
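Duplicates are only lost from the list you turn into a set. It is fine to convert list1 (the words to look for) into a set for fast membership tests, as long as you iterate over the list that contains the duplicates. A minimal sketch; lookup and count_in are illustrative names:

```python
list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

# The set is only used for the membership test. Duplicates in list3 survive
# because we iterate over list3 itself, not over a set built from it.
lookup = set(list1)


def count_in(sentence):
    # Add 1 for every word in `sentence` that is one of the lookup words.
    return sum(1 for w in sentence if w in lookup)


print(count_in(list2))  # 1
print(count_in(list3))  # 2
```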

I know this is an old question, but if anyone is wondering how to get the matches, or the number of matches, across two or more lists, you can do this as well:

a = [1,2,3]
b = [2,3,4]
c = [2,4,5]

To get the matches between two lists, say a and b:

d = [value for value in a if value in b]  # [2, 3]

For all three lists:

d = [value for value in a if value in b and value in c]  # [2]
len(d)  # the number of matches

Also, if you want to ignore duplicates (so that each matching value is counted only once), convert the lists to sets beforehand, e.g.

a = set(a)  # and so on for b and c
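The comprehension generalizes to any number of lists with all(). The sketch below also shows what set() does to duplicates. matches_in_all is a hypothetical helper, not a library function, and a here contains a repeated 2 (unlike the lists above) so the effect is visible:

```python
def matches_in_all(first, *others):
    """Return the items of `first` that appear in every list in `others`.

    Duplicates in `first` are kept, so each occurrence is counted.
    """
    return [value for value in first if all(value in other for other in others)]


a = [2, 2, 3]
b = [2, 3, 4]
c = [2, 4, 5]

print(matches_in_all(a, b))     # [2, 2, 3]
print(matches_in_all(a, b, c))  # [2, 2]

# Converting `a` to a set first counts each matching value only once.
# Set iteration order is not guaranteed, so sort before printing.
print(sorted(matches_in_all(set(a), b)))  # [2, 3]
print(len(matches_in_all(set(a), b, c)))  # 1
```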

If you mean you'd like to count the frequency of the elements of list1 in list2, maybe this solution can work for you:

list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

First, count the frequency of each element in list2 and build a dict from it; then build a sub-dict from that dict containing only the keys in list1. To get the total frequency, sum the values of sub_dct:

from collections import Counter

# count the frequency of elements of list1 in list2
def cntFrequency(lst1, lst2):
    dct = dict(Counter(lst2))
    sub_dct = {k: dct.get(k, 0) for k in lst1}
    return sub_dct

and the result looks like this:

cnt_dct = cntFrequency(list1, list2)
print(cnt_dct)
print(sum(cnt_dct.values()))

# Output
{'black': 1, 'red': 0, 'yellow': 0}
1
