Over-counting pairs in a Python loop

I have a list of dictionaries where each dict is of the form:

 {'A': a,'B': b}

I want to iterate through the list and, for every (a,b) pair, find the pair(s) (b,a), if any exist.

For example, if a given entry of the list has A = 13 and B = 14, then the original pair is (13,14). I want to search the entire list of dicts to find the pair (14,13). If (14,13) occurs multiple times, I would like to record that too.

For every original (a,b) pair in the list, I would like to know whether its complement (b,a) appears and, if so, how many times. To do this I have two for loops and a counter that is incremented whenever a complement pair is found.

pairs_found = 0
for i, val in enumerate( list_of_dicts ):
    for j, vol in enumerate( list_of_dicts ):
        if val['A'] == vol['B']:
            if vol['A'] == val['B']:
                pairs_found += 1

This produces a pairs_found greater than the length of list_of_dicts. I realize this is because the same pairs are being over-counted, but I am not sure how to overcome this degeneracy.
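To see where the extra counts come from, here is a minimal sketch that wraps the loop above in a function (the name count_ordered_matches and the sample data are made up for illustration). Every match is found once from each side, and a repeated complement multiplies the count, so the total can exceed the number of entries.

```python
def count_ordered_matches(list_of_dicts):
    """Same double loop as in the question, wrapped in a function."""
    pairs_found = 0
    for val in list_of_dicts:
        for vol in list_of_dicts:
            if val['A'] == vol['B'] and vol['A'] == val['B']:
                pairs_found += 1
    return pairs_found


# Hypothetical sample: one (13, 14) entry and two (14, 13) entries.
sample = [{'A': 13, 'B': 14}, {'A': 14, 'B': 13}, {'A': 14, 'B': 13}]

# Entry 0 matches entries 1 and 2, and each of those matches entry 0 again.
print(count_ordered_matches(sample))  # prints 4, although len(sample) == 3
```

An entry with A equal to B also matches itself in this loop, which adds one more count per such entry.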

Edit for Clarity

list_of_dicts = [
    {'A': 14, 'B': 23},
    {'A': 235, 'B': 98},
    {'A': 686, 'B': 999},
    {'A': 128, 'B': 123},
]

....

Let's say that the list has around 100000 entries. Somewhere in that list there may be one or more entries of the form {'A': 23, 'B': 14}. If so, I would like a counter to increase its value by one. I would like to do this for every entry in the list.

Here is what I suggest:

  • Use tuples to represent your pairs and use them as dict/set keys.
  • Build a set of the unique inverted pairs you'll look for.
  • Use a dict to store the number of times each pair appears inverted.

Then the code should look like this:

# Create a set of unique inverted pairs
inverted_pairs_set = {(d['B'], d['A']) for d in list_of_dicts}
# Create a counter keyed by the original pairs, starting at zero
pairs_counter_dict = {(ip[1], ip[0]): 0 for ip in inverted_pairs_set}
# Create the list of pairs
pairs_list = [(d['A'], d['B']) for d in list_of_dicts]
# For each pair that is the inverse of some entry, increment that entry's counter
for p in pairs_list:
    if p in inverted_pairs_set:
        pairs_counter_dict[(p[1], p[0])] += 1

You could first create a list with the values of each dictionary as tuples:

example_dict = [{"A": 1, "B": 2}, {"A": 4, "B": 3}, {"A": 5, "B": 1}, {"A": 2, "B": 1}]
dict_values = [tuple(x.values()) for x in example_dict]

Then create a second list holding, for each element, the number of times its inverse occurs:

occurrences = [dict_values.count(x[::-1]) for x in dict_values]

Finally, create a dict with dict_values as keys and occurrences as values:

dict(zip(dict_values, occurrences))

Output:

{(1, 2): 1, (4, 3): 0, (5, 1): 0, (2, 1): 1}

For each key, you have the number of inverted keys. You can also create the dictionary on the fly:

occurrences = {x: dict_values.count(x[::-1]) for x in dict_values}
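Note that list.count scans the whole list on every call, so on a list of around 100000 entries the approach above performs on the order of n**2 comparisons. A sketch of one way to get the same dictionary with a single counting pass, swapping in collections.Counter (not part of the original answer) and reading the 'A' and 'B' keys explicitly instead of relying on the order of x.values():

```python
from collections import Counter

example_dict = [{"A": 1, "B": 2}, {"A": 4, "B": 3}, {"A": 5, "B": 1}, {"A": 2, "B": 1}]

# Build the (A, B) tuples explicitly so the result does not depend on key order.
dict_values = [(d["A"], d["B"]) for d in example_dict]

# Count how often each pair occurs; a Counter returns 0 for missing keys.
pair_counts = Counter(dict_values)

# For each pair, look up how many times its inverse occurs.
occurrences = {pair: pair_counts[pair[::-1]] for pair in dict_values}

print(occurrences)  # {(1, 2): 1, (4, 3): 0, (5, 1): 0, (2, 1): 1}
```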

I am still not 100% sure what it is you want to do, but here is my guess:

pairs_found = 0
for i, dict1 in enumerate(list_of_dicts):
    for j, dict2 in enumerate(list_of_dicts[i+1:]):
        if dict1['A'] == dict2['B'] and dict1['B'] == dict2['A']:
            pairs_found += 1

Note the slicing in the second for loop. This avoids checking pairs that have already been checked (comparing D1 with D2 is enough; there is no need to also compare D2 with D1).

This roughly halves the number of comparisons, but it is still O(n**2), so there is probably room for improvement.
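The same number (each matching pair of entries counted once) can also be obtained without nested loops. The following is only a sketch: the function name count_matching_pairs is made up, and collections.Counter is swapped in for the explicit comparison.

```python
from collections import Counter


def count_matching_pairs(list_of_dicts):
    """Count unordered pairs of entries (i, j), i < j, where one is the inverse of the other."""
    counts = Counter((d['A'], d['B']) for d in list_of_dicts)

    cross = 0      # matches between (a, b) and (b, a) with a != b
    self_hits = 0  # matches among identical entries with a == b
    for (a, b), n in counts.items():
        if a == b:
            # n identical (a, a) entries form n * (n - 1) / 2 pairs among themselves.
            self_hits += n * (n - 1) // 2
        else:
            cross += n * counts.get((b, a), 0)

    # Every (a, b)/(b, a) match was added once from each side, so halve it.
    return cross // 2 + self_hits


sample = [{'A': 13, 'B': 14}, {'A': 14, 'B': 13}, {'A': 14, 'B': 13}]
print(count_matching_pairs(sample))  # prints 2
```

This makes one pass to build the counts and one pass over the distinct pairs, so the work grows linearly with the length of the list.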

You can create a counter dictionary keyed by the ('A', 'B') values of all your dictionaries:

complements_cnt = {(dct['A'], dct['B']): 0 for dct in list_of_dicts}

Then all you need is to iterate over your dictionaries again and increment the value for the "complements":

for dct in list_of_dicts:
    try:
        complements_cnt[(dct['B'], dct['A'])] += 1
    except KeyError:   # in case there is no complement there is nothing to increase
        pass

For example, with a list_of_dicts such as:

list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]

This gives:

{(1, 2): 1, (2, 1): 2}

This says that {'A': 1, 'B': 2} has one complement (the second entry) and {'A': 2, 'B': 1} has two (the first and the last).

The solution is O(n), which should be quite fast even for 100000 dictionaries.
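If what is wanted is a single number that can never exceed the length of the list, one reading of the question is "how many entries have at least one complement somewhere in the list". A minimal sketch under that assumption (the function name count_entries_with_complement and the sample data are hypothetical):

```python
def count_entries_with_complement(list_of_dicts):
    """Count the entries whose inverted pair (B, A) occurs at least once in the list."""
    pairs = [(d['A'], d['B']) for d in list_of_dicts]
    present = set(pairs)  # set membership tests take constant time on average
    # Each entry contributes at most 1, so the result is at most len(list_of_dicts).
    # Caveat: an entry with A == B counts as its own complement here.
    return sum(1 for a, b in pairs if (b, a) in present)


sample = [{'A': 14, 'B': 23}, {'A': 235, 'B': 98}, {'A': 23, 'B': 14}, {'A': 23, 'B': 14}]
print(count_entries_with_complement(sample))  # prints 3
```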

Note: This is quite similar to @debzsud's answer. I hadn't seen it before I posted mine, though. :(
