
Count occurrences within a list of list of tuples

I think it is best to start with input and output:

list_of_items = [
    {"A": "abc", "B": "dre", "C": "ccp"},
    {"A": "qwe", "B": "dre", "C": "ccp"},
    {"A": "abc", "B": "dre", "C": "ccp"},
]

result = {'A-abc-->B': {'dre': 2},
          'A-abc-->C': {'ccp': 2},
          'A-qwe-->B': {'dre': 1},
          'A-qwe-->C': {'ccp': 1},
          'B-dre-->A': {'abc': 2, 'qwe': 1},
          'B-dre-->C': {'ccp': 3},
          'C-ccp-->A': {'abc': 2, 'qwe': 1},
          'C-ccp-->B': {'dre': 3}}

My initial input is items that arrive as a stream. Each item is basically a dictionary of keys and values. My goal is to get, for each specific key and value, the counts of the values of all other keys that came with it, so that I can see which value occurs most often.

So if, out of 100 items, the key "A" with value "1" came with the value "2" for key "B" in 90 items and with the value "1111" for key "B" in 10 items, I want to see a list that shows me those numbers: B2=90, B1111=10.
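The example above can be checked with a minimal sketch. The `items` list is hypothetical data built only to match the numbers in the description; it is not taken from the real stream.

```python
from collections import Counter

# Hypothetical stream matching the description: 100 items that all have A == "1".
items = [{"A": "1", "B": "2"}] * 90 + [{"A": "1", "B": "1111"}] * 10

# Count the values of key "B" among the items where key "A" has value "1".
b_counts = Counter(item["B"] for item in items if item["A"] == "1")

print(dict(b_counts))  # {'2': 90, '1111': 10}
```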

My code works. But my real-life scenario contains more than 100000 different values for about 20 keys. Also, my final goal is to run this as a job on Flink.

So I am looking for help with Counter / the Python stream API.

from pprint import pprint

# Turn every item into a list of its (key, value) tuples.
all_tuple_list_items = []
for dict_item in list_of_items:
    list_of_tuples = list(dict_item.items())
    all_tuple_list_items.append(list_of_tuples)

result_dict = {}
for list_of_tuples in all_tuple_list_items:
    for id_tuple in list_of_tuples:
        # All the other (key, value) tuples that came with id_tuple.
        all_other_tuples = list_of_tuples.copy()
        all_other_tuples.remove(id_tuple)

        for corresponding_other_tu in all_other_tuples:
            # Connection label such as "A-abc-->B".
            ids_connection_id = id_tuple[0] + "-" + str(id_tuple[1]) + "-->" + corresponding_other_tu[0]
            corresponding_id = str(corresponding_other_tu[1])

            # Create the nested entries on first sight, otherwise increment.
            if result_dict.get(ids_connection_id) is None:
                result_dict[ids_connection_id] = {corresponding_id: 1}
            else:
                if result_dict[ids_connection_id].get(corresponding_id) is None:
                    result_dict[ids_connection_id][corresponding_id] = 1
                else:
                    result_dict[ids_connection_id][corresponding_id] += 1

pprint(result_dict)

You can use itertools.permutations() to generate all ordered pairs of items in each dict and Counter to count them. Finally, you can use defaultdict() to group the counts from the Counter:

from collections import Counter, defaultdict
from itertools import permutations
from pprint import pprint

list_of_items = [
    [{"A": "abc", "B": "dre", "C": "ccp"}],
    [{"A": "qwe", "B": "dre", "C": "ccp"}],
    [{"A": "abc", "B": "dre", "C": "ccp"}],
]

c = Counter(p for i in list_of_items
              for p in permutations(i[0].items(), 2))
d = defaultdict(dict)
for ((i, j), (k, l)), num in c.items():
    d[f'{i}-{j}-->{k}'][l] = num

pprint(d)

Output:

defaultdict(<class 'dict'>,
            {'A-abc-->B': {'dre': 2},
             'A-abc-->C': {'ccp': 2},
             'A-qwe-->B': {'dre': 1},
             'A-qwe-->C': {'ccp': 1},
             'B-dre-->A': {'abc': 2, 'qwe': 1},
             'B-dre-->C': {'ccp': 3},
             'C-ccp-->A': {'abc': 2, 'qwe': 1},
             'C-ccp-->B': {'dre': 3}})
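To see what the answer above feeds into Counter, here is a small sketch of what permutations(..., 2) yields for a single item. The variable names `item` and `pairs` are only illustrative. For an item with n keys there are n * (n - 1) ordered pairs, so 3 keys give 6 pairs and about 20 keys would give 380 pairs per item.

```python
from itertools import permutations

item = {"A": "abc", "B": "dre", "C": "ccp"}

# Every ordered pair of distinct (key, value) entries from one item.
pairs = list(permutations(item.items(), 2))

for (key, value), (other_key, other_value) in pairs:
    # One line per connection, e.g. "A-abc-->B: dre".
    print(f"{key}-{value}-->{other_key}: {other_value}")
```

Each printed line corresponds to one increment in the final result, which is why both 'A-abc-->B' and 'B-dre-->A' appear in the output.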

Got it to work. However, I would still like a more efficient approach using counters and streams. Is that possible?

Code:

from pprint import pprint

# Each element of list_of_items is a one-element list holding a dict.
all_tuple_list_items = []
for dict_item in list_of_items:
    list_of_tuples = list(dict_item[0].items())
    all_tuple_list_items.append(list_of_tuples)

result_dict = {}
for list_of_tuples in all_tuple_list_items:
    for id_tuple in list_of_tuples:
        # All the other (key, value) tuples that came with id_tuple.
        all_other_tuples = list_of_tuples.copy()
        all_other_tuples.remove(id_tuple)

        for corresponding_other_tu in all_other_tuples:
            # Connection label such as "A-abc-->B".
            ids_connection_id = id_tuple[0] + "-" + str(id_tuple[1]) + "-->" + corresponding_other_tu[0]
            corresponding_id = str(corresponding_other_tu[1])

            # Create the nested entries on first sight, otherwise increment.
            if result_dict.get(ids_connection_id) is None:
                result_dict[ids_connection_id] = {corresponding_id: 1}
            else:
                if result_dict[ids_connection_id].get(corresponding_id) is None:
                    result_dict[ids_connection_id][corresponding_id] = 1
                else:
                    result_dict[ids_connection_id][corresponding_id] += 1

pprint(result_dict)
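One possible way to make this stream-friendly is to keep a running defaultdict of Counter objects and fold in one item at a time, instead of first collecting every item in a list. The sketch below is plain Python, not Flink code. The helper `update_counts` and the `stream` iterator are hypothetical names introduced for illustration, and the items are assumed to arrive as plain dicts.

```python
from collections import Counter, defaultdict
from itertools import permutations


def update_counts(counts, item):
    """Fold one dictionary from the stream into the running counts."""
    for (key, value), (other_key, other_value) in permutations(item.items(), 2):
        counts[f"{key}-{value}-->{other_key}"][str(other_value)] += 1


# Hypothetical stand-in for the real stream: any iterable of dicts would do.
stream = iter([
    {"A": "abc", "B": "dre", "C": "ccp"},
    {"A": "qwe", "B": "dre", "C": "ccp"},
    {"A": "abc", "B": "dre", "C": "ccp"},
])

counts = defaultdict(Counter)
for item in stream:
    update_counts(counts, item)

# Most frequent companion value for each connection.
most_common = {conn: c.most_common(1)[0] for conn, c in counts.items()}
print(most_common["B-dre-->A"])  # ('abc', 2)
```

Because each item is processed independently, only the running counts have to be kept in memory, and Counter.most_common() gives the maximum for any connection at any point in the stream.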


 