
Get counts of specific values within a nested Python dictionary

I have a giant nested dictionary (6k records) that I need to sort and count based on two values within the inner dictionaries.

item_dict = {
    64762.0: {
        'In Sheet': 'No',
        'Paid': 'Y',
        'Region': "AMER'",
        'Matrix Position': 'Check'
    },
    130301.0: {
        'Paid': 'N',
        'Region': "AMER'",
        'Matrix Position': 'Calculate'
    },
    13111.0: {
        'In Sheet': 'Yes',
        'Region': "EMEA'",
        'Matrix Position': 'Check'
    },
    130321.0: {
        'Matrix Position': 'Enhance',
        'In Sheet': 'No',
        'Paid': 'Y',
        'Region': "JP'"
    }
}

So I need counts for each combination of region and matrix position. I'd end up with:

AMER and Calculate: 1
EMEA and Calculate: 0
EMEA and Check: 1
AMER and Check: 1
EMEA and Enhance: 0
JP and Check: 0

Et cetera. The thing is, the full data set has 5 regions and 4 possible matrix positions. Is the best approach a for loop that searches for each possible combination and appends the matching records to their own list, like this?

AmerCalculate=[]
for row in item_dict:
    if item_dict[row]['Region'] == "AMER'" and item_dict[row]['Matrix Position'] == "Calculate":
        AmerCalculate.append(row)

Then, to get the counts, I'd call len(AmerCalculate). Is there a more elegant way to do this, so I don't have to type out all 20 combinations by hand?
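One way to avoid typing out all 20 combinations is to let `collections.Counter` tally `(region, position)` tuples in a single pass over the records. This is a sketch that assumes every record has both a `'Region'` and a `'Matrix Position'` key (true for the sample above); the sample data below drops the stray trailing apostrophe from the region names.

```python
from collections import Counter

# Sample records shaped like the question's item_dict (region names without the stray apostrophe)
item_dict = {
    64762.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': 'AMER', 'Matrix Position': 'Check'},
    130301.0: {'Paid': 'N', 'Region': 'AMER', 'Matrix Position': 'Calculate'},
    13111.0: {'In Sheet': 'Yes', 'Region': 'EMEA', 'Matrix Position': 'Check'},
    130321.0: {'Matrix Position': 'Enhance', 'In Sheet': 'No', 'Paid': 'Y', 'Region': 'JP'},
}

# One pass: each record contributes one (region, position) key to the tally
counts = Counter((rec['Region'], rec['Matrix Position']) for rec in item_dict.values())

print(counts[('AMER', 'Calculate')])  # 1
print(counts[('EMEA', 'Calculate')])  # 0, because Counter returns 0 for unseen keys
```

Each record is visited once no matter how many regions and positions exist, and a combination that never occurs simply reads as 0 (looking it up does not add it to the Counter).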

AmerCalculate = {}
Regions = ["AMER", "EMEA", "JP"]
Positions = ["Calculate", "Check"]
for row in item_dict:
    for region in Regions:
        for pos in Positions:
            if item_dict[row]['Region'] == region and item_dict[row]['Matrix Position'] == pos:
                key = region + ' and ' + pos + ':'
                # Start a combination at 0 the first time it is seen, then add one
                AmerCalculate[key] = AmerCalculate.get(key, 0) + 1

This builds a dictionary of the form {"<region> and <position>:": total}, for example {'AMER and Calculate:': 1, 'EMEA and Check:': 1}. Only combinations that actually occur get a key.

Do you need to return the record keys, or just the total of each position per region?

Use another dictionary to group the data set by region; from there you can generate the output you're looking for:

def dict_counter(dict_arg):
    d = {'AMER':[],'EMEA':[],'JP':[]}  # Regions as keys.

    for int_key in dict_arg:
        sub_dict = dict_arg[int_key]
        for key, value in sub_dict.items():
            if value in d:
                d[value].append(sub_dict['Matrix Position'])
    return d

Sample output:

>>> item_dict= {12.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': "AMER",  'Matrix Position': 'Enhance'},1232.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': "AMER",  'Matrix Position': 'Check'}, 64762.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': "AMER",  'Matrix Position': 'Check'}, 130301.0: {'Paid': 'N', 'Region': "AMER",  'Matrix Position': 'Calculate'}, 13111.0: {'In Sheet': 'Yes', 'Region': "EMEA",  'Matrix Position': 'Check'}, 130321.0: {'Matrix Position': 'Enhance','In Sheet': 'No', 'Paid': 'Y', 'Region': "JP"}}
>>> print(dict_counter(item_dict))
{'AMER': ['Enhance', 'Check', 'Check', 'Calculate'], 'EMEA': ['Check'], 'JP': ['Enhance']}
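A variant of `dict_counter` could group on the `'Region'` field directly instead of checking every value of each record against the region names. That way a value that happens to equal a region name in some other field can't be miscounted, and the regions don't have to be listed up front. This is a sketch using `collections.defaultdict`; the function name `group_positions_by_region` is hypothetical. Unlike `dict_counter`, a region with no records at all will not appear in the result.

```python
from collections import defaultdict

def group_positions_by_region(records):
    """Map each region to the list of matrix positions seen for it."""
    grouped = defaultdict(list)
    for rec in records.values():
        # Skip records missing either field instead of raising KeyError
        if 'Region' in rec and 'Matrix Position' in rec:
            grouped[rec['Region']].append(rec['Matrix Position'])
    return dict(grouped)

# Same sample data as the interactive session above
item_dict = {
    12.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': 'AMER', 'Matrix Position': 'Enhance'},
    1232.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': 'AMER', 'Matrix Position': 'Check'},
    64762.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': 'AMER', 'Matrix Position': 'Check'},
    130301.0: {'Paid': 'N', 'Region': 'AMER', 'Matrix Position': 'Calculate'},
    13111.0: {'In Sheet': 'Yes', 'Region': 'EMEA', 'Matrix Position': 'Check'},
    130321.0: {'Matrix Position': 'Enhance', 'In Sheet': 'No', 'Paid': 'Y', 'Region': 'JP'},
}

print(group_positions_by_region(item_dict))
# {'AMER': ['Enhance', 'Check', 'Check', 'Calculate'], 'EMEA': ['Check'], 'JP': ['Enhance']}
```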

We now have the basis for the report you're looking for. We can use Counter to count how many times each position appears in each region's list. Here's an example:

from collections import Counter

d = dict_counter(item_dict)
for k, v in d.items():
    for i, j in Counter(v).items():
        print(k, 'and', i, '=', j)

AMER and Enhance = 1
AMER and Check = 2
AMER and Calculate = 1
EMEA and Check = 1
JP and Enhance = 1
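The loop above only prints combinations that occur at least once. To also list the zero counts the question asks for (such as `EMEA and Calculate = 0`), one option is to loop over a full list of positions and look each one up in the region's `Counter`, which returns 0 for missing items. In this sketch, `grouped` stands in for the output of `dict_counter` on the sample data, and the `positions` list is an assumption: in the real data it should hold all four positions.

```python
from collections import Counter

# Stand-in for dict_counter(item_dict) on the sample data
grouped = {'AMER': ['Enhance', 'Check', 'Check', 'Calculate'],
           'EMEA': ['Check'],
           'JP': ['Enhance']}
positions = ['Calculate', 'Check', 'Enhance']  # assumed complete list of positions

lines = []
for region, seen in grouped.items():
    tally = Counter(seen)
    for pos in positions:
        # tally[pos] is 0 when this region never had that position
        lines.append(f'{region} and {pos} = {tally[pos]}')

print('\n'.join(lines))
```

Because `dict_counter` seeds every region with an empty list, regions without any records still show up here with all-zero counts.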

To get all combinations, you can use itertools.product. Then you can store each count in a dictionary:

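import itertools

regions = ["AMER", "EMEA", "JP"]               # example list: fill in all 5 regions
positions = ["Calculate", "Check", "Enhance"]  # example list: fill in all 4 positions
result = {}
for r, p in itertools.product(regions, positions):
    # Count the records matching this (region, position) pair
    result[(r, p)] = sum(1 for item in item_dict.values()
                         if item['Region'] == r and item['Matrix Position'] == p)
print(result[("AMER", "Calculate")])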
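Because `result` is keyed by `(region, position)` tuples and already holds every combination, zeros included, it can be printed as a region-by-position grid. This is a sketch assuming the same `regions` and `positions` lists used to build `result`; `format_grid` is a hypothetical helper name, and the sample records drop the stray apostrophe from the region names.

```python
import itertools

def format_grid(result, regions, positions):
    """Render a {(region, position): count} dict as a plain-text table."""
    col = max(len(p) for p in positions) + 2    # width of each count column
    first = max(len(r) for r in regions) + 2    # width of the region column
    rows = [' ' * first + ''.join(p.rjust(col) for p in positions)]
    for r in regions:
        cells = ''.join(str(result[(r, p)]).rjust(col) for p in positions)
        rows.append(r.ljust(first) + cells)
    return '\n'.join(rows)

# Sample records (region names without the stray apostrophe)
item_dict = {
    64762.0: {'In Sheet': 'No', 'Paid': 'Y', 'Region': 'AMER', 'Matrix Position': 'Check'},
    130301.0: {'Paid': 'N', 'Region': 'AMER', 'Matrix Position': 'Calculate'},
    13111.0: {'In Sheet': 'Yes', 'Region': 'EMEA', 'Matrix Position': 'Check'},
    130321.0: {'Matrix Position': 'Enhance', 'In Sheet': 'No', 'Paid': 'Y', 'Region': 'JP'},
}
regions = ['AMER', 'EMEA', 'JP']
positions = ['Calculate', 'Check', 'Enhance']
result = {(r, p): sum(1 for item in item_dict.values()
                      if item['Region'] == r and item['Matrix Position'] == p)
          for r, p in itertools.product(regions, positions)}

grid = format_grid(result, regions, positions)
print(grid)
```

Note that the product approach scans all records once per combination (20 passes for 5 regions × 4 positions), which is fine for 6k records but grows with the number of combinations.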

Is it critical for you to use pure Python? If you only need to do this once, you can do it without worrying about performance or elegance; otherwise, this may be a good chance to learn something new.

What about the pandas library? It can solve this problem quickly and elegantly without explicit loops, and it lets you group and manipulate your data however you want. For example, this code

data_frame.groupby(['Region', 'Matrix Position'])['Matrix Position'].count()

will give you what you want quickly and conveniently, without any loops or extra helper functions:

Region  Matrix Position
AMER'   Calculate          1
        Check              1
EMEA'   Check              1
JP'     Enhance            1

pandas may also help with further processing and preparation of your data, since it has extensive data-processing and analysis features.

One more example: the following code counts the rows with region AMER' and matrix position Check:

from pandas import DataFrame

data_frame = DataFrame(item_dict).transpose()
filtered_data = data_frame[(data_frame['Region'] == "AMER'")
                           & (data_frame['Matrix Position'] == 'Check')]
result = len(filtered_data.index)

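Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.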

 