Calculating totals and percentages in a dictionary in Python

Question

I'm definitely doing something wrong here, and my brain is melting. I have this data:

queryset = [
{'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
{'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
{'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
{'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
{'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
{'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
{'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
{'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
{'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}}
]

The output I want is:

    [{'source_id': '1',
      'total': 28167466,  # sum of the male, female and null totals for source_id '1'
      'percentage': {'neutral': [18326541, 65.06],  # 65.06 = neutral count as a % of total
                     'positive': [5622860, 19.96],
                     'negative': [4218065, 14.97]}},
     ...]  # and the same for all other sources

This is what I tried, but it doesn't work. I have three if statements, one for each of the three IDs:

for i in queryset:
    if i['source_id'] == '1':
        output['percentage'] = {
            'neutral': [sum(i['percentage']['neutral'][0] for i in queryset if i['source_id'] == '1'),
                        round(output['negative'] / output['2_total'] * 100, 2)],

            'positive': [sum(i['percentage']['positive'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['positive'] / output['2_total'] * 100, 2)],

            'negative': [sum(i['percentage']['negative'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['negative'] / output['2_total'] * 100, 2)]}

Answer 1

You can use collections.Counter to add up the totals:

from collections import Counter

counters = {}
for row in queryset:
    # gender_id is not needed
    # (note: this and the pop() calls below modify the rows of queryset in place)
    del row['gender_id']
    # Pull the subtotals from 'percentage'
    # into the parent dictionary, keeping only
    # the subtotals in first list item,
    # not the percentages
    percentages = row.pop('percentage')
    for k, v in percentages.items():
        percentages[k] = v[0]
    row.update(percentages)
    # Use 'source_id' as key for the 
    # counters dictionary
    index = row.pop('source_id')
    if index not in counters:
        counters[index] = Counter(row)
    else:
        counters[index].update(row)

This gives you the following:

{'1': Counter({'total': 28167466,
          'neutral': 18326541,
          'positive': 5622860,
          'negative': 4218065}),
 '2': Counter({'total': 38899675,
          'neutral': 25279797,
          'positive': 7786751,
          'negative': 5833127}),
 '3': Counter({'total': 56651458,
          'neutral': 36794953,
          'positive': 11365991,
          'negative': 8490514})}

From this, you can easily calculate the percentages and move them into the desired format.
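That last step could look roughly like the sketch below. The helper name to_report is hypothetical (it is not part of the answer above), the counters literal only repeats the source '1' totals shown earlier so the snippet runs on its own, and the zero-total guard is an added assumption:

```python
from collections import Counter

# Totals copied from the Counter output above (source '1' only, for brevity).
counters = {
    '1': Counter({'total': 28167466, 'neutral': 18326541,
                  'positive': 5622860, 'negative': 4218065}),
}


def to_report(counters):
    """Turn {source_id: Counter} into the list-of-dicts layout from the question."""
    report = []
    for source_id, counts in sorted(counters.items()):
        total = counts['total']
        percentage = {}
        for label in ('neutral', 'positive', 'negative'):
            subtotal = counts[label]
            # Guard against a zero total so the division cannot fail.
            share = round(subtotal / total * 100, 2) if total else 0.0
            percentage[label] = [subtotal, share]
        report.append({'source_id': source_id,
                       'total': total,
                       'percentage': percentage})
    return report


result = to_report(counters)
print(result[0]['percentage']['neutral'])  # [18326541, 65.06]
```

The source ids are sorted as strings here, which is fine for '1' to '3' but would put '10' before '2'.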

Answer 2

Okay, if I understand correctly, this is what you want:

unique_ids = set([item.get('source_id') for item in queryset]) # unique source ids

output = []

for id_ in unique_ids:
    # only grab items that match the current source id
    to_agg = list(filter(lambda x: x.get('source_id') == id_, queryset))

    # sum the total field for this source id
    total = sum((item.get('total') for item in to_agg))

    # aggregate the data for neutral/positive/negative
    percents = [item.get('percentage') for item in to_agg]
    negatives = sum((item.get('negative')[0] for item in percents))
    positives = sum((item.get('positive')[0] for item in percents))
    neutrals = sum((item.get('neutral')[0] for item in percents))

    # construct the final dictionary
    d = {'source_id': id_,
         'total': total,
         'percentage': {'neutral': [neutrals, round(neutrals / total * 100, 2)],
                        'positive': [positives, round(positives / total * 100, 2)],
                        'negative': [negatives, round(negatives / total * 100, 2)]}}

    output.append(d)

# sets are unordered, so sort the result by source id
sorted(output, key=lambda x: x.get('source_id'))

[{'percentage': {'negative': [4218065, 14.97],
   'neutral': [18326541, 65.06],
   'positive': [5622860, 19.96]},
  'source_id': '1',
  'total': 28167466},
 {'percentage': {'negative': [5833127, 15.0],
   'neutral': [25279797, 64.99],
   'positive': [7786751, 20.02]},
  'source_id': '2',
  'total': 38899675},
 {'percentage': {'negative': [8490514, 14.99],
   'neutral': [36794953, 64.95],
   'positive': [11365991, 20.06]},
  'source_id': '3',
  'total': 56651458}]

Edit: Keep in mind that I haven't optimized this answer, so if your queryset is large it may not be as fast as you need.
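If speed does become a concern, one possible variant is to walk the queryset once and accumulate per-source sums, instead of filtering the whole list again for every source id. This is only a sketch and has not been benchmarked. The names aggregate, LABELS and sample are made up for illustration, and sample holds just the three source '1' rows from the question:

```python
from collections import defaultdict

LABELS = ('neutral', 'positive', 'negative')


def aggregate(rows):
    """Sum totals and per-label counts per source_id in one pass over rows."""
    # Each new source id starts with all counters at zero.
    sums = defaultdict(lambda: dict.fromkeys(('total',) + LABELS, 0))
    for row in rows:
        acc = sums[row['source_id']]
        acc['total'] += row['total']
        for label in LABELS:
            # First list item is the count; the per-row percentage is ignored.
            acc[label] += row['percentage'][label][0]

    output = []
    for source_id in sorted(sums):
        acc = sums[source_id]
        total = acc['total']
        output.append({
            'source_id': source_id,
            'total': total,
            'percentage': {
                label: [acc[label],
                        round(acc[label] / total * 100, 2) if total else 0.0]
                for label in LABELS
            },
        })
    return output


# The three source '1' rows from the question.
sample = [
    {'source_id': '1', 'gender_id': 'female', 'total': 12928604,
     'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34],
                    'negative': [1497782, 11.59]}},
    {'source_id': '1', 'gender_id': 'male', 'total': 15238856,
     'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25],
                    'negative': [2720283, 17.85]}},
    {'source_id': '1', 'gender_id': 'null', 'total': 6,
     'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67],
                    'negative': [0, 0.0]}},
]

result = aggregate(sample)
print(result)
```

Unlike the Counter approach, this leaves the input rows untouched, so the same queryset can be reused afterwards.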
