
Python calculate total value and percentage of it in dictionary

I am definitely doing something wrong at this point and my brain is melting. I have this data:

queryset = [
{'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
{'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
{'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
{'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
{'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
{'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
{'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
{'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
{'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}}
]

My desired output is:

    [{'source_id': '1',
      'total': 28167466,  # sum of the 'male', 'female' and 'null' totals for source_id '1'
      'percentage': {'neutral': [18326541, 65.06],  # summed neutral count and its share of the total
                     'positive': [5622860, 19.96],
                     'negative': [4218065, 14.97]}},
     ...  # and the same for every other source
    ]

This is what I tried, but it doesn't work. I have three if statements like this one, one for each of the 3 source IDs:

for i in queryset:
    if i['source_id'] == '1':
        output['percentage'] = {
            'neutral': [sum(i['percentage']['neutral'][0] for i in queryset if i['source_id'] == '1'),
                        round(output['negative'] / output['2_total'] * 100, 2)],

            'positive': [sum(i['percentage']['positive'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['positive'] / output['2_total'] * 100, 2)],

            'negative': [sum(i['percentage']['negative'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['negative'] / output['2_total'] * 100, 2)]}

You can use collections.Counter to add up the totals:

from collections import Counter

counters = {}
for row in queryset:
    # gender_id not needed
    del row['gender_id']
    # Pull the subtotals from 'percentage'
    # into the parent dictionary, keeping only
    # the subtotals in first list item,
    # not the percentages
    percentages = row.pop('percentage')
    for k, v in percentages.items():
        percentages[k] = v[0]
    row.update(percentages)
    # Use 'source_id' as key for the 
    # counters dictionary
    index = row.pop('source_id')
    if index not in counters:
        counters[index] = Counter(row)
    else:
        counters[index].update(row)

This gives you the following:

{'1': Counter({'total': 28167466,
          'neutral': 18326541,
          'positive': 5622860,
          'negative': 4218065}),
 '2': Counter({'total': 38899675,
          'neutral': 25279797,
          'positive': 7786751,
          'negative': 5833127}),
 '3': Counter({'total': 56651458,
          'neutral': 36794953,
          'positive': 11365991,
          'negative': 8490514})}

From this, you can easily compute the percentages and reshape the result into the required format.
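As a minimal sketch of that last step: the snippet below assumes `counters` has the shape shown above (only the source '1' values are copied in, to keep it short). The helper name `to_output` and the zero-total guard are my own additions, not part of the original answer.

```python
from collections import Counter

# Assumed input: the per-source counters produced above
# (only source '1' is reproduced here).
counters = {
    '1': Counter({'total': 28167466, 'neutral': 18326541,
                  'positive': 5622860, 'negative': 4218065}),
}


def to_output(counters):
    """Turn {source_id: Counter} into the list-of-dicts format from the question.

    Hypothetical helper, named for illustration only.
    """
    output = []
    for source_id, counts in sorted(counters.items()):
        total = counts['total']
        percentage = {}
        for key in ('neutral', 'positive', 'negative'):
            # Guard against an empty group to avoid ZeroDivisionError
            share = round(counts[key] / total * 100, 2) if total else 0.0
            percentage[key] = [counts[key], share]
        output.append({'source_id': source_id,
                       'total': total,
                       'percentage': percentage})
    return output


result = to_output(counters)
```

Each entry of `result` then holds the summed count and its share of the source total, e.g. `result[0]['percentage']['neutral']` is `[18326541, 65.06]`.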

Alright, if I understand correctly, this is what you want:

unique_ids = set([item.get('source_id') for item in queryset]) # unique source ids

output = []

for id_ in unique_ids:
    # only grab items that match the current source id
    to_agg = list(filter(lambda x: x.get('source_id') == id_, queryset))

    # sum the total field for this source id
    total = sum((item.get('total') for item in to_agg))

    # aggregate the data for neutral/positive/negative
    percents = [item.get('percentage') for item in to_agg]
    negatives = sum((item.get('negative')[0] for item in percents))
    positives = sum((item.get('positive')[0] for item in percents))
    neutrals = sum((item.get('neutral')[0] for item in percents))

    # construct the final dictionary
    d = {'source_id': id_,
         'total': total,
         'percentage': {'neutral': [neutrals, round(neutrals / total * 100, 2)],
                        'positive': [positives, round(positives / total * 100, 2)],
                        'negative': [negatives, round(negatives / total * 100, 2)]}}

    output.append(d)

sorted(output, key=lambda x: x.get('source_id'))

[{'percentage': {'negative': [4218065, 14.97],
   'neutral': [18326541, 65.06],
   'positive': [5622860, 19.96]},
  'source_id': '1',
  'total': 28167466},
 {'percentage': {'negative': [5833127, 15.0],
   'neutral': [25279797, 64.99],
   'positive': [7786751, 20.02]},
  'source_id': '2',
  'total': 38899675},
 {'percentage': {'negative': [8490514, 14.99],
   'neutral': [36794953, 64.95],
   'positive': [11365991, 20.06]},
  'source_id': '3',
  'total': 56651458}]

Edit: Just keep in mind I have not optimized this answer so it might not be as fast as you need it to be if your query set is large.
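If speed does become a concern, one possible variant is to group in a single pass instead of re-filtering the queryset once per source id. This is only a sketch under a few assumptions: the rows have the same shape as in the question (the three source '1' rows are copied below), the function name `aggregate` is made up for illustration, and the zero-total guard is my own addition.

```python
from collections import defaultdict

# Sample rows in the question's format (the three source '1' rows)
queryset = [
    {'source_id': '1', 'gender_id': 'female', 'total': 12928604,
     'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34],
                    'negative': [1497782, 11.59]}},
    {'source_id': '1', 'gender_id': 'male', 'total': 15238856,
     'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25],
                    'negative': [2720283, 17.85]}},
    {'source_id': '1', 'gender_id': 'null', 'total': 6,
     'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67],
                    'negative': [0, 0.0]}},
]


def aggregate(rows):
    """Sum totals and sentiment counts per source_id in one pass over rows."""
    sums = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = sums[row['source_id']]
        group['total'] += row['total']
        # Each value is [count, per-row percentage]; only the count is summed
        for key, (count, _row_pct) in row['percentage'].items():
            group[key] += count

    output = []
    for source_id in sorted(sums):
        group = sums[source_id]
        total = group['total']
        output.append({
            'source_id': source_id,
            'total': total,
            'percentage': {
                key: [group[key],
                      round(group[key] / total * 100, 2) if total else 0.0]
                for key in ('neutral', 'positive', 'negative')
            },
        })
    return output


result = aggregate(queryset)
```

Unlike filtering per id, this reads each row once and leaves the input rows unmodified.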
