I'm definitely doing this wrong at this point, and my brain is melting. I have this data:
queryset = [
{'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
{'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
{'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
{'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
{'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
{'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
{'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
{'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
{'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}}
]
My desired output is:
[{'source_id': '1',
  'total': 28167466,  # sum of the male, female and null totals for source_id 1
  'percentage': {'neutral': [18326541, 65.06],  # neutral count and its % of the total
                 'positive': [5622860, 19.96],
                 'negative': [4218065, 14.97]}},
 ...]  # and the same for all other sources
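As a quick sanity check of the target numbers for source 1: sum the three gender rows, then divide each summed sentiment count by the summed total (rather than averaging the existing percentages). A minimal Python sketch using only the three source_id '1' rows copied from the data above:

```python
# The three rows for source_id '1', copied from the queryset above
rows = [
    {'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
    {'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
    {'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
]

# Sum the per-gender totals
total = sum(r['total'] for r in rows)

# For each sentiment: sum the counts (first list item), then recompute the
# percentage from the combined total instead of reusing the old percentages
percentage = {}
for key in ('neutral', 'positive', 'negative'):
    count = sum(r['percentage'][key][0] for r in rows)
    percentage[key] = [count, round(count / total * 100, 2)]

print(total)       # 28167466
print(percentage)  # {'neutral': [18326541, 65.06], 'positive': [5622860, 19.96], 'negative': [4218065, 14.97]}
```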
Here is what I tried, but it doesn't work. I have three if statements, one for each of the three IDs; this is the one for ID 1:
for i in queryset:
    if i['source_id'] == '1':
        output['percentage'] = {
            'neutral': [sum(i['percentage']['neutral'][0] for i in queryset if i['source_id'] == '1'),
                        round(output['negative'] / output['2_total'] * 100, 2)],
            'positive': [sum(i['percentage']['positive'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['positive'] / output['2_total'] * 100, 2)],
            'negative': [sum(i['percentage']['negative'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['negative'] / output['2_total'] * 100, 2)]}
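The attempt above mixes up the source IDs (inside the '1' branch, the positive and negative sums filter on '2'), divides by output['negative'] for the neutral percentage, and reads keys such as output['2_total'] that are never set. As a minimal sketch, assuming one function can replace the three if branches, the same idea can be parameterised by source ID; aggregate_source is a hypothetical helper name, not part of the original code:

```python
queryset = [
    {'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
    {'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
    {'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
    {'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
    {'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
    {'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
    {'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
    {'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
    {'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}},
]


def aggregate_source(queryset, source_id):
    """Hypothetical helper: combine all gender rows for one source_id."""
    # Keep only the rows belonging to this source
    rows = [row for row in queryset if row['source_id'] == source_id]
    total = sum(row['total'] for row in rows)
    percentage = {}
    for key in ('neutral', 'positive', 'negative'):
        # Sum the raw counts, then derive the percentage from the combined total
        count = sum(row['percentage'][key][0] for row in rows)
        percentage[key] = [count, round(count / total * 100, 2)]
    return {'source_id': source_id, 'total': total, 'percentage': percentage}


# One call per source ID instead of one if branch per ID
output = [aggregate_source(queryset, sid)
          for sid in sorted({row['source_id'] for row in queryset})]
```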
You can use collections.Counter to add up the totals:
from collections import Counter

counters = {}
for original in queryset:
    # Work on a shallow copy so queryset itself is not modified
    row = dict(original)
    # gender_id not needed
    del row['gender_id']
    # Pull the subtotals from 'percentage'
    # into the parent dictionary, keeping only
    # the subtotals in first list item,
    # not the percentages (a new dict, so the
    # original nested dicts stay untouched)
    percentages = {k: v[0] for k, v in row.pop('percentage').items()}
    row.update(percentages)
    # Use 'source_id' as key for the
    # counters dictionary
    index = row.pop('source_id')
    if index not in counters:
        counters[index] = Counter(row)
    else:
        counters[index].update(row)
This gives you the following:
{'1': Counter({'total': 28167466,
               'neutral': 18326541,
               'positive': 5622860,
               'negative': 4218065}),
 '2': Counter({'total': 38899675,
               'neutral': 25279797,
               'positive': 7786751,
               'negative': 5833127}),
 '3': Counter({'total': 56651458,
               'neutral': 36794953,
               'positive': 11365991,
               'negative': 8490514})}
From this, you can compute each percentage (count / total * 100) and reshape the data into the required format.
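A minimal sketch of that last step, assuming counters has the shape shown above (a dict mapping each source_id to a Counter with 'total', 'neutral', 'positive' and 'negative'); the function name counters_to_output is hypothetical:

```python
from collections import Counter


def counters_to_output(counters):
    """Turn {source_id: Counter} into the list-of-dicts format from the question."""
    output = []
    for source_id in sorted(counters):
        c = counters[source_id]
        total = c['total']
        output.append({
            'source_id': source_id,
            'total': total,
            # [count, percentage of total rounded to 2 decimals] per sentiment
            'percentage': {key: [c[key], round(c[key] / total * 100, 2)]
                           for key in ('neutral', 'positive', 'negative')},
        })
    return output


# Example input: the counters dictionary shown above
counters = {
    '1': Counter({'total': 28167466, 'neutral': 18326541, 'positive': 5622860, 'negative': 4218065}),
    '2': Counter({'total': 38899675, 'neutral': 25279797, 'positive': 7786751, 'negative': 5833127}),
    '3': Counter({'total': 56651458, 'neutral': 36794953, 'positive': 11365991, 'negative': 8490514}),
}
output = counters_to_output(counters)
```

Note that sorted(counters) sorts the IDs as strings, so '10' would come before '2'; if the IDs are always numeric, sorted(counters, key=int) keeps them in numeric order.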
Alright, if I understand correctly, this is what you want:
from pprint import pprint

unique_ids = {item.get('source_id') for item in queryset}  # unique source ids
output = []
for id_ in unique_ids:
    # only grab items that match the current source id
    to_agg = list(filter(lambda x: x.get('source_id') == id_, queryset))
    # sum the total field for this source id
    total = sum(item.get('total') for item in to_agg)
    # aggregate the data for neutral/positive/negative
    percents = [item.get('percentage') for item in to_agg]
    negatives = sum(item.get('negative')[0] for item in percents)
    positives = sum(item.get('positive')[0] for item in percents)
    neutrals = sum(item.get('neutral')[0] for item in percents)
    # construct the final dictionary
    d = {'source_id': id_,
         'total': total,
         'percentage': {'neutral': [neutrals, round(neutrals / total * 100, 2)],
                        'positive': [positives, round(positives / total * 100, 2)],
                        'negative': [negatives, round(negatives / total * 100, 2)]}}
    output.append(d)

# sets are unordered, so sort the result in place by source id
output.sort(key=lambda x: x.get('source_id'))
pprint(output)

This prints:
[{'percentage': {'negative': [4218065, 14.97],
                 'neutral': [18326541, 65.06],
                 'positive': [5622860, 19.96]},
  'source_id': '1',
  'total': 28167466},
 {'percentage': {'negative': [5833127, 15.0],
                 'neutral': [25279797, 64.99],
                 'positive': [7786751, 20.02]},
  'source_id': '2',
  'total': 38899675},
 {'percentage': {'negative': [8490514, 14.99],
                 'neutral': [36794953, 64.95],
                 'positive': [11365991, 20.06]},
  'source_id': '3',
  'total': 56651458}]
Edit: Keep in mind that I have not optimized this answer. It scans the whole queryset once per source ID, so it might not be as fast as you need if your queryset is large.
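If speed matters, one option is to group the rows in a single pass with collections.defaultdict instead of filtering the whole queryset once per ID. A sketch under the same assumptions about the data shape (every row has 'source_id', 'total', and a 'percentage' dict of [count, percent] lists):

```python
from collections import defaultdict

queryset = [
    {'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
    {'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
    {'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
    {'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
    {'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
    {'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
    {'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
    {'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
    {'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}},
]

SENTIMENTS = ('neutral', 'positive', 'negative')

# source_id -> running sums; a fresh dict of zeros is created on first access
sums = defaultdict(lambda: dict.fromkeys(('total',) + SENTIMENTS, 0))

# Single pass over the data: add each row's total and sentiment counts
for row in queryset:
    acc = sums[row['source_id']]
    acc['total'] += row['total']
    for key in SENTIMENTS:
        acc[key] += row['percentage'][key][0]

# Build the requested format, computing percentages from the combined totals
output = [
    {'source_id': sid,
     'total': acc['total'],
     'percentage': {key: [acc[key], round(acc[key] / acc['total'] * 100, 2)]
                    for key in SENTIMENTS}}
    for sid, acc in sorted(sums.items())
]
```

This touches each row exactly once, so the cost grows with the number of rows rather than rows times source IDs.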