I have a list with weekly figures and need to obtain the grouped totals by month.
The following code does the job, but there should be a more pythonic way of doing it with using the standard libraries. The drawback of the code below is that the list needs to be in sorted order.
#Test data (not sorted)
sum_weekly=[('2020/01/05', 59), ('2020/01/19', 88), ('2020/01/26', 95), ('2020/02/02', 89),
('2020/02/09', 113), ('2020/02/16', 90), ('2020/02/23', 68), ('2020/03/01', 74), ('2020/03/08', 85),
('2020/04/19', 6), ('2020/04/26', 5), ('2020/05/03', 14),
('2020/05/10', 5), ('2020/05/17', 20), ('2020/05/24', 28),('2020/03/15', 56), ('2020/03/29', 5), ('2020/04/12', 2),]
month = sum_weekly[0][0].split('/')[1]
count=0
out=[]
for item in sum_weekly:
m_sel = item[0].split('/')[1]
if m_sel!=month:
out.append((month, count))
count=item[1]
else:
count+=item[1]
month = m_sel
out.append((month, count))
# monthly sums output as ('01', 242), ('02', 360), ('03', 220), ('04', 13), ('05', 67)
print (out)
You could use defaultdict
to store the result instead of a list. The keys of the dictionary would be the months and you can simply add the values with the same month (key).
Possible implementation:
# Test Data
from collections import defaultdict
sum_weekly = [('2020/01/05', 59), ('2020/01/19', 88), ('2020/01/26', 95), ('2020/02/02', 89),
('2020/02/09', 113), ('2020/02/16', 90), ('2020/02/23', 68), ('2020/03/01', 74), ('2020/03/08', 85),
('2020/03/15', 56), ('2020/03/29', 5), ('2020/04/12', 2), ('2020/04/19', 6), ('2020/04/26', 5),
('2020/05/03', 14),
('2020/05/10', 5), ('2020/05/17', 20), ('2020/05/24', 28)]
results = defaultdict(int)
for date, count in sum_weekly: # used unpacking to make it clearer
month = date.split('/')[1]
# because we use a defaultdict if the key does not exist it
# the entry for the key will be created and initialize at zero
results[month] += count
print(results)
You can use itertools.groupby
(it is part of standard library) - it does pretty much what you did under the hood (grouping together sequences of elements for which the key function gives same output). It can look like the following:
import itertools
def select_month(item):
return item[0].split('/')[1]
def get_value(item):
return item[1]
result = [(month, sum(map(get_value, group)))
for month, group in itertools.groupby(sorted(sum_weekly), select_month)]
print(result)
Terse, but maybe not that pythonic:
import calendar, functools, collections
{calendar.month_name[i]: val for i, val in functools.reduce(lambda a, b: a + b, [collections.Counter({datetime.datetime.strptime(time, '%Y/%m/%d').month: val}) for time, val in sum_weekly]).items()}
a method using pyspark
from pyspark import SparkContext
sc = SparkContext()
l = sc.parallelize(sum_weekly)
r = l.map(lambda x: (x[0].split("/")[1], x[1])).reduceByKey(lambda p, q: (p + q)).collect()
print(r) #[('04', 13), ('02', 360), ('01', 242), ('03', 220), ('05', 67)]
You can accomplish this with a Pandas dataframe. First, you isolate the month, and then use groupby.sum().
import pandas as pd
sum_weekly=[('2020/01/05', 59), ('2020/01/19', 88), ('2020/01/26', 95), ('2020/02/02', 89), ('2020/02/09', 113), ('2020/02/16', 90), ('2020/02/23', 68), ('2020/03/01', 74), ('2020/03/08', 85), ('2020/04/19', 6), ('2020/04/26', 5), ('2020/05/03', 14), ('2020/05/10', 5), ('2020/05/17', 20), ('2020/05/24', 28),('2020/03/15', 56), ('2020/03/29', 5), ('2020/04/12', 2)]
df= pd.DataFrame(sum_weekly)
df.columns=['Date','Sum']
df['Month'] = df['Date'].str.split('/').str[1]
print(df.groupby('Month').sum())
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.