I have two lists: one is a list of values and the other is a list of dates.
I want to create a dictionary with the dates as keys and the values as values. But many of the values share the same key (date), so I need to add the values with the same date together before building the dictionary.
Both lists have the same number of elements, but the list of dates contains duplicates (since every date has more than one value).
What would be the best way to group the values (add them together) based on the keys (dates)?
Example lists:
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
I want my dictionary to look like this:
result = {datetime(2014, 2, 1, 0, 0): 13, datetime(2014, 3, 1, 0, 0): 12}
If you have repeating dates and want to group the values for repeating keys, use a defaultdict:
from collections import defaultdict
d = defaultdict(int)
for dte, val in zip(dates, values):
    d[dte] += val
Output:
defaultdict(<class 'int'>, {datetime.datetime(2014, 2, 1, 0, 0): 13, datetime.datetime(2014, 3, 1, 0, 0): 12})
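For reference, here is a self-contained sketch of the same loop using the lists from the question. The names totals and result are only illustrative, and the final conversion to a plain dict is an optional extra step, not something the approach requires:

```python
from collections import defaultdict
from datetime import datetime

dates = [
    datetime(2014, 2, 1), datetime(2014, 2, 1), datetime(2014, 2, 1),
    datetime(2014, 3, 1), datetime(2014, 3, 1),
]
values = [2, 7, 4, 8, 4]

totals = defaultdict(int)  # a missing key starts at int() == 0
for dte, val in zip(dates, values):
    totals[dte] += val

# Optional: convert to a plain dict so that a later lookup of a missing
# date raises KeyError instead of silently inserting a 0 entry.
result = dict(totals)
print(result)
```

Keeping the defaultdict is fine too; the conversion only matters if code further down should not create new keys by accident.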
Or using a normal dict and dict.setdefault:
d = {}
for dte, val in zip(dates, values):
    d.setdefault(dte, 0)
    d[dte] += val
Lastly you can use dict.get with a default value of 0:
d = {}
for dte, val in zip(dates, values):
    d[dte] = d.get(dte, 0) + val
The defaultdict is going to be the fastest approach, as it is designed exactly for this purpose.
Assuming this is your input:
>>> dates = ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-03']
>>> values = [10, 15, 10, 10]
Combine the dates and values into pairs:
>>> data = list(zip(dates, values))
>>> data
[('2015-01-01', 10), ('2015-01-01', 15), ('2015-01-02', 10), ('2015-01-03', 10)]
Aggregate the values for the same dates:
>>> import itertools
>>> new_data = []
>>> for key, group in itertools.groupby(data, lambda x: x[0]):
...     tmp = [key, 0]  # 0 is the starting sum for each date
...     for thing in group:
...         tmp[1] += thing[1]
...     new_data.append(tmp)
...
Print new_data:
>>> new_data
[['2015-01-01', 25], ['2015-01-02', 10], ['2015-01-03', 10]]
Now build the final dictionary:
>>> dict(new_data)
{'2015-01-03': 10, '2015-01-02': 10, '2015-01-01': 25}
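One caveat with this approach: itertools.groupby only merges consecutive items with equal keys, so it works above because the dates are already in order. A small sketch of what happens otherwise, using made-up sample data and a hypothetical helper named sum_groups:

```python
from itertools import groupby
from operator import itemgetter

# Sample data with the dates deliberately out of order (illustrative only).
dates = ['2015-01-02', '2015-01-01', '2015-01-02', '2015-01-01']
values = [10, 15, 5, 10]
pairs = list(zip(dates, values))

def sum_groups(pairs):
    """Sum the values of each run of consecutive pairs sharing a date."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(pairs, key=itemgetter(0))]

# Without sorting, every pair forms its own group, and dict() then keeps
# only the last entry seen for each date.
unsorted_result = dict(sum_groups(pairs))

# Sorting by date first puts equal dates next to each other.
sorted_result = dict(sum_groups(sorted(pairs, key=itemgetter(0))))

print(unsorted_result)
print(sorted_result)
```

If the input order cannot be guaranteed, sort by the key first or use one of the dictionary-based loops, which do not depend on order.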
itertools and defaultdict are pretty unnecessary for this. I think this is simpler and easier to read:
from datetime import datetime
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
combined = {}
for (date, value) in zip(dates, values):
    if date in combined:
        # Date already seen: add to the running total.
        combined[date] += value
    else:
        # First value for this date.
        combined[date] = value
Performance analysis
I'm not saying that defaultdict is a bad solution; I was only pointing out that it requires more tacit knowledge to use without pitfalls.
It is not, however, the fastest solution.
from collections import defaultdict
from datetime import datetime
import timeit
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
def combine_default_dict(dates=dates, values=values):
    d = defaultdict(int)
    for dte, val in zip(dates, values):
        d[dte] += val
    return d
def combine_setdefault(dates=dates, values=values):
    d = {}
    for dte, val in zip(dates, values):
        d.setdefault(dte, 0)
        d[dte] += val
    return d
def combine_get(dates=dates, values=values):
    d = {}
    for dte, val in zip(dates, values):
        d[dte] = d.get(dte, 0) + val
    return d
def combine_contains(dates=dates, values=values):
    d = {}
    for (date, value) in zip(dates, values):
        if date in d:
            d[date] += value
        else:
            d[date] = value
    return d
def time_them(number=100000):
    # Time every module-level function whose name starts with 'combine_'.
    for func_name in [k for k in sorted(globals().keys()) if k.startswith('combine_')]:
        timer = timeit.Timer("{0}()".format(func_name), "from __main__ import {0}".format(func_name))
        time_taken = timer.timeit(number=number)
        # Parenthesized so the line works on both Python 2 and Python 3.
        print("{0} - {1}".format(time_taken, func_name))
Yields:
>>> time_them()
0.388070106506 - combine_contains
0.485766887665 - combine_default_dict
0.415601968765 - combine_get
0.472551822662 - combine_setdefault
I've tried it on a couple of different machines and Python versions. combine_default_dict competes with combine_setdefault for the slowest; combine_contains has been consistently the fastest.
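The benchmark above uses only five items, so per-call overhead dominates. Here is a rough sketch for repeating the comparison on a larger input. It assumes Python 3, uses integer keys in place of dates purely for convenience, and the input size and repeat count are arbitrary. Timings vary by machine and interpreter version, so no result is shown:

```python
import random
import timeit
from collections import defaultdict

def combine_default_dict(keys, values):
    d = defaultdict(int)
    for key, val in zip(keys, values):
        d[key] += val
    return d

def combine_contains(keys, values):
    d = {}
    for key, val in zip(keys, values):
        if key in d:
            d[key] += val
        else:
            d[key] = val
    return d

# Assumed workload: 10,000 items spread over 50 distinct keys.
random.seed(0)
keys = [random.randrange(50) for _ in range(10000)]
values = [1] * len(keys)

# Both approaches should produce the same totals before timing them.
assert combine_default_dict(keys, values) == combine_contains(keys, values)

for func in (combine_contains, combine_default_dict):
    elapsed = timeit.timeit(lambda: func(keys, values), number=20)
    print("{0:.4f} - {1}".format(elapsed, func.__name__))
```

The ratio of distinct keys to total items affects how often each branch runs, so it is worth trying a distribution that resembles the real data.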