Python dictionary from two lists
I have two lists: one is a list of values and the other is a list of dates.
I want to create a dictionary with the dates as keys and the values as values. However, many of the values share the same key (date), so I need to add together the values that have the same date before building the dictionary.
Both lists have the same number of elements, but the list of dates contains duplicates (since every date has more than one value).
What would be the best way to group the values (add them together) based on the keys (dates)?
Example lists:
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
I want my dictionary to look like this:
result = {datetime(2014, 2, 1, 0, 0): 13, datetime(2014, 3, 1, 0, 0): 12}
If you have repeating dates and want to sum the values for each repeated key, use a defaultdict:
from collections import defaultdict
d = defaultdict(int)
for dte, val in zip(dates, values):
    d[dte] += val
Output:
defaultdict(<class 'int'>, {datetime.datetime(2014, 2, 1, 0, 0): 13, datetime.datetime(2014, 3, 1, 0, 0): 12})
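One thing worth knowing about the result: it is still a defaultdict, so merely reading a missing key inserts that key with the default value. The sketch below uses a shortened version of the lists and an arbitrary missing date (both chosen here for illustration) to show the effect, and how a plain dict copy avoids it.

```python
from collections import defaultdict
from datetime import datetime

# Shortened illustrative data, not the full lists from the question.
dates = [datetime(2014, 2, 1), datetime(2014, 2, 1), datetime(2014, 3, 1)]
values = [2, 7, 8]

d = defaultdict(int)
for dte, val in zip(dates, values):
    d[dte] += val

# Plain dict snapshot, taken before any stray lookup happens.
totals = dict(d)

missing = datetime(2014, 4, 1)  # a date that never appeared in the input
_ = d[missing]                  # the lookup silently inserts missing -> 0

print(missing in d)             # True: the defaultdict grew by one key
print(missing in totals)        # False: the plain dict is unchanged
print(totals.get(missing, 0))   # 0, without modifying totals
```

If the result is passed on to other code, converting it with dict(d) first is one way to keep such accidental insertions from happening.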
Or use a normal dict with dict.setdefault:
d = {}
for dte, val in zip(dates, values):
    d.setdefault(dte, 0)
    d[dte] += val
Lastly, you can use dict.get with a default value of 0:
d = {}
for dte, val in zip(dates, values):
    d[dte] = d.get(dte, 0) + val
The defaultdict is going to be the fastest approach, as it is designed exactly for this purpose.
Assuming this is your input:
>>> dates = ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-03']
>>> values = [10, 15, 10, 10]
Combine the dates and values into pairs:
>>> data = list(zip(dates, values))
>>> data
[('2015-01-01', 10), ('2015-01-01', 15), ('2015-01-02', 10), ('2015-01-03', 10)]
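Aggregate the values for identical dates. Note that itertools.groupby only groups consecutive items, so this relies on equal dates being adjacent in the input: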
>>> import itertools
>>> new_data = []
>>> for key, group in itertools.groupby(data, lambda x: x[0]):
...     tmp = [key, 0]  # 0 is the starting total
...     for thing in group:
...         tmp[1] += thing[1]
...     new_data.append(tmp)
...
Print new_data:
>>> new_data
[['2015-01-01', 25], ['2015-01-02', 10], ['2015-01-03', 10]]
Now build the final dictionary:
>>> dict(new_data)
{'2015-01-03': 10, '2015-01-02': 10, '2015-01-01': 25}
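Because groupby starts a new group whenever the key changes, input in which equal dates are not adjacent needs to be sorted first. The sketch below uses a shuffled variant of the lists (an assumption made here for illustration) and compares the unsorted and sorted results.

```python
from itertools import groupby
from operator import itemgetter

# Shuffled illustrative input: the two '2015-01-01' entries are not adjacent.
dates = ['2015-01-02', '2015-01-01', '2015-01-03', '2015-01-01']
values = [10, 10, 10, 15]

# Without sorting, '2015-01-01' forms two separate groups and the
# later group overwrites the earlier one in the resulting dict.
naive = {
    date: sum(value for _, value in group)
    for date, group in groupby(zip(dates, values), key=itemgetter(0))
}

# Sorting by date first makes equal dates adjacent, so each date is one group.
pairs = sorted(zip(dates, values), key=itemgetter(0))
totals = {
    date: sum(value for _, value in group)
    for date, group in groupby(pairs, key=itemgetter(0))
}

print(naive['2015-01-01'])   # 15: only the last group survived
print(totals['2015-01-01'])  # 25: both values were summed
```

The sort adds O(n log n) work, which the dictionary-based approaches do not need.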
itertools and defaultdict are pretty unnecessary for this. I think that this is simpler and easier to read.
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
combined = {}
for date, value in zip(dates, values):
    if date in combined:
        combined[date] += value
    else:
        combined[date] = value
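The same loop can be wrapped in a small reusable function. This is only a sketch: the name sum_by_key and the length check are additions made here, not part of the answer. The check is there because zip silently stops at the shorter list, which would hide mismatched input.

```python
from datetime import datetime


def sum_by_key(keys, values):
    """Return a dict mapping each distinct key to the sum of its values."""
    if len(keys) != len(values):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("keys and values must have the same length")
    combined = {}
    for key, value in zip(keys, values):
        if key in combined:
            combined[key] += value
        else:
            combined[key] = value
    return combined


dates = [datetime(2014, 2, 1), datetime(2014, 2, 1), datetime(2014, 2, 1),
         datetime(2014, 3, 1), datetime(2014, 3, 1)]
values = [2, 7, 4, 8, 4]

totals = sum_by_key(dates, values)
print(totals[datetime(2014, 2, 1)])  # 13
print(totals[datetime(2014, 3, 1)])  # 12
```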
Performance analysis
I'm not saying that defaultdict is a bad solution; I was only pointing out that it requires more tacit knowledge to use without pitfalls.
It is not, however, the fastest solution.
from collections import defaultdict
from datetime import datetime
import timeit
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]
def combine_default_dict(dates=dates, values=values):
    d = defaultdict(int)
    for dte, val in zip(dates, values):
        d[dte] += val
    return d

def combine_setdefault(dates=dates, values=values):
    d = {}
    for dte, val in zip(dates, values):
        d.setdefault(dte, 0)
        d[dte] += val
    return d

def combine_get(dates=dates, values=values):
    d = {}
    for dte, val in zip(dates, values):
        d[dte] = d.get(dte, 0) + val
    return d

def combine_contains(dates=dates, values=values):
    d = {}
    for date, value in zip(dates, values):
        if date in d:
            d[date] += value
        else:
            d[date] = value
    return d
def time_them(number=100000):
    # Time every module-level function whose name starts with 'combine_'.
    for func_name in sorted(k for k in globals() if k.startswith('combine_')):
        timer = timeit.Timer("{0}()".format(func_name),
                             "from __main__ import {0}".format(func_name))
        time_taken = timer.timeit(number=number)
        print("{0} - {1}".format(time_taken, func_name))
This yields:
>>> time_them()
0.388070106506 - combine_contains
0.485766887665 - combine_default_dict
0.415601968765 - combine_get
0.472551822662 - combine_setdefault
I've tried it on a couple of different machines and Python versions. combine_default_dict competes with combine_setdefault for the slowest. combine_contains has been consistently the fastest.
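The timings above come from a five-element input, where the fixed cost of creating the container may dominate. To check how the approaches compare on larger input, something like the following sketch could be used. The workload (10,000 random values over 12 dates) and the repeat count are arbitrary assumptions made here, and no particular ranking is implied.

```python
import random
import timeit
from collections import defaultdict
from datetime import datetime

random.seed(0)  # fixed seed so runs are repeatable
# Hypothetical workload: 10,000 values spread over 12 month-start dates.
dates = [datetime(2014, random.randint(1, 12), 1) for _ in range(10000)]
values = [random.randint(1, 10) for _ in range(10000)]


def combine_default_dict():
    d = defaultdict(int)
    for dte, val in zip(dates, values):
        d[dte] += val
    return d


def combine_contains():
    d = {}
    for dte, val in zip(dates, values):
        if dte in d:
            d[dte] += val
        else:
            d[dte] = val
    return d


results = {}
for func in (combine_contains, combine_default_dict):
    # timeit accepts a callable directly, so no setup string is needed.
    results[func.__name__] = timeit.timeit(func, number=20)

for name, seconds in sorted(results.items()):
    print("{0:.4f} - {1}".format(seconds, name))
```

Results will vary with the Python version, the machine, and the ratio of distinct keys to total items, so it is worth measuring with data shaped like your own.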