
Python dictionary from two lists

I have two lists: one is a list of values and the other is a list of dates.

I want to create a dictionary with the dates as keys and the values as the dictionary values. However, many of the values share the same key (date), so I need to add together the values with the same date before building the dictionary.

Both lists have the same number of elements, but the list of dates contains duplicates (since every date has more than one value).

What would be the best way to group the values (add them together) based on the keys (dates)?

Examples of the lists:

from datetime import datetime

dates = [datetime(2014, 2, 1, 0, 0), datetime(2014, 2, 1, 0, 0), datetime(2014, 2, 1, 0, 0), datetime(2014, 3, 1, 0, 0), datetime(2014, 3, 1, 0, 0)]

values = [2,7,4,8,4]

I want my dictionary to look like this (each date appears once, with its values summed):
result = {datetime(2014, 2, 1, 0, 0): 13, datetime(2014, 3, 1, 0, 0): 12}

If you have repeating dates and want to sum the values for repeating keys, use a defaultdict:

from collections import defaultdict
d = defaultdict(int)
for dte, val in zip(dates, values):
    d[dte] += val

Output:

defaultdict(<class 'int'>, {datetime.datetime(2014, 2, 1, 0, 0): 13, datetime.datetime(2014, 3, 1, 0, 0): 12})

Or use a normal dict with dict.setdefault:

d = {}
for dte, val in zip(dates,values):
    d.setdefault(dte,0)
    d[dte] += val

Lastly, you can use dict.get with a default value of 0:

d = {}
for dte, val in zip(dates,values):
    d[dte] = d.get(dte, 0) + val

The defaultdict is going to be the fastest approach, as it is designed exactly for this purpose.
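As a minimal, self-contained sketch, the defaultdict loop above can be wrapped in a function and run against the lists from the question. The name sum_by_key is hypothetical, and converting the result back to a plain dict is an optional extra step, assumed here so that a later lookup of a missing date raises KeyError instead of silently inserting a 0.

```python
from collections import defaultdict
from datetime import datetime


def sum_by_key(keys, values):
    """Sum the values that share a key (hypothetical helper name)."""
    totals = defaultdict(int)  # missing keys start at int() == 0
    for key, value in zip(keys, values):
        totals[key] += value
    # Return a plain dict so that reading a missing key does not insert it.
    return dict(totals)


dates = [
    datetime(2014, 2, 1), datetime(2014, 2, 1), datetime(2014, 2, 1),
    datetime(2014, 3, 1), datetime(2014, 3, 1),
]
values = [2, 7, 4, 8, 4]

result = sum_by_key(dates, values)
print(result[datetime(2014, 2, 1)])  # 13
print(result[datetime(2014, 3, 1)])  # 12
```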

Assuming this is your input:

>>> dates = ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-03']
>>> values = [10, 15, 10, 10]

Pair each date with its value:

>>> data = list(zip(dates, values))
>>> data
[('2015-01-01', 10), ('2015-01-01', 15), ('2015-01-02', 10), ('2015-01-03', 10)]

Aggregate the values for the same dates (itertools.groupby only merges adjacent items, so the data must already be ordered by date):

>>> import itertools
>>> new_data = []
>>> for key, group in itertools.groupby(data, lambda x: x[0]):
...     tmp = [key, 0]    # 0 is the starting total for this date
...     for thing in group:
...         tmp[1] += thing[1]
...     new_data.append(tmp)
...

Print new_data:

>>> new_data
[['2015-01-01', 25], ['2015-01-02', 10], ['2015-01-03', 10]]

Now build the final dictionary:

>>> dict(new_data)
{'2015-01-01': 25, '2015-01-02': 10, '2015-01-03': 10}
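The walkthrough above relies on equal dates being next to each other. As a hedged sketch of the same idea for unordered input, the pairs can be sorted first and the per-date sums collected in a dict comprehension. The shuffled sample data below is an assumption made for illustration; it is not from the original answer.

```python
from itertools import groupby
from operator import itemgetter

# Same dates and values as above, deliberately out of order (assumed input).
dates = ['2015-01-02', '2015-01-01', '2015-01-03', '2015-01-01']
values = [10, 10, 10, 15]

# groupby only merges adjacent items, so sort by date first.
pairs = sorted(zip(dates, values), key=itemgetter(0))

# One entry per date, summing the values in each group.
totals = {
    date: sum(value for _, value in group)
    for date, group in groupby(pairs, key=itemgetter(0))
}
print(totals)  # {'2015-01-01': 25, '2015-01-02': 10, '2015-01-03': 10} on Python 3.7+
```

The sort adds O(n log n) work, so for plain summing the single-pass dictionary loops in the other answers may be the simpler choice.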

itertools and defaultdict are pretty unnecessary for this. I think that this is simpler and easier to read:

from datetime import datetime

dates = [datetime(2014, 2, 1, 0, 0), datetime(2014, 2, 1, 0, 0), datetime(2014, 2, 1, 0, 0), datetime(2014, 3, 1, 0, 0), datetime(2014, 3, 1, 0, 0)]
values = [2, 7, 4, 8, 4]

combined = {}
for (date,value) in zip(dates,values):
  if date in combined:
    combined[date] += value
  else:
    combined[date] = value

Performance analysis

I'm not saying that defaultdict is a bad solution; I was only pointing out that it requires more tacit knowledge to use without pitfalls.

It is not, however, the fastest solution.

from collections import defaultdict
from datetime import datetime
import timeit

dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]

def combine_default_dict(dates=dates,values=values):
  d = defaultdict(int)
  for dte, val in zip(dates, values):
      d[dte] += val
  return d

def combine_setdefault(dates=dates,values=values):
  d = {}
  for dte, val in zip(dates,values):
      d.setdefault(dte,0)
      d[dte] += val
  return d

def combine_get(dates=dates,values=values):
  d = {}
  for dte, val in zip(dates,values):
      d[dte] = d.get(dte, 0) + val
  return d

def combine_contains(dates=dates,values=values):
  d = {}
  for (date,value) in zip(dates,values):
    if date in d:
      d[date] += value
    else:
      d[date] = value
  return d

def time_them(number=100000):
  for func_name in [k for k in sorted(globals().keys()) if k.startswith('combine_')]:
    timer = timeit.Timer("{0}()".format(func_name),"from __main__ import {0}".format(func_name))
    time_taken = timer.timeit(number=number)
    print("{0} - {1}".format(time_taken, func_name))

Yields:

>>> time_them()
0.388070106506 - combine_contains
0.485766887665 - combine_default_dict
0.415601968765 - combine_get
0.472551822662 - combine_setdefault

I've tried it on a couple of different machines and Python versions. combine_default_dict competes with combine_setdefault for the slowest, while combine_contains has been consistently the fastest.
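The timings above use a five-element list, so fixed costs such as creating the dictionary may account for much of the difference. As a hedged sketch for repeating the comparison on larger input, the script below times two of the variants on synthetic data. The helper make_data and the sizes it uses are assumptions for illustration; no particular result is claimed, since the numbers depend on the machine and the Python version.

```python
import timeit
from collections import defaultdict


def make_data(n_items=10_000, n_keys=100):
    """Build synthetic keys and values (sizes are arbitrary assumptions)."""
    keys = [i % n_keys for i in range(n_items)]
    values = list(range(n_items))
    return keys, values


def combine_default_dict(keys, values):
    d = defaultdict(int)
    for k, v in zip(keys, values):
        d[k] += v
    return d


def combine_contains(keys, values):
    d = {}
    for k, v in zip(keys, values):
        if k in d:
            d[k] += v
        else:
            d[k] = v
    return d


keys, values = make_data()
# Both variants must agree before their timings are worth comparing.
assert combine_default_dict(keys, values) == combine_contains(keys, values)

for func in (combine_contains, combine_default_dict):
    seconds = timeit.timeit(lambda: func(keys, values), number=20)
    print("{0:.4f} - {1}".format(seconds, func.__name__))
```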
