
Python dictionary from two lists

I have two lists, one of them is a list of values and the other is a list of dates.

I want to create a dictionary with the dates as keys and the values as values. However, many of the values share the same key (date), so I need to add together the values with the same date before building the dictionary.

Both of the lists have the same number of elements but the list of dates has some values duplicated (since every date has more than one value).

What would be the best way to group the values (add them together) based on the keys (dates)?

Examples of the lists

from datetime import datetime
dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]

values = [2,7,4,8,4]

I want my dictionary to look like this:
dict = {datetime(2014, 2, 1, 0, 0): 13, datetime(2014, 3, 1, 0, 0): 12}

If you have repeating dates and want to sum the values for each repeated key, use a defaultdict:

from collections import defaultdict
d = defaultdict(int)
for dte, val in zip(dates, values):
    d[dte] += val

Output:

defaultdict(<class 'int'>, {datetime.datetime(2014, 2, 1, 0, 0): 13, datetime.datetime(2014, 3, 1, 0, 0): 12})
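For completeness, here is a minimal self-contained sketch of the same loop with the imports and the question's sample data filled in. The names `totals` and `result` are only illustrative. Converting to a plain dict at the end is an optional extra step, not something the answer requires.

```python
from collections import defaultdict
from datetime import datetime

dates = [datetime(2014, 2, 1), datetime(2014, 2, 1), datetime(2014, 2, 1),
         datetime(2014, 3, 1), datetime(2014, 3, 1)]
values = [2, 7, 4, 8, 4]

totals = defaultdict(int)  # missing keys start at int() == 0
for dte, val in zip(dates, values):
    totals[dte] += val

# Optional: convert to a plain dict so that later lookups of unknown
# dates raise KeyError instead of silently creating a 0 entry.
result = dict(totals)
print(result)
```

If the result is handed to other code, the plain dict is usually the less surprising thing to return.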

Or use a normal dict and dict.setdefault:

d = {}
for dte, val in zip(dates,values):
    d.setdefault(dte,0)
    d[dte] += val

Lastly, you can use dict.get with a default value of 0:

d = {}
for dte, val in zip(dates,values):
    d[dte] = d.get(dte, 0) + val

The defaultdict is going to be the fastest approach as it is designed exactly for this purpose.

Assuming this is your input:

>>> dates = ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-03']
>>> values = [10, 15, 10, 10]

Pair each date with its value:

>>> data = list(zip(dates, values))   # list() so the pairs can be displayed and reused on Python 3
>>> data
[('2015-01-01', 10), ('2015-01-01', 15), ('2015-01-02', 10), ('2015-01-03', 10)]

Aggregate the values for each date:

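>>> import itertools
>>> new_data = []
>>> # groupby only merges adjacent equal keys, so sort data by date first if it is not already ordered
>>> for key, group in itertools.groupby(data, lambda x: x[0]):
...     tmp = [key, 0]    # 0 is the starting total for this date
...     for thing in group:
...         tmp[1] += thing[1]
...     new_data.append(tmp)   # append once per date, after its group is summed
...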

Print new_data:

>>> new_data
[['2015-01-01', 25], ['2015-01-02', 10], ['2015-01-03', 10]]

Now build the final dictionary:

>>> dict(new_data)
{'2015-01-03': 10, '2015-01-02': 10, '2015-01-01': 25}
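One caveat with this approach: itertools.groupby starts a new group every time the key changes, so it only produces one total per date when equal dates are adjacent. The sketch below uses made-up, deliberately unordered input to show the difference. The helper name `sum_by_key` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative input where equal dates are NOT adjacent.
dates = ['2015-01-02', '2015-01-01', '2015-01-02', '2015-01-01']
values = [10, 10, 5, 15]

def sum_by_key(pairs):
    # One (key, total) tuple per run of adjacent pairs sharing a key.
    return [(key, sum(v for _, v in group))
            for key, group in groupby(pairs, key=itemgetter(0))]

# Without sorting, every pair is its own run, so nothing is merged.
unsorted_runs = sum_by_key(zip(dates, values))

# Sorting by date first makes equal dates adjacent, so each date
# appears exactly once in the result.
sorted_totals = dict(sum_by_key(sorted(zip(dates, values), key=itemgetter(0))))
print(sorted_totals)
```

Passing `unsorted_runs` to dict() would keep only the last run for each date, which silently drops values, so sorting (or one of the dict-based loops) is the safer choice when the input order is unknown.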

itertools and defaultdict are pretty unnecessary for this. I think that this is simpler and easier to read.

dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]

combined = {}
for date, value in zip(dates, values):
    if date in combined:
        combined[date] += value
    else:
        combined[date] = value

Performance analysis

I'm not saying that defaultdict is a bad solution; I was only pointing out that it requires more tacit knowledge to use without pitfalls.
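The answer does not say which pitfalls it has in mind, so the following is only an assumption about one commonly cited example: merely reading a missing key from a defaultdict inserts that key. The string keys are placeholders for dates.

```python
from collections import defaultdict

totals = defaultdict(int)
totals['2014-02-01'] += 13

# Reading a key that was never added does not raise KeyError:
# defaultdict calls int(), stores 0 under that key, and returns it.
missing = totals['2014-04-01']
keys_after_lookup = sorted(totals)  # now also contains '2014-04-01'

# Membership tests and .get() do not insert anything.
probe = defaultdict(int)
has_key = 'x' in probe
got = probe.get('x')
print(keys_after_lookup, has_key, got, len(probe))
```

A plain dict with an explicit `in` check or `.get` avoids this side effect, at the cost of slightly more code in the loop.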

defaultdict is not, however, the fastest solution.

from collections import defaultdict
from datetime import datetime
import timeit

dates = [datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 2, 1, 0, 0),datetime(2014, 3, 1, 0, 0),datetime(2014, 3, 1, 0, 0)]
values = [2,7,4,8,4]

def combine_default_dict(dates=dates,values=values):
  d = defaultdict(int)
  for dte, val in zip(dates, values):
      d[dte] += val
  return d

def combine_setdefault(dates=dates,values=values):
  d = {}
  for dte, val in zip(dates,values):
      d.setdefault(dte,0)
      d[dte] += val
  return d

def combine_get(dates=dates,values=values):
  d = {}
  for dte, val in zip(dates,values):
      d[dte] = d.get(dte, 0) + val
  return d

def combine_contains(dates=dates,values=values):
  d = {}
  for (date,value) in zip(dates,values):
    if date in d:
      d[date] += value
    else:
      d[date] = value
  return d

def time_them(number=100000):
  for func_name in [k for k in sorted(globals().keys()) if k.startswith('combine_')]:
    timer = timeit.Timer("{0}()".format(func_name),"from __main__ import {0}".format(func_name))
    time_taken = timer.timeit(number=number)
    print("{0} - {1}".format(time_taken, func_name))  # print() call works on Python 2 and 3

Yields:

>>> time_them()
0.388070106506 - combine_contains
0.485766887665 - combine_default_dict
0.415601968765 - combine_get
0.472551822662 - combine_setdefault

I've tried it on a couple of different machines and Python versions. combine_default_dict competes with combine_setdefault for the slowest. combine_contains has been consistently the fastest.
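The timings above use five-element lists, so they may largely reflect fixed per-call overhead rather than per-item cost. Below is a sketch for repeating the comparison on a larger input. The workload is hypothetical (100,000 synthetic values over 365 integer keys standing in for dates), and no particular ranking is claimed. Results will depend on the machine, Python version, and key distribution.

```python
import random
import timeit
from collections import defaultdict

random.seed(0)  # fixed seed so repeated runs use the same data
# Hypothetical workload: integer keys stand in for dates.
keys = [random.randrange(365) for _ in range(100000)]
vals = [random.randrange(100) for _ in range(100000)]

def combine_default_dict(keys=keys, vals=vals):
    d = defaultdict(int)
    for k, v in zip(keys, vals):
        d[k] += v
    return d

def combine_contains(keys=keys, vals=vals):
    d = {}
    for k, v in zip(keys, vals):
        if k in d:
            d[k] += v
        else:
            d[k] = v
    return d

for func in (combine_contains, combine_default_dict):
    # timeit.timeit accepts a callable; number=10 keeps the run short.
    elapsed = timeit.timeit(func, number=10)
    print("{0:.4f} - {1}".format(elapsed, func.__name__))
```

Measuring on data that resembles the real input (many repeated keys versus mostly unique keys) is more informative than either benchmark in isolation.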
