From a list of dicts get the maximal length of the values for each key in a pythonic way

I'm looking for a more pythonic way to get the maximal length of the values for each key in a list of dictionaries.

My approach looks like this:

lst =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {}
for l in lst:
    for key in l:
        dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
print(dct)

The output is:

{'b': 6, 'a': 11}

The str function is needed to get the length of integers (and also of None values).

Is this approach "pythonic", or is there a smoother, more readable way using list comprehensions or similar methods?
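To make the role of `str` concrete, here is a small sketch with illustrative sample values (not taken from any real data): `len` raises a `TypeError` for an integer or `None`, while `len(str(...))` counts the characters of the printed form.

```python
# len() is not defined for int or None, so each value is converted to str first.
samples = ['asdasd', 123, None]

lengths = [len(str(value)) for value in samples]
print(lengths)  # [6, 3, 4]  ('None' has four characters)

# Without str(), an integer has no length at all.
try:
    len(123)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```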

I think your approach is fairly Pythonic except that I would change the update line to be a little more clear:

# A little terse
dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
# A little simpler
dct[key] = max(dct.get(key, 0), len(str(l[key])))

Here's a solution with variable names modified as well:

dict_list =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
max_lengths = {}
for dictionary in dict_list:
    for k, v in dictionary.items():
        max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
print(max_lengths)
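If the same logic is needed in more than one place, the loop can be wrapped in a function. This is only a minimal sketch; the name `max_value_lengths` is my own choice for illustration and does not come from any library.

```python
def max_value_lengths(dict_list):
    """Return {key: longest len(str(value))} across all dicts in dict_list."""
    max_lengths = {}
    for dictionary in dict_list:
        for k, v in dictionary.items():
            # Keep the larger of the stored length and this value's length.
            max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
    return max_lengths


dict_list = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
print(max_value_lengths(dict_list))  # {'a': 11, 'b': 6}
print(max_value_lengths([]))         # {}
```

An empty input simply yields an empty dictionary, so no special case is needed.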

My previous answer was wrong, which I did not realize at first, but here are two others that do work. The first one uses pandas: it builds a DataFrame of (key, length) pairs, sorts by key ascending and by length descending, takes the first row of each group, and then creates a dictionary out of that.

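import pandas as pd
lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

# One row per (key, length of the stringified value)
d = pd.DataFrame([(k, len(str(v))) for i in lst for k, v in i.items()], columns=['Key', 'Value'])
# Key ascending, Value descending, so the longest length comes first in each group
# (DataFrame.sort was removed from pandas; sort_values replaces it)
d = d.sort_values(['Key', 'Value'], ascending=[True, False])
d = d.groupby('Key').first().reset_index()
d = dict(zip(d.Key, d.Value))  # or d.set_index('Key')['Value'].to_dict()
print(d)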

{'a': 11, 'b': 6}

If you want something that is easily readable and uses only built-ins, then this should do:

lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct={}

for i in lst:
    for k,v in i.items():
        if k in dct:
            if len(str(v)) > dct[k]:
                dct[k] = len(str(v))
        else:
            dct[k] = len(str(v))
print(dct)

{'a': 11, 'b': 6}

Here's another way that doesn't rely on sorting/zipping but I wouldn't say one is more Pythonic than the other.

from itertools import chain

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(d.get(k, ""))) for d in lst)
    for k in set(chain.from_iterable(d.keys() for d in lst))
}

print(dct)

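Alternatively, you can use groupby. Note that groupby only groups consecutive items with equal keys, so the (key, value) pairs have to be sorted by key first:

from itertools import chain, groupby

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(
        # Sort on the key only, so values of mixed types are never compared
        sorted(chain.from_iterable(d.items() for d in lst), key=lambda p: p[0]),
        lambda p: p[0]
    )
}

print(dct)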
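One caveat about `itertools.groupby`: it only groups consecutive items with equal keys, so the pairs need to be sorted by key for the result to be reliable. The sketch below uses a hypothetical input (chosen only to expose the problem) in which key `'a'` reappears after `'b'` with a shorter value.

```python
from itertools import chain, groupby

# Hypothetical input: 'a' shows up again after 'b', this time with a short value.
lst = [{'a': 'asdasdasdas'}, {'b': 123}, {'a': 'x'}]
pairs = list(chain.from_iterable(d.items() for d in lst))

# Unsorted: 'a' forms two separate groups and the later one overwrites the first.
unsorted_result = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(pairs, key=lambda p: p[0])
}

# Sorted by key only, so values of mixed types are never compared with each other.
pairs.sort(key=lambda p: p[0])
sorted_result = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(pairs, key=lambda p: p[0])
}

print(unsorted_result)  # {'a': 1, 'b': 3}
print(sorted_result)    # {'a': 11, 'b': 3}
```

With the sample list from the question the unsorted version happens to return the right answer, which makes this easy to miss.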

The other answers focus on using Python features rather than on readability. Personally, I think readability and simplicity are the most important of all the 'pythonic' traits.

(I simplified the data to use strings for everything, but it would work with integers as well if you drop in a str().)

from collections import defaultdict

lst =[{'a':'asdasd', 'b': '123'},{'b': 'asdasdasdas'}, {'a':'123','b':'asdasd'}]

def merge_dict(dic1, dic2):
    for key, value in dic2.items():
        dic1[key].append(value)

combined = defaultdict(list)
for dic in lst:
    merge_dict(combined, dic)

print({key: max(map(len, value)) for key, value in combined.items()})
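For the question's original data, which mixes strings and integers, dropping in `str()` could look like the following. This is a minimal sketch of that variation; the helper function is inlined here for brevity.

```python
from collections import defaultdict

lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]

# Collect every value per key, converted to str so that len() is always defined.
combined = defaultdict(list)
for dic in lst:
    for key, value in dic.items():
        combined[key].append(str(value))

result = {key: max(map(len, values)) for key, values in combined.items()}
print(result)  # {'a': 11, 'b': 6}
```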

I like this recursive take for its readability and its use of plain Python:

dicts = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

def get_highest(current_highest, items_left):
    # Base case: nothing left to process
    if not items_left:
        return current_highest
    # Note: pop() consumes the caller's list
    item = items_left.pop()
    # Keep only the keys whose value is longer than the longest seen so far
    higher = {key: len(str(value)) for key, value in item.items()
              if len(str(value)) > current_highest.get(key, 0)}
    current_highest.update(higher)
    return get_highest(current_highest, items_left)

print(get_highest(dict(), dicts))

{'b': 6, 'a': 11}
