From the list of dictionaries find the largest value lengths for each key

Question

data = [{"id": "78ab45",
         "name": "Jonh"},
        {"id": "69cd234457",
         "name": "Joe"}]

I want my function to return the largest value lengths for each key from all dictionaries:

expected_output = [
    { "size": 10, "name": "id" }, #because the length of the largest "id" value is 10
    { "size": 4, "name": "name" }, #because the length of the largest "name" value is 4
]

My code so far:

def my_func(data):
  headers_and_sizes = []
  for item in data:
     for key, value in item.items():
        headers_and_sizes.append({"size": f'{len(value)}', "name": key})
        if int(headers_and_sizes[0]["size"]) < len(value):
            headers_and_sizes[0]["size"] = len(value)
            
  return headers_and_sizes

Gives me this:

[{'size': '6', 'name': 'id'}, {'size': '4', 'name': 'name'}, {'size': '10', 'name': 'id'}, {'size': '3', 'name': 'name'}]

How can I fix this so that it returns the values shown in expected_output?

Answer 1

You'll want to update a dictionary that maps each key to the maximum length seen so far for that key.

data = [
    {
        "id": "78ab45",
        "name": "Jonh",
    },
    {
        "id": "69cd234457",
        "name": "Joe",
    },
]
# Map each key to the longest value length seen so far.
key_to_max_len = {}
for datum in data:
    for key, val in datum.items():
        if key not in key_to_max_len or len(val) > key_to_max_len[key]:
            key_to_max_len[key] = len(val)
# Convert the mapping into the requested list of {"size", "name"} dictionaries.
key_size_arr = [{"size": val, "name": key} for key, val in key_to_max_len.items()]
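Since the question asks for a function, the same idea can be wrapped up as one. This is only a sketch: the name max_value_lengths is made up for illustration, and collections.defaultdict is swapped in for the explicit "key not in" check, assuming every value supports len().

```python
from collections import defaultdict


def max_value_lengths(data):
    """Return [{"size": max_len, "name": key}, ...] for every key in data."""
    # Missing keys start at 0, so the first length seen always wins.
    key_to_max_len = defaultdict(int)
    for datum in data:
        for key, val in datum.items():
            key_to_max_len[key] = max(key_to_max_len[key], len(val))
    return [{"size": size, "name": key} for key, size in key_to_max_len.items()]


data = [{"id": "78ab45", "name": "Jonh"},
        {"id": "69cd234457", "name": "Joe"}]
result = max_value_lengths(data)
print(result)  # [{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]
```

On Python 3.7+ the keys come out in the order they were first seen, because dictionaries preserve insertion order.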

Answer 2

You can get the maximum value length for id and name as shown below, then structure the output accordingly.

>>> data
[{'id': '78ab45', 'name': 'Jonh'}, {'id': '69cd234457', 'name': 'Joe'}]
>>> id_len = max(map(lambda x: len(x['id']), data))  # avoids shadowing the built-in id()
>>> name_len = max(map(lambda x: len(x['name']), data))
>>> id_len
10
>>> name_len
4
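The answer leaves the "structure the output" step open. One possible way to do it, assuming the key names are known in advance (the keys list below is an assumption, not part of the original answer), is to repeat the same max(map(...)) call for each key:

```python
data = [{"id": "78ab45", "name": "Jonh"},
        {"id": "69cd234457", "name": "Joe"}]

# Assumed: the keys of interest are known up front and present in every dictionary.
keys = ["id", "name"]

# One {"size", "name"} dictionary per key, using the same max(map(...)) idea.
output = [
    {"size": max(map(lambda d: len(d[key]), data)), "name": key}
    for key in keys
]
print(output)  # [{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]
```

This scans data once per key, which is fine for a handful of keys; if a key is absent from any dictionary it raises a KeyError.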

Answer 3

You can use a list comprehension to build a list of (id, name) tuples:

names_ids = [(eachdict['id'],eachdict['name']) for eachdict in data]

Then build the output in the desired shape (a list of dictionaries). For each field, pass the lengths of its values to max(), using another list comprehension inside the call:

expected_output = \
[{"size":max([len(each[0]) for each in names_ids]),"name":"id"},
 {"size":max([len(each[1]) for each in names_ids]),"name":"name"}]

Output (on Python 3.7+, where dictionaries keep insertion order):

[{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]

Answer 4

Using the following:

keys = list(data[0].keys())
output = {key:-1 for key in keys}
for d in data:
    for k in d.keys():
        if len(d[k]) > output[k]:
            output[k] = len(d[k])

After the loop, output is:

{'id': 10, 'name': 4}
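Two things are worth noting about this approach: the result is a plain dictionary rather than the list asked for in the question, and seeding the keys from data[0] raises a KeyError if a later dictionary has a key the first one lacks. A possible variation that addresses both is sketched below; the extra "email" entry is hypothetical sample data added only to show the second point.

```python
data = [
    {"id": "78ab45", "name": "Jonh"},
    # Hypothetical extra key that the first dictionary does not have.
    {"id": "69cd234457", "name": "Joe", "email": "joe@example.com"},
]

output = {}
for d in data:
    for k, v in d.items():
        # .get(k, -1) treats unseen keys as shorter than any real value.
        if len(v) > output.get(k, -1):
            output[k] = len(v)

# Reshape the mapping into the list of {"size", "name"} dictionaries.
as_list = [{"size": size, "name": name} for name, size in output.items()]
print(as_list)
```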

Answer 5

I think the easiest method here is pandas...

import pandas as pd
df = pd.DataFrame(data)

out = [{'size': df['id'].str.len().max(), 'name':'id'},
       {'size': df['name'].str.len().max(), 'name':'name'}]

output:

[{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]

Or, to handle any number of columns:

[{'size':df[col].str.len().max(), 'name':col} for col in df.columns]

Answer 6

Here is how you can use a list comprehension that builds one dictionary per key, with a nested list comprehension collecting that key's values:

data = [{"id": "78ab45",
         "name": "Jonh"},
        {"id": "69cd234457",
         "name": "Joe"}]


expected_output = [{'size': len(max([i[k] for i in data], key=len)),
                    'name': k} for k in data[0]]

print(expected_output)

Output:

[{'size': 10, 'name': 'id'},
 {'size': 4, 'name': 'name'}]

6 answers

solution1
1 ACCEPTED 2020-07-22 17:14:24

solution2
1 2020-07-22 17:16:20

solution3
1 2020-07-22 17:23:30

solution4
0 2020-07-22 17:26:36

solution5
0 2020-07-22 17:35:53

solution6
0 2020-07-24 19:29:48
