Sort multiple dictionaries identically, based on a specific order defined by a list

I had a special case where multiple existing dictionaries had to be sorted based on the exact order of items in a list (not alphabetical). So for example the dictionaries were:

dict_one = {"LastName": "Bar", "FirstName": "Foo", "Address": "Example Street 101", "Phone": "012345678"}
dict_two = {"Phone": "001122334455", "LastName": "Spammer", "FirstName": "Egg", "Address": "SSStreet 123"}
dict_three = {"Address": "Run Down Street 66", "Phone": "0987654321", "LastName": "Biker", "FirstName": "Random"}

And the list was:

data_order = ["FirstName", "LastName", "Phone", "Address"]

With the expected result being the ability to create a file like this:

FirstName;LastName;Phone;Address
Foo;Bar;012345678;Example Street 101
Egg;Spammer;001122334455;SSStreet 123
Random;Biker;0987654321;Run Down Street 66

Note: In my case, the real use was an Excel file written with pyexcel-xls, but the CSV-like example above is probably closer to the common case, so the answers are likely more broadly applicable to CSV than to Excel.

I had a hard time finding good answers on Stack Overflow for this case, but eventually I got the sorting working, which let me create the file. The header row can be taken directly from the data_order list below. Here's how I did it; I hope it helps someone:

from collections import OrderedDict
import pprint

dict_one = {
    "LastName": "Bar", 
    "FirstName": "Foo", 
    "Address": "Example Street 101", 
    "Phone": "012345678"}
dict_two = {
    "Phone": "001122334455", 
    "LastName": "Spammer", 
    "FirstName": "Egg", 
    "Address": "SSStreet 123"}  
dict_three = {
    "Address": "Run Down Street 66", 
    "Phone": "0987654321", 
    "LastName": "Biker", 
    "FirstName": "Random"}

dict_list = [dict_one, dict_two, dict_three]

data_order = ["FirstName", "LastName", "Phone", "Address"]

result = []
for dictionary in dict_list:
    result_dict = OrderedDict()
    # Go through the data_order in order
    for key in data_order:
        # Populate result_dict in the list order
        result_dict[key] = dictionary[key]
    result.append(result_dict)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(result)
"""
[   {   'FirstName': 'Foo',
        'LastName': 'Bar',
        'Phone': '012345678',
        'Address': 'Example Street 101'},
    {   'FirstName': 'Egg',
        'LastName': 'Spammer',
        'Phone': '001122334455',
        'Address': 'SSStreet 123'},
    {   'FirstName': 'Random',
        'LastName': 'Biker',
        'Phone': '0987654321',
        'Address': 'Run Down Street 66'}]
"""

This can also be done in a one-liner, although it is harder to read. In case it is useful to someone:

print([OrderedDict([(key, d[key]) for key in data_order]) for d in [dict_one, dict_two, dict_three]])
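
On Python 3.7 and later, plain dicts preserve insertion order, so the same reordering can probably be done without OrderedDict. A minimal sketch, assuming every dictionary contains all the keys in data_order (reorder is a hypothetical helper name, not something from the answer above):

```python
data_order = ["FirstName", "LastName", "Phone", "Address"]

dict_two = {
    "Phone": "001122334455",
    "LastName": "Spammer",
    "FirstName": "Egg",
    "Address": "SSStreet 123",
}


def reorder(d, order):
    """Hypothetical helper: return a new dict whose keys follow `order`.

    Raises KeyError if a key from `order` is missing in `d`.
    """
    return {key: d[key] for key in order}


ordered = reorder(dict_two, data_order)
print(list(ordered))  # ['FirstName', 'LastName', 'Phone', 'Address']
```

The contents are unchanged; only the iteration order of the keys differs from the original dictionary.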

This is a classic use case for csv.DictWriter, because your expected output is CSV-like (semicolon delimiters instead of commas are supported). It handles all of this for you, avoids the need for a workaround involving OrderedDict, and makes it easy to read the data back in without worrying about corner cases (csv automatically quotes fields when necessary and parses quoted fields when reading them back):

import csv

with open('outputfile.txt', 'w', newline='') as f:
    csvout = csv.DictWriter(f, data_order, delimiter=';')

    # Write the header
    csvout.writeheader()
    csvout.writerow(dict_one)
    csvout.writerow(dict_two)
    csvout.writerow(dict_three)

That's it. csv handles the ordering (it knows the correct order from data_order, passed as fieldnames to the DictWriter constructor), the formatting, and so on.
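
To illustrate the point about reading the data back, here is a small round-trip sketch. It assumes an in-memory io.StringIO buffer in place of outputfile.txt so the example is self-contained; the variable names buffer and rows are illustrative only.

```python
import csv
import io

data_order = ["FirstName", "LastName", "Phone", "Address"]
dict_one = {
    "LastName": "Bar",
    "FirstName": "Foo",
    "Address": "Example Street 101",
    "Phone": "012345678",
}

# In-memory stand-in for the output file (an assumption for this sketch)
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=data_order, delimiter=";")
writer.writeheader()
writer.writerow(dict_one)

# Rewind and parse the same text; DictReader takes the keys from the header row
buffer.seek(0)
rows = list(csv.DictReader(buffer, delimiter=";"))
print(rows[0]["Phone"])  # 012345678
```

Every value comes back as a string, so the leading zero in the phone number survives the round trip.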


If you need to pull the values in a specific order from many dicts without writing them (since your use case doesn't even use the keys), operator.itemgetter can simplify this dramatically:

from operator import itemgetter

getfields = itemgetter(*data_order)

dict_one_fields = getfields(dict_one)

which leaves dict_one_fields as a tuple with the requested fields in the requested order, ('Foo', 'Bar', '012345678', 'Example Street 101'), and runs significantly faster than repeatedly indexing at the Python layer (itemgetter creates a C-level "functor" that can retrieve all the requested values in a single call, with no Python-level bytecode at all for built-in key types like str).
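
As a rough sketch of applying the same getter to all three dictionaries, the snippet below builds the semicolon-separated lines from the question by hand. It assumes every value is a string that contains no semicolons, quotes, or newlines; the plain join does no quoting, which is what the csv module would otherwise take care of.

```python
from operator import itemgetter

data_order = ["FirstName", "LastName", "Phone", "Address"]

dict_one = {"LastName": "Bar", "FirstName": "Foo", "Address": "Example Street 101", "Phone": "012345678"}
dict_two = {"Phone": "001122334455", "LastName": "Spammer", "FirstName": "Egg", "Address": "SSStreet 123"}
dict_three = {"Address": "Run Down Street 66", "Phone": "0987654321", "LastName": "Biker", "FirstName": "Random"}

# One getter, reused for every dictionary
getfields = itemgetter(*data_order)

# Header line first, then one line per dictionary in data_order order
lines = [";".join(data_order)]
lines.extend(";".join(getfields(d)) for d in (dict_one, dict_two, dict_three))
print("\n".join(lines))
```

One caveat: when itemgetter is built with a single key, it returns the bare value rather than a one-element tuple, so this pattern relies on data_order having at least two entries.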
