Sorting a list of dictionaries while consolidating duplicates in Python?

Question

So I have a list of dictionaries like so:

data = [ { 
           'Organization' : '123 Solar',
           'Phone' : '444-444-4444',
           'Email' : '',
           'Website' : 'www.123solar.com'
         }, {
           'Organization' : '123 Solar',
           'Phone' : '',
           'Email' : 'joey@123solar.com',
           'Website' : 'www.123solar.com'
         }, {
           etc...
         } ]

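Of course, this isn't the exact data. But (maybe) you can see my problem from the example. I have many records with the same "Organization" name, but not one of them has the complete information for that record.

Is there an efficient way to search over the list, sort the list based on the dictionary's first entry, and finally merge the data from duplicates to create a unique entry? (Keep in mind these dictionaries are quite large.)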

Answer 1

You can use itertools.groupby:

from itertools import groupby
from operator import itemgetter
from pprint import pprint

data = [ {
           'Organization' : '123 Solar',
           'Phone' : '444-444-4444',
           'Email' : '',
           'Website' : 'www.123solar.com'
         }, {
           'Organization' : '123 Solar',
           'Phone' : '',
           'Email' : 'joey@123solar.com',
           'Website' : 'www.123solar.com'
         },
         {
           'Organization' : '234 test',
           'Phone' : '111',
           'Email' : 'a@123solar.com',
           'Website' : 'b.123solar.com'
         },
         {
           'Organization' : '234 test',
           'Phone' : '222',
           'Email' : 'ac@123solar.com',
           'Website' : 'bd.123solar.com'
         }]


# groupby only groups consecutive items, so sort by the grouping key first
data = sorted(data, key=itemgetter('Organization'))
result = {}
for key, group in groupby(data, key=itemgetter('Organization')):
    result[key] = list(group)

pprint(result)

Output:

{'123 Solar': [{'Email': '',
                'Organization': '123 Solar',
                'Phone': '444-444-4444',
                'Website': 'www.123solar.com'},
               {'Email': 'joey@123solar.com',
                'Organization': '123 Solar',
                'Phone': '',
                'Website': 'www.123solar.com'}],
 '234 test': [{'Email': 'a@123solar.com',
               'Organization': '234 test',
               'Phone': '111',
               'Website': 'b.123solar.com'},
              {'Email': 'ac@123solar.com',
               'Organization': '234 test',
               'Phone': '222',
               'Website': 'bd.123solar.com'}]}
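
A note on why the sort comes first: groupby only merges runs of consecutive items with equal keys, so unsorted input can yield the same organization more than once. Here is a minimal sketch of that behaviour; the records and the helper name group_keys are made up for illustration and are not part of the question's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, deliberately not sorted by 'Organization'.
records = [
    {'Organization': '123 Solar', 'Phone': '444-444-4444'},
    {'Organization': '234 test', 'Phone': '111'},
    {'Organization': '123 Solar', 'Phone': ''},
]


def group_keys(items):
    """Return the key of every group that groupby yields, in order."""
    return [key for key, _ in groupby(items, key=itemgetter('Organization'))]


# Without sorting, '123 Solar' shows up as two separate groups.
unsorted_keys = group_keys(records)

# After sorting on the same key, each organization forms a single group.
sorted_keys = group_keys(sorted(records, key=itemgetter('Organization')))

print(unsorted_keys)
print(sorted_keys)
```

If sorting is too expensive for your data, a plain dictionary keyed by organization avoids the requirement entirely.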

UPD:

You can merge the items of each group into a single dictionary like this:

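# Assumes data is already sorted by 'Organization', as above
result = {}
for key, group in groupby(data, key=itemgetter('Organization')):
    # Collect every value seen for each field, including empty strings
    result[key] = {'Phone': [],
                   'Email': [],
                   'Website': []}
    for item in group:
        result[key]['Phone'].append(item['Phone'])
        result[key]['Email'].append(item['Email'])
        result[key]['Website'].append(item['Website'])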

Then result is:

{'123 Solar': {'Email': ['', 'joey@123solar.com'],
               'Phone': ['444-444-4444', ''],
               'Website': ['www.123solar.com', 'www.123solar.com']},
 '234 test': {'Email': ['a@123solar.com', 'ac@123solar.com'],
              'Phone': ['111', '222'],
              'Website': ['b.123solar.com', 'bd.123solar.com']}}
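
If what you want is one record per organization rather than lists of values, one possible rule is "keep the first non-empty value for each field". That rule is my assumption, not something the question specifies, and the helper name first_non_empty is hypothetical. A rough sketch, using 'Website' consistently as the key:

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'Organization': '123 Solar', 'Phone': '444-444-4444', 'Email': '',
     'Website': 'www.123solar.com'},
    {'Organization': '123 Solar', 'Phone': '', 'Email': 'joey@123solar.com',
     'Website': 'www.123solar.com'},
]


def first_non_empty(group):
    """Collapse a group of records into one dict.

    Assumed rule: for each field, keep the first truthy value seen.
    """
    merged = {}
    for item in group:
        for field, value in item.items():
            # Only overwrite fields that are still missing or empty.
            if not merged.get(field):
                merged[field] = value
    return merged


# groupby needs the input sorted by the grouping key.
data = sorted(data, key=itemgetter('Organization'))
unique = [first_non_empty(group)
          for _, group in groupby(data, key=itemgetter('Organization'))]

print(unique)
```

When two records disagree on a non-empty field, this keeps whichever comes first after sorting, so adjust the rule if you need something else.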

Answer 2

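Is there an efficient way to search over the list, sort the list based on the dictionary's first entry, and finally merge the data from duplicates to create a unique entry?

Yes, but there's an even more efficient way that needs no searching or sorting at all. Just build up a dictionary as you go along: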

# merge() is defined further down
datadict = {}
for thingy in data:
    organization = thingy['Organization']
    datadict[organization] = merge(thingy, datadict.get(organization, {}))

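Now you've made a single linear pass over the data, with a constant-time lookup for each item. So it's better than any sorting-based solution by a factor of O(log N). It's also one pass instead of several, and it will probably have lower constant overhead too.

It's not clear exactly how you want to merge the entries, and nobody can write that code without knowing which rules you want to use. But here's a simple example: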

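def merge(d1, d2):
    # Fill in fields that are missing or empty in d1 with the values from d2.
    # Note that d1 is modified in place.
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1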

In other words, for each item in d2, if d1 already has a truthy value (like a non-empty string), leave it alone; otherwise, add it.
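
Putting the two pieces together, here is a small end-to-end sketch with sample records. One deviation from the snippet above, and it is my own choice: each record is copied with dict(thingy) before merging, so the original list is left untouched. The name unique_entries is only illustrative.

```python
def merge(d1, d2):
    """Fill missing or empty fields of d1 from d2 and return d1."""
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1


data = [
    {'Organization': '123 Solar', 'Phone': '444-444-4444', 'Email': '',
     'Website': 'www.123solar.com'},
    {'Organization': '123 Solar', 'Phone': '', 'Email': 'joey@123solar.com',
     'Website': 'www.123solar.com'},
    {'Organization': '234 test', 'Phone': '111', 'Email': 'a@123solar.com',
     'Website': 'b.123solar.com'},
]

datadict = {}
for thingy in data:
    organization = thingy['Organization']
    # Copy the record first (an assumption on my part) so that merge()
    # does not mutate the dictionaries in the input list.
    datadict[organization] = merge(dict(thingy),
                                   datadict.get(organization, {}))

# One merged record per organization, in first-seen order.
unique_entries = list(datadict.values())
print(unique_entries)
```

With this rule, a non-empty value in a later record wins over an earlier one, because the newer record is passed as d1. Swap the arguments if you would rather keep the earliest value.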


Answer 1
3, accepted, 2013-08-27 19:23:33

Answer 2
2, 2013-08-27 19:24:26
