
Left merging a list of dictionaries in Python with multiple keys, filling in with zeros

I'm trying to essentially replace the pd.merge() functionality in a script without using pandas.

If I have 2 lists of dictionaries (below):

l1 = [{'key1': '2017', 'key2': '20-30', 'val1': 11},
  {'key1': '2017', 'key2': '30-40', 'val1': 22},
  {'key1': '2017', 'key2': '40-50', 'val1': 33},
  {'key1': '2017', 'key2': '50+', 'val1': 44},
  {'key1': '2018', 'key2': '20-30', 'val1': 55},
  {'key1': '2018', 'key2': '30-40', 'val1': 66},
  {'key1': '2018', 'key2': '40-50', 'val1': 77},
  {'key1': '2018', 'key2': '50+', 'val1': 88}]


l2 = [{'key1': '2017', 'key2': '20-30', 'val2': 1000},
      {'key1': '2017', 'key2': '40-50', 'val3': 2000},
      {'key1': '2018', 'key2': '50+', 'val3': 3000}]

I'd like to "left merge" them on multiple keys to produce the following result:

output = [{'key1': '2017', 'key2': '20-30', 'val1': 11, 'val2':1000, 'val3':0},
      {'key1': '2017', 'key2': '30-40', 'val1': 22, 'val2':0, 'val3':0},
      {'key1': '2017', 'key2': '40-50', 'val1': 33, 'val2':0, 'val3':2000},
      {'key1': '2017', 'key2': '50+', 'val1': 44, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '20-30', 'val1': 55, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '30-40', 'val1': 66, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '40-50', 'val1': 77, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '50+', 'val1': 88, 'val2':0, 'val3':3000}]

The closest I've gotten is the code below, using this as a reference, but I'm not sure how to get it exactly right (including the zeros).

l1 = {(d['key1'], d['key2']):d for d in l1}
all = [dict(d, **l1.get((d['key1'], d['key2']), {})) for d in l2]

When working with pandas dataframes, pandas usually knows the number of columns and their data types in advance.

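If all elements in each list had the same structure (the keys in l1 may differ from those in l2, but all elements of l1 share the same keys and all elements of l2 share the same keys), then discovering the default types and the full set of keys for each output dictionary would be an O(1) operation: inspecting one element per list is enough. Here, however, the elements of l2 have different keys, so you have to scan the whole list, an O(n) operation, to find the full set of columns/keys in l2.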
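As a small illustration of why the scan is needed, the snippet below compares the keys of the first record of l2 with the keys collected from every record. The variable names first_record_keys and all_keys are illustrative and not part of the original answer.

```python
l2 = [{'key1': '2017', 'key2': '20-30', 'val2': 1000},
      {'key1': '2017', 'key2': '40-50', 'val3': 2000},
      {'key1': '2018', 'key2': '50+', 'val3': 3000}]

# Looking only at the first record misses val3.
first_record_keys = set(l2[0])

# One pass over the whole list collects every key that appears anywhere.
all_keys = set()
for record in l2:
    all_keys.update(record)  # updating a set from a dict adds its keys

print(sorted(first_record_keys))  # ['key1', 'key2', 'val2']
print(sorted(all_keys))           # ['key1', 'key2', 'val2', 'val3']
```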

See the left_merge function in the code below. It is more verbose, but it explains what is happening.

l1 = [{'key1': '2017', 'key2': '20-30', 'val1': 11},
  {'key1': '2017', 'key2': '30-40', 'val1': 22},
  {'key1': '2017', 'key2': '40-50', 'val1': 33},
  {'key1': '2017', 'key2': '50+', 'val1': 44},
  {'key1': '2018', 'key2': '20-30', 'val1': 55},
  {'key1': '2018', 'key2': '30-40', 'val1': 66},
  {'key1': '2018', 'key2': '40-50', 'val1': 77},
  {'key1': '2018', 'key2': '50+', 'val1': 88}]


l2 = [{'key1': '2017', 'key2': '20-30', 'val2': 1000},
      {'key1': '2017', 'key2': '40-50', 'val3': 2000},
      {'key1': '2018', 'key2': '50+', 'val3': 3000}]

op_output = [{'key1': '2017', 'key2': '20-30', 'val1': 11, 'val2':1000, 'val3': 0},
      {'key1': '2017', 'key2': '30-40', 'val1': 22, 'val2':0, 'val3':0},
      {'key1': '2017', 'key2': '40-50', 'val1': 33, 'val2':0, 'val3':2000},
      {'key1': '2017', 'key2': '50+', 'val1': 44, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '20-30', 'val1': 55, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '30-40', 'val1': 66, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '40-50', 'val1': 77, 'val2':0, 'val3':0},
      {'key1': '2018', 'key2': '50+', 'val1': 88, 'val2':0, 'val3':3000}]


def left_merge(a, b, key):
    # a and b are lists of dictionaries
    # key is a callable that returns the join key for a record
    output = []

    # Start with the columns of the first element of a (if there is one).
    merged_item_columns = set()
    if a:
        merged_item_columns.update(a[0].keys())

    # Picking one element is not enough for b: in OP's question, l2 has
    # some records with val2 and some with val3, so it isn't like a
    # dataframe where all columns are known in advance.
    # Discovery requires scanning all elements (sigh), but this can at
    # least be done while creating the index for b.
    # NOTE: if a also has similar characteristics, it will require a similar scan.
    b_index = {}
    for i in b:
        b_index[key(i)] = i
        merged_item_columns.update(i.keys())

    # TODO: determine type for each column and choose correct defaults
    # using 0 as default for now.
    merged_item_template = {k:0 for k in merged_item_columns}
    for a_item in a:
        merged_item = merged_item_template.copy()
        merged_item.update(a_item)

        b_item = b_index.get(key(a_item))
        if b_item is not None:
            merged_item.update(b_item)

        output.append(merged_item)

    return output


output = left_merge(l1, l2, key=lambda x:(x['key1'], x['key2']))
print(output)
print(op_output == output)
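The TODO in left_merge about choosing defaults by column type could be approached along the following lines. This is only a sketch: infer_defaults is a hypothetical helper that is not part of the answer, and it assumes that the first value seen for a column is representative of that column's type.

```python
def infer_defaults(records, fallback=None):
    """Map each column to the 'empty' value of the type of its first value."""
    defaults = {}
    for record in records:
        for column, value in record.items():
            if column in defaults:
                continue
            try:
                # Calling a builtin type with no arguments gives its empty
                # value: int() -> 0, float() -> 0.0, str() -> ''.
                defaults[column] = type(value)()
            except TypeError:
                # Types that need constructor arguments fall back to `fallback`.
                defaults[column] = fallback
    return defaults


sample = [{'key1': '2017', 'val2': 1000},
          {'key1': '2018', 'val3': 2.5, 'tag': 'x'}]
print(infer_defaults(sample))
# {'key1': '', 'val2': 0, 'val3': 0.0, 'tag': ''}
```

The resulting dictionary could stand in for merged_item_template, so that string columns default to '' instead of 0.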

You can first assign zeros to val2 and val3, and then apply the code you already have:

l2 = {(d['key1'], d['key2']): d for d in l2}

output = [{**d, **{'val2': 0, 'val3': 0}} for d in l1] # zeros for val2 and val3
output = [{**d, **l2.get((d['key1'], d['key2']), {})} for d in output] # update

Output:

[{'key1': '2017', 'key2': '20-30', 'val1': 11, 'val2': 1000, 'val3': 0},
 {'key1': '2017', 'key2': '30-40', 'val1': 22, 'val2': 0, 'val3': 0},
 {'key1': '2017', 'key2': '40-50', 'val1': 33, 'val2': 0, 'val3': 2000},
 {'key1': '2017', 'key2': '50+', 'val1': 44, 'val2': 0, 'val3': 0},
 {'key1': '2018', 'key2': '20-30', 'val1': 55, 'val2': 0, 'val3': 0},
 {'key1': '2018', 'key2': '30-40', 'val1': 66, 'val2': 0, 'val3': 0},
 {'key1': '2018', 'key2': '40-50', 'val1': 77, 'val2': 0, 'val3': 0},
 {'key1': '2018', 'key2': '50+', 'val1': 88, 'val2': 0, 'val3': 3000}]

By the way, on Python 3.9+ you can use the | operator instead to simplify the code:

l2 = {(d['key1'], d['key2']): d for d in l2}

output = [d | {'val2': 0, 'val3': 0} for d in l1]
output = [d | l2.get((d['key1'], d['key2']), {}) for d in output]
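Both versions above hardcode val2 and val3. If the value columns are not known up front, they could be collected from l2 first. The sketch below is one way to do that; merge_with_zeros and its parameters on and fill are hypothetical names, not part of the answer, and it assumes the join keys in l2 are unique.

```python
def merge_with_zeros(left, right, on=('key1', 'key2'), fill=0):
    # Index the right-hand records by their join key.
    index = {tuple(d[k] for k in on): d for d in right}
    # Every non-join key that appears anywhere in the right-hand list.
    fill_keys = sorted({k for d in right for k in d if k not in on})

    merged = []
    for d in left:
        row = dict(d)
        for k in fill_keys:
            row.setdefault(k, fill)  # fill only the columns that are missing
        row.update(index.get(tuple(d[k] for k in on), {}))
        merged.append(row)
    return merged


l1 = [{'key1': '2017', 'key2': '20-30', 'val1': 11},
      {'key1': '2017', 'key2': '30-40', 'val1': 22},
      {'key1': '2018', 'key2': '50+', 'val1': 88}]
l2 = [{'key1': '2017', 'key2': '20-30', 'val2': 1000},
      {'key1': '2018', 'key2': '50+', 'val3': 3000}]

result = merge_with_zeros(l1, l2)
print(result[0])
# {'key1': '2017', 'key2': '20-30', 'val1': 11, 'val2': 1000, 'val3': 0}
```

Records in l1 with no match in l2 keep their own values and receive the fill value for every column found in l2.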

