
python: flatten list while preserving nested structure for certain indexes

I found several posts about flattening/collapsing lists in Python, but none which cover this case:

Input:

[a_key_1, a_key_2, a_value_1, a_value_2]
[b_key_1, b_key_2, b_value_1, b_value_2]
[a_key_1, a_key_2, a_value_3, a_value_4]
[a_key_1, a_key_3, a_value_5, a_value_6]

Output:

[a_key_1, a_key_2, [a_value_1, a_value_3], [a_value_2, a_value_4]]
[b_key_1, b_key_2, [b_value_1], [b_value_2]]
[a_key_1, a_key_3, [a_value_5], [a_value_6]]

I want to flatten the lists so there is only one entry per unique set of keys and the remaining values are combined into nested lists next to those unique keys.

EDIT: The first two elements in the input will always be the keys; the last two elements will always be the values.

Is this possible?

Yes, it's possible. Here's a function (with doctest from your input/output) that performs the task:

#!/usr/bin/env python
"""Flatten lists as per http://stackoverflow.com/q/30387083/253599."""

from collections import OrderedDict


def flatten(key_length, *args):
    """
    Take lists having key elements and collect remainder into result.

    >>> flatten(1,
    ...         ['A', 'a1', 'a2'],
    ...         ['B', 'b1', 'b2'],
    ...         ['A', 'a3', 'a4'])
    [['A', ['a1', 'a2'], ['a3', 'a4']], ['B', ['b1', 'b2']]]

    >>> flatten(2,
    ...         ['A1', 'A2', 'a1', 'a2'],
    ...         ['B1', 'B2', 'b1', 'b2'],
    ...         ['A1', 'A2', 'a3', 'a4'],
    ...         ['A1', 'A3', 'a5', 'a6'])
    [['A1', 'A2', ['a1', 'a2'], ['a3', 'a4']], ['B1', 'B2', ['b1', 'b2']], ['A1', 'A3', ['a5', 'a6']]]
    """
    result = OrderedDict()
    for vals in args:
        result.setdefault(
            tuple(vals[:key_length]), [],
        ).append(vals[key_length:])
    return [
        list(key) + list(vals)
        for key, vals
        in result.items()
    ]


if __name__ == '__main__':
    import doctest
    doctest.testmod()

(Edited to work with both your original question and the edited question)
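Note that the function above keeps each input row's values together as one nested list, whereas the output in the question groups them by position (all first values together, all second values together). A minimal sketch of that column-wise variant, assuming every row with the same keys has the same number of values; `flatten_columns` is a hypothetical name, not part of the answer above:

```python
from collections import OrderedDict


def flatten_columns(key_length, rows):
    """Group rows by their first key_length items, collecting values by position."""
    grouped = OrderedDict()
    for row in rows:
        # Rows sharing the same keys accumulate their value parts in input order.
        grouped.setdefault(tuple(row[:key_length]), []).append(row[key_length:])
    # zip(*value_rows) transposes the collected rows into one list per position.
    return [
        list(key) + [list(column) for column in zip(*value_rows)]
        for key, value_rows in grouped.items()
    ]


rows = [
    ["a_key_1", "a_key_2", "a_value_1", "a_value_2"],
    ["b_key_1", "b_key_2", "b_value_1", "b_value_2"],
    ["a_key_1", "a_key_2", "a_value_3", "a_value_4"],
    ["a_key_1", "a_key_3", "a_value_5", "a_value_6"],
]
result = flatten_columns(2, rows)
for entry in result:
    print(entry)
```

Because the groups live in an OrderedDict, the entries come out in the order their keys first appear in the input, which matches the ordering shown in the question.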

data = [
    ["a_key_1", "a_key_2", "a_value_1", "a_value_2"],
    ["b_key_1", "b_key_2", "b_value_1", "b_value_2"],
    ["a_key_1", "a_key_2", "a_value_3", "a_value_4"],
    ["a_key_1", "a_key_3", "a_value_5", "a_value_6"],
]

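from itertools import groupby

# Group on the first two elements (the keys).
keyfunc = lambda row: (row[0], row[1])

# zip(*group) transposes each group's rows; [2:] drops the key columns.
# list(...) around zip and print(...) keep this working on Python 3.
print([
    list(key) + [list(zipped) for zipped in list(zip(*group))[2:]]
    for key, group
    in groupby(sorted(data, key=keyfunc), keyfunc)
])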


# => [['a_key_1', 'a_key_2', ['a_value_1', 'a_value_3'], ['a_value_2', 'a_value_4']],
#     ['a_key_1', 'a_key_3', ['a_value_5'], ['a_value_6']],
#     ['b_key_1', 'b_key_2', ['b_value_1'], ['b_value_2']]]
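The sorted() call in the snippet above matters: itertools.groupby only merges items that sit next to each other, so rows with the same keys must be adjacent before grouping. This is also why the result comes out in sorted key order rather than input order. A small illustration with made-up single-letter keys:

```python
from itertools import groupby

keys = ["a", "b", "a"]

# Unsorted: the two "a" items are not adjacent, so they end up in separate groups.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(keys)]

# Sorted: equal keys are adjacent and collapse into a single group.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(keys))]

print(unsorted_groups)  # [('a', 1), ('b', 1), ('a', 1)]
print(sorted_groups)    # [('a', 2), ('b', 1)]
```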

For more information, check the Python docs.
