
Create list for each unique value

I'm currently looking at a table with the following structure.

uid | action
 1  |   A1
 1  |   A1
 1  |   A1
 1  |   A4
 2  |   A1
 2  |   A8
 2  |   A9
 3  |   A3
 3  |   A7

I'm trying to create a multidimensional array with the following structure.

[[A1, A1, A1, A4], [A1, A8, A9], [A3, A7]] 

My idea is to keep track of a uid and append the actions to a list until the uid changes. Once it does, the collected actions are appended to the outer array and the tracked uid is updated to the new uid.

I've come up with a somewhat overblown and incorrect solution using itertools.groupby() but I'm not satisfied with it and am looking for something simpler. However, I've overthought this problem and am coming up with more complicated solutions.

Any tips would be appreciated.

Code:

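import itertools

data = []
# table is an iterable of rows (dicts with 'uid' and 'action'), ordered by uid
for i, j in itertools.groupby(table, key=lambda x: x['uid']):
    event_array = []
    for k in list(j):
        event_array.append(k['action'])
    # note: this appends [uid, actions] pairs, not just the action lists
    data.append([i, event_array])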

As per the OP's comment,

@Black Are you sure that the data is ordered?

... @thefourtheye, yes pretty sure as I've had to write it in sql before reading it into python

Since the data is already ordered, for example, like this

>>> data = [{'action': 'A1', 'uid': 1},
...  {'action': 'A1', 'uid': 1},
...  {'action': 'A1', 'uid': 1},
...  {'action': 'A4', 'uid': 1},
...  {'action': 'A1', 'uid': 2},
...  {'action': 'A8', 'uid': 2},
...  {'action': 'A9', 'uid': 2},
...  {'action': 'A3', 'uid': 3},
...  {'action': 'A7', 'uid': 3}]

you can simply use groupby itself, with a nested list comprehension, like this

>>> from itertools import groupby
>>> [[k['action'] for k in j] for i, j in groupby(data, key=lambda x: x['uid'])]
[['A1', 'A1', 'A1', 'A4'], ['A1', 'A8', 'A9'], ['A3', 'A7']]
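
Note that groupby only merges consecutive rows with the same key, which is why the ordering question above matters. A minimal sketch of what happens when the rows are not ordered, and how sorting first fixes it (the `rows` sample here is made up for illustration, not taken from the question):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted rows: uid 1 shows up again after uid 2.
rows = [
    {'uid': 1, 'action': 'A1'},
    {'uid': 2, 'action': 'A8'},
    {'uid': 1, 'action': 'A4'},
]

get_uid = itemgetter('uid')

# Without sorting, uid 1 is split into two separate groups.
unsorted_groups = [[r['action'] for r in g] for _, g in groupby(rows, key=get_uid)]

# sorted() is stable, so actions keep their relative order within each uid.
by_uid = sorted(rows, key=get_uid)
sorted_groups = [[r['action'] for r in g] for _, g in groupby(by_uid, key=get_uid)]

print(unsorted_groups)  # [['A1'], ['A8'], ['A4']]
print(sorted_groups)    # [['A1', 'A4'], ['A8']]
```

If the rows already come from a query with ORDER BY uid, the extra sort is unnecessary.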

You can use good old defaultdict:

from collections import defaultdict

DATA = [{'uid': uid, 'action': action}
        for uid, action in [(1, 'A1'),
                            (1, 'A1'),
                            (1, 'A1'),
                            (1, 'A4'),
                            (2, 'A1'),
                            (2, 'A8'),
                            (2, 'A9'),
                            (3, 'A3'),
                            (3, 'A7'),]]

d = defaultdict(list)

for data in DATA:
    d[data['uid']].append(data['action'])

# list() is needed on Python 3, where values() returns a view object
print(list(d.values()))

Result will be:

[['A1', 'A1', 'A1', 'A4'], ['A1', 'A8', 'A9'], ['A3', 'A7']]
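
Unlike groupby, this approach does not depend on the rows being ordered by uid. A small sketch with interleaved rows, assuming Python 3.7+ (where dicts preserve insertion order); the `rows` sample is hypothetical:

```python
from collections import defaultdict

# Hypothetical interleaved rows as (uid, action) pairs.
rows = [(2, 'A8'), (1, 'A1'), (2, 'A9'), (1, 'A4')]

grouped = defaultdict(list)
for uid, action in rows:
    grouped[uid].append(action)

# Groups come out in the order each uid was first seen...
first_seen = list(grouped.values())

# ...so sort the keys explicitly if the result should be ordered by uid.
ordered = [grouped[uid] for uid in sorted(grouped)]

print(first_seen)  # [['A8', 'A9'], ['A1', 'A4']]
print(ordered)     # [['A1', 'A4'], ['A8', 'A9']]
```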

This should work, but it seems like groupby is already perfectly good.

uids = {}
for row in table:
    uids.setdefault(row['uid'], []).append(row['action'])

data = [uids[uid] for uid in sorted(uids.keys())]

The solution simply iterates over each row in the table and makes sure that there is a list for the corresponding uid in the uids dict (using setdefault). Then it appends the action for that row onto the list.

So uids will be a dictionary whose keys are the UIDs, and values are sequences of corresponding actions from the table.

If you really want a list of lists (a "multidimensional array"), the last line uses a list comprehension to build a list whose elements are the lists of actions stored in the uids dict, ordered by uid.
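
For reference, here is the same approach run end to end on the sample rows from the question, showing the intermediate dict. The `table` literal is an assumption about how the rows look once read into Python:

```python
# Assumed shape of the rows: one dict per table row.
table = [
    {'uid': 1, 'action': 'A1'}, {'uid': 1, 'action': 'A1'},
    {'uid': 1, 'action': 'A1'}, {'uid': 1, 'action': 'A4'},
    {'uid': 2, 'action': 'A1'}, {'uid': 2, 'action': 'A8'},
    {'uid': 2, 'action': 'A9'},
    {'uid': 3, 'action': 'A3'}, {'uid': 3, 'action': 'A7'},
]

uids = {}
for row in table:
    # setdefault returns the existing list for this uid, or stores and returns a new one
    uids.setdefault(row['uid'], []).append(row['action'])

print(uids)
# {1: ['A1', 'A1', 'A1', 'A4'], 2: ['A1', 'A8', 'A9'], 3: ['A3', 'A7']}

data = [uids[uid] for uid in sorted(uids)]
print(data)
# [['A1', 'A1', 'A1', 'A4'], ['A1', 'A8', 'A9'], ['A3', 'A7']]
```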
