
How to get unique values from a list column, grouped by device_id, in pandas

Input:

    print(df)
device_id           ids
025c08d535a074b4    [8972]
025c08d535a074b4    [10595, 10595]
02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 8791]
02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]

The output should be a unique list of ids for each device_id, like:

device_id           ids
025c08d535a074b4    [8972, 10595]
02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]

I tried this:

    df=pd.DataFrame(df.groupby('device_id')['ids'].apply(set))

But it does not work properly: it adds a ' before the ids and returns lists like:

device_id           ids
025c08d535a074b4    [8972,'10595, 10595]
02612734f96edc43    ['10016,8795,10019,8791,8351,8791,'10016]

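Use numpy.hstack to flatten each group's lists and numpy.unique to deduplicate them (note that numpy.unique returns the values in sorted order):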

import numpy as np

df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x)))

Or, if maintaining the original order is important, use the pandas.Series constructor with drop_duplicates:

df.groupby('device_id')['ids'].apply(lambda x: pd.Series(np.hstack(x)).drop_duplicates().to_list())

[out]

device_id
025c08d535a074b4                                    [8972, 10595]
02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]
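
For reference, here is a self-contained sketch of the two variants above. The sample frame is rebuilt from the question, and it assumes the ids column holds Python lists of integers rather than strings. The variable names (grouped, sorted_unique, ordered_unique) are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: ids are real lists).
df = pd.DataFrame({
    "device_id": ["025c08d535a074b4", "025c08d535a074b4",
                  "02612734f96edc43", "02612734f96edc43"],
    "ids": [
        [8972],
        [10595, 10595],
        [10016, 8795, 10019, 8791, 8351, 8791],
        [10016, 8795, 10019, 8791, 8351, 10052, 8345],
    ],
})

grouped = df.groupby("device_id")["ids"]

# Variant 1: np.hstack flattens the group's lists; np.unique deduplicates
# and returns the values sorted.
sorted_unique = grouped.apply(lambda x: np.unique(np.hstack(x)).tolist())

# Variant 2: drop_duplicates keeps the first occurrence of each value,
# so the original order is preserved.
ordered_unique = grouped.apply(
    lambda x: pd.Series(np.hstack(x)).drop_duplicates().to_list()
)

print(sorted_unique)
print(ordered_unique)
```

The only difference between the two results is ordering: the first is sorted ascending, the second follows the order in which ids first appear within each device.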

If you need the output as a DataFrame, just chain on .reset_index():

df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x))).reset_index()

[out]

          device_id                                            ids
0  025c08d535a074b4                                  [8972, 10595]
1  02612734f96edc43  [8345, 8351, 8791, 8795, 10016, 10019, 10052]

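Alternatively, sum the lists within each group (which concatenates them), then deduplicate while keeping the order of first appearance: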

>>> grouped = df.groupby('device_id', as_index=False).sum()
>>> grouped['ids'] = grouped['ids'].apply(lambda x: sorted(set(x), key=x.index))
>>> grouped
          device_id                                            ids
0  025c08d535a074b4                                  [8972, 10595]
1  02612734f96edc43  [10016, 8795, 10019, 8791, 8351, 10052, 8345]
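
To see what the deduplication step does on its own, here is a minimal pure-Python sketch applied to one concatenated list. The helper names are hypothetical. The second helper swaps in dict.fromkeys, which is not part of the answer above; it gives the same result and avoids the repeated list.index lookups, which may matter for long lists.

```python
def unique_in_order(values):
    # Idiom from the answer: take the distinct values, then sort them by the
    # position of their first occurrence in the original list.
    return sorted(set(values), key=values.index)


def unique_in_order_linear(values):
    # Swapped-in alternative (an assumption, not from the answer): dict keys
    # are unique and keep insertion order in Python 3.7+.
    return list(dict.fromkeys(values))


# Summing a group's lists concatenates them, as in the groupby(...).sum() step.
merged = [10016, 8795, 10019, 8791, 8351, 8791] + \
         [10016, 8795, 10019, 8791, 8351, 10052, 8345]

print(unique_in_order(merged))
print(unique_in_order_linear(merged))
```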
