
Create a list of lists from a grouped DataFrame in pandas

Assume my df is as follows:

import pandas as pd

df = pd.DataFrame({'Order ID': [1, 1, 2, 3, 3, 3, 4, 4], 'Product': ['USB', 'Bat', 'Ball', 'USB', 'Phone', 'Toy', 'Bike', 'Apple']})

I want to group by Order ID and collect the Product values of each order into its own inner list, giving a list of lists. For example:

[['USB', 'Bat'], ['Ball'], ['USB', 'Phone', 'Toy'], ['Bike', 'Apple']]

USB and Bat both have the same order ID (1), so they are in the same list.

My code, trial 1:

combo_outer = []  # outer most list

grouped = df.groupby(['Order ID', 'Product'])

for group, frame in grouped:
    
    combo_inner = []    # for inner lists

    for row_index, row in frame.iterrows():
    
        combo_inner.append(row['Product'])
    
    combo_outer.extend(combo_inner)

Trial 2:

df['Product'].values.tolist()

In both cases, I end up with a single flat list. Trial 1, for example, gives:

['Bat', 'USB', 'Ball', 'Phone', 'Toy', 'USB', 'Apple', 'Bike']

What am I doing wrong?

In Trial 1, use append instead of extend, and group by Order ID only. Alternatively, use a list comprehension:

[g.values.tolist() for _, g in df.Product.groupby(df['Order ID'])]
# [['USB', 'Bat'], ['Ball'], ['USB', 'Phone', 'Toy'], ['Bike', 'Apple']]

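In your first trial, group by Order ID only: use grouped = df.groupby(['Order ID']) instead of grouped = df.groupby(['Order ID', 'Product']). Grouping on both columns puts every row in its own group, because each (Order ID, Product) pair is unique here.

Second, use append in the last line instead of extend. extend adds the items of combo_inner one by one and so flattens the result, whereas append adds combo_inner itself as a single element.

Trial 1 should then look something like this: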

combo_outer = []  # outer most list

grouped = df.groupby(['Order ID'])

for group, frame in grouped:
    
    combo_inner = []    # for inner lists

    for row_index, row in frame.iterrows():
    
        combo_inner.append(row['Product'])
    
    combo_outer.append(combo_inner)
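
To see why both changes matter, here is a small self-contained check. It is only a sketch, and names such as groups_both, flat and nested are illustrative, not from the question. It counts the groups produced by each grouping and contrasts extend with append on plain lists.

```python
import pandas as pd

df = pd.DataFrame({
    'Order ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['USB', 'Bat', 'Ball', 'USB', 'Phone', 'Toy', 'Bike', 'Apple'],
})

# Grouping on both columns: every (Order ID, Product) pair is unique here,
# so each group holds exactly one row.
groups_both = df.groupby(['Order ID', 'Product']).ngroups

# Grouping on Order ID alone: one group per order.
groups_order = df.groupby('Order ID').ngroups

flat = []
flat.extend(['USB', 'Bat'])    # extend adds the items one by one
flat.extend(['Ball'])

nested = []
nested.append(['USB', 'Bat'])  # append adds the list itself as one element
nested.append(['Ball'])

print(groups_both, groups_order)
print(flat)
print(nested)
```

With the two-column grouping there are as many groups as rows, so even with append each inner list would contain a single product.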

You can pass list directly to the groupby via agg, then convert the resulting Series to a list with tolist():

df.Product.groupby(df['Order ID']).agg(list).tolist()

Outputs:

[['USB', 'Bat'], ['Ball'], ['USB', 'Phone', 'Toy'], ['Bike', 'Apple']]
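
One thing to keep in mind with this approach: groupby sorts the group keys by default, so the inner lists come out in ascending Order ID. If the IDs are not already sorted and you would rather keep the order in which each order first appears, sort=False does that. A minimal sketch, using a made-up orders frame that is not part of the question:

```python
import pandas as pd

# Hypothetical data in which the Order IDs are not in ascending order.
orders = pd.DataFrame({
    'Order ID': [3, 1, 3, 2],
    'Product': ['Toy', 'USB', 'Phone', 'Ball'],
})

# Default behaviour: groups are ordered by sorted key (1, 2, 3).
by_key = orders.groupby('Order ID')['Product'].agg(list).tolist()

# sort=False: groups are ordered by first appearance (3, 1, 2).
by_appearance = orders.groupby('Order ID', sort=False)['Product'].agg(list).tolist()

print(by_key)
print(by_appearance)
```

In both cases the products inside each inner list keep their original row order.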

In Trial 1, use append instead of extend. Alternatively, use apply with list:

result = df.groupby('Order ID')['Product'].apply(list)
result.tolist()

Output:

[['USB', 'Bat'], ['Ball'], ['USB', 'Phone', 'Toy'], ['Bike', 'Apple']]
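
As a quick sanity check, the list comprehension, agg and apply variants suggested in this thread can be run side by side on the question's data. This is just a sketch, and the via_* and expected names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Order ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['USB', 'Bat', 'Ball', 'USB', 'Phone', 'Toy', 'Bike', 'Apple'],
})

# The result the question asks for.
expected = [['USB', 'Bat'], ['Ball'], ['USB', 'Phone', 'Toy'], ['Bike', 'Apple']]

# Iterating over the grouped column yields (key, Series) pairs.
via_comprehension = [g.tolist() for _, g in df.groupby('Order ID')['Product']]

# agg(list) and apply(list) each build one list per group.
via_agg = df.groupby('Order ID')['Product'].agg(list).tolist()
via_apply = df.groupby('Order ID')['Product'].apply(list).tolist()

print(via_comprehension == via_agg == via_apply == expected)
```

All three give the same nested list here, so the choice between them is mostly a matter of readability.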
