Pandas - Dataframe has column with lists. How can I groupby the elements within the list?

I have a dataframe where some cells contain lists of multiple values, like so:

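import pandas as pd

# the list elements are strings; quoting them makes the snippet runnable
df = pd.DataFrame(
    {'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
     'value': [20, 30, 20, 10]
    }
)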

df

Out[10]: 
    category  value
0  [x, y, z]     20
1        [x]     30
2     [y, z]     20
3     [x, z]     10

I'd like to group the data by the unique elements in the category column and get, for each element, the number of rows it appears in and the mean of value over those rows.

Intended output should look like:

     count  mean
x    3      20
y    2      20
z    3      16.7

I'm fairly familiar with simple groupby operations and can build a flat list of the unique elements (i.e. [x, y, z]). However, I'm not sure how to use that flat list to transform the data as shown above. Any help is much appreciated!

Use explode (available in pandas 0.25+):

df.explode('category').groupby('category')['value'].agg(['count','mean'])

          count       mean
category                  
x             3  20.000000
y             2  20.000000
z             3  16.666667
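To see why this works, it can help to look at the intermediate frame. The following is a minimal sketch that rebuilds the question's data with string elements (an assumption, since the original snippet left x, y and z unquoted) and splits the one-liner into two steps; the names exploded and result are only illustrative.

```python
import pandas as pd

# Sample data from the question, with the list elements written as strings
df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# explode gives every list element its own row; the original index label
# and the other columns are repeated for each element
exploded = df.explode('category')
print(exploded)

# after exploding, category is an ordinary column and groupby works as usual
result = exploded.groupby('category')['value'].agg(['count', 'mean'])
print(result)
```

Because explode keeps the original index labels, each row of exploded can still be traced back to the row of df it came from.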

For pandas versions below 0.25:

import numpy as np

# repeat each row once per list element, then attach the flattened elements
(df.loc[df.index.repeat(df['category'].str.len()), ['value']]
   .assign(category=np.concatenate(df['category']))
   .groupby('category')['value'].agg(['count', 'mean']))

          count       mean
category                  
x             3  20.000000
y             2  20.000000
z             3  16.666667
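The chained expression for older pandas versions is dense, so here is a rough step-by-step sketch of the same idea. It assumes the question's data with string elements and a unique index on df; the intermediate names lengths, repeated, flat and result are hypothetical, and tolist() is used before np.concatenate purely to make the input an explicit list of lists.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# number of elements in each list
lengths = df['category'].str.len()

# repeat every index label once per list element and select the value
# column for those labels, so each value appears once per element
repeated = df.loc[df.index.repeat(lengths), ['value']]

# flatten all lists into one array, in the same row order as `repeated`
flat = np.concatenate(df['category'].tolist())

# attach the flattened elements as a regular column and aggregate
result = (repeated.assign(category=flat)
                  .groupby('category')['value']
                  .agg(['count', 'mean']))
print(result)
```

The repeated rows and the flattened array line up because both are built by walking the rows of df in order, so element i of flat belongs to row i of repeated.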
