I have a dataframe where some cells contain lists of multiple values, like so:
import pandas as pd
df = pd.DataFrame(
    {'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
     'value': [20, 30, 20, 10]
    }
)
df
Out[10]:
category value
0 [x, y, z] 20
1 [x] 30
2 [y, z] 20
3 [x, z] 10
I'd like to group the data by the unique elements in the category
column and capture, for each element, both the number of rows that contain it
and the mean of value over those rows.
Intended output should look like:
count mean
x 3 20
y 2 20
z 3 16.7
I'm relatively familiar with simple groupby functions, and I'm able to create a flat list of the unique elements (i.e. [x, y, z]). However, I'm not sure how to use that flat list to transform the data as shown above. Help much appreciated!
Use explode (for pandas 0.25+):
df.explode('category').groupby('category')['value'].agg(['count','mean'])
count mean
category
x 3 20.000000
y 2 20.000000
z 3 16.666667
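To see why this works, it may help to look at the intermediate frame that explode produces: each list element gets its own row, and the matching value is repeated next to it. A minimal self-contained sketch, assuming pandas 0.25 or later (the names exploded and result are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# One row per list element; the original index and 'value' are repeated.
exploded = df.explode('category')

# 'category' now holds scalars, so an ordinary groupby applies.
result = exploded.groupby('category')['value'].agg(['count', 'mean'])
```

Here count is the number of rows whose list contains the element, and mean averages value over those same rows.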
For pandas versions below 0.25:
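import numpy as np

# .values passes the underlying array of lists, so the index labels are irrelevant
(df.loc[df.index.repeat(df['category'].str.len()), ['value']]
   .assign(category=np.concatenate(df['category'].values))
   .groupby('category')['value'].agg(['count', 'mean']))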
count mean
category
x 3 20.000000
y 2 20.000000
z 3 16.666667
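The chained expression for older pandas is fairly dense, so here is the same idea split into steps. This is only a sketch of that approach; the intermediate names (lengths, repeated, flat, long_df) are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# Number of elements in each list.
lengths = df['category'].str.len()

# Repeat each row label once per list element and keep only 'value'.
repeated = df.loc[df.index.repeat(lengths), ['value']]

# Flatten the lists in the same row order and attach them as a scalar column.
flat = np.concatenate(df['category'].values)
long_df = repeated.assign(category=flat)

result = long_df.groupby('category')['value'].agg(['count', 'mean'])
```

The repeated rows and the flattened array line up because both follow the original row order, which is what makes the positional assign safe here.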