g | val
1 | a
1 | ''
2 | b
2 | ''
2 | c
3 | ''
I have df.groupby('g') and want the median, across groups, of the number of non-empty val entries in each group. How can I do that in pandas?
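To make the answers below reproducible, here is one way to build the sample table as a DataFrame. This is a sketch that assumes the empty values are empty strings (''), as the table suggests, rather than NaN.

```python
import pandas as pd

# Sample data from the question: '' marks an empty value
df = pd.DataFrame({
    "g": [1, 1, 2, 2, 2, 3],
    "val": ["a", "", "b", "", "c", ""],
})
print(df)
```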
Is this what you need? count() does not count NaN values, which is why we first replace '' with np.nan:
import numpy as np
df['val'] = df['val'].replace('', np.nan)
df
Out[243]:
g val
0 1 a
1 1 NaN
2 2 b
3 2 NaN
4 2 c
5 3 NaN
df.groupby('g').val.count().median()
Out[245]: 1.0
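A self-contained sketch of the same idea, printing the per-group counts before taking the median so you can see that the all-empty group 3 is kept with a count of 0. The sample DataFrame is rebuilt here, and the name `counts` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2, 2, 2, 3],
                   "val": ["a", "", "b", "", "c", ""]})

# count() skips NaN, so turn empty strings into NaN first
counts = df["val"].replace("", np.nan).groupby(df["g"]).count()
print(counts)           # non-empty values per group; g=3 stays in with 0
print(counts.median())
```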
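Filter before groupby (the ~ keeps only the rows whose value is not in the list). Note that this drops any group whose values are all empty (g = 3 in the sample), so on the original '' data the median is taken over two groups and comes out as 1.5 rather than 1.0:
# 'somethingelse' is a placeholder for any other value you want to exclude
df[~df.val.isin(['', 'somethingelse'])].groupby('g').val.count().median()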
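If you prefer to filter first but still want all-empty groups to count as 0, one option is to reindex the counts against every group key with `fill_value=0`. This is a sketch on the sample data; `filtered`, `all_groups` and `counts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2, 2, 2, 3],
                   "val": ["a", "", "b", "", "c", ""]})

# Filtering first drops g=3, because all of its values are empty
filtered = df[~df["val"].isin([""])].groupby("g")["val"].count()
print(filtered.median())  # median over groups 1 and 2 only

# Reindex against every group key so all-empty groups count as 0
all_groups = pd.Index(df["g"].unique(), name="g")
counts = filtered.reindex(all_groups, fill_value=0)
print(counts.median())    # median over groups 1, 2 and 3
```

Whether group 3 should count as 0 or be ignored depends on what the median is meant to describe; the two results differ on this data.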
Another way is to use apply:
# inside apply, we can filter values
df.groupby('g')['val'].apply(lambda x: x[x!= ''].count()).median()
Out[2]: 1.0
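The lambda in apply runs once per group in Python. A possible vectorized alternative, sketched below on the sample data, builds the "non-empty" mask once for the whole column with `Series.ne('')` and sums the True values per group. One assumption to keep in mind: `ne('')` treats NaN as non-empty, whereas count() skips it, so the two only agree when empty values are stored as ''.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2, 2, 2, 3],
                   "val": ["a", "", "b", "", "c", ""]})

# apply version from above: one Python call per group
via_apply = df.groupby("g")["val"].apply(lambda x: x[x != ""].count())

# Vectorized version: compute the mask once, then sum True values per group
via_mask = df["val"].ne("").groupby(df["g"]).sum()

print(via_apply.tolist(), via_mask.tolist())
print(via_mask.median())
```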
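You can slice away the empty values in the val column, then use groupby and compute the median. As with filtering before groupby, a group whose values are all empty (g = 3 here) disappears instead of counting as 0: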
df[df['val']!=''].groupby('g').val.count().median()
Empty strings evaluate to False in a boolean context, and False evaluates to 0 in an integer context. We can use this to do:
df.val.astype(bool).groupby(df.g).sum().median()
1.0
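One caveat with astype(bool): NaN is truthy, so if some empty values are stored as NaN rather than '', they would be counted as non-empty. A sketch of one way to guard against that, assuming a mix of NaN and '' (the NaN in the sample below is added for illustration and is not in the question's data), is to map missing values to '' with fillna before casting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2, 2, 2, 3],
                   "val": ["a", np.nan, "b", "", "c", ""]})

# NaN is truthy, so astype(bool) counts the NaN in g=1 as non-empty
naive = df["val"].astype(bool).groupby(df["g"]).sum()

# Map missing values to '' first so both kinds of "empty" become False
robust = df["val"].fillna("").astype(bool).groupby(df["g"]).sum()

print(naive.tolist(), robust.tolist())
print(robust.median())
```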