How to count non-null columns per group?
I started from raw data that looks like this:
Case  Final  Pre  Post
1     A      Z    X
      Z           V
2     B
      Y
3     A      Z    Y
      Z           U
      W
4     C      W
      Z
5     C      X    Z
      X
      Z
Then I did a forward fill:
df['Case'] = df['Case'].ffill()
which gives:
Case  Final  Pre  Post
1     A      Z    X
1     Z      NaN  V
2     B      NaN  NaN
2     Y      NaN  NaN
3     A      Z    Y
3     Z      NaN  U
3     W      NaN  NaN
4     C      W    NaN
4     Z      NaN  NaN
5     C      X    Z
5     X      NaN  NaN
5     Z      NaN  NaN
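To reproduce the forward-filled frame above, here is a minimal sketch that builds the raw data as a DataFrame and applies the same forward fill. It assumes the blank cells are read in as missing values (written as None here, which pandas treats as NaN). Because Case then contains missing values, pandas stores it as a float column, so the filled case numbers come out as 1.0, 2.0, and so on.

```python
import pandas as pd

# Raw data from the question; blank cells are assumed to be missing (None -> NaN)
raw = {
    "Case":  [1, None, 2, None, 3, None, None, 4, None, 5, None, None],
    "Final": ["A", "Z", "B", "Y", "A", "Z", "W", "C", "Z", "C", "X", "Z"],
    "Pre":   ["Z", None, None, None, "Z", None, None, "W", None, "X", None, None],
    "Post":  ["X", "V", None, None, "Y", "U", None, None, None, "Z", None, None],
}
df = pd.DataFrame(raw)

# Forward-fill Case so every row carries the case number of the row above it
df["Case"] = df["Case"].ffill()
print(df)
```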
What I want is to count, for each column, the number of cases in which that column is not null:
Case: 5
Final: 5
Pre: 4
Post: 3
Output explanation:
1- Group by the first column, Case.
2- If even one value of a column within a case is not null (including the Case column itself), then count++ (increment that column's not-null count by 1).
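The two steps above can be sketched in plain Python, without pandas, to make the counting rule concrete. The rows are the forward-filled data from the question with None standing in for NaN; the function name count_cases_with_values is hypothetical.

```python
# Forward-filled rows from the question: (Case, Final, Pre, Post); None marks a missing cell
rows = [
    (1, "A", "Z", "X"),
    (1, "Z", None, "V"),
    (2, "B", None, None),
    (2, "Y", None, None),
    (3, "A", "Z", "Y"),
    (3, "Z", None, "U"),
    (3, "W", None, None),
    (4, "C", "W", None),
    (4, "Z", None, None),
    (5, "C", "X", "Z"),
    (5, "X", None, None),
    (5, "Z", None, None),
]
columns = ["Case", "Final", "Pre", "Post"]


def count_cases_with_values(rows, columns):
    # Step 1: group the rows by their Case value (the first field)
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)

    # Step 2: for each column, count++ once per group that has any non-null value there
    counts = {name: 0 for name in columns}
    for group_rows in groups.values():
        for i, name in enumerate(columns):
            if any(row[i] is not None for row in group_rows):
                counts[name] += 1
    return counts


result = count_cases_with_values(rows, columns)
print(result)  # {'Case': 5, 'Final': 5, 'Pre': 4, 'Post': 3}
```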
Use:
s = df.notna().groupby(df['Case']).any().sum()
# older pandas versions
s = df.notnull().groupby(df['Case']).any().sum()
print (s)
Case 5
Final 5
Pre 4
Post 3
dtype: int64
Details:
First check for non-missing values with DataFrame.notna:
print (df.notna())
Case Final Pre Post
0 True True True True
1 True True False True
2 True True False False
3 True True False False
4 True True True True
5 True True False True
6 True True False False
7 True True True False
8 True True False False
9 True True True True
10 True True False False
11 True True False False
Then aggregate by the Case column with GroupBy.any:
print (df.notnull().groupby(df['Case']).any())
Case Final Pre Post
Case
1 True True True True
2 True True False False
3 True True True True
4 True True True False
5 True True True True
And last, sum the values to count the Trues, since True is processed as 1 (and False as 0).
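A quick check of that last step, as a small sketch: pandas treats True as 1 and False as 0 when summing, so the column sums of the per-case table are the number of cases with at least one value. The frame below is simply the grouped boolean result shown above, typed in by hand.

```python
import pandas as pd

# The per-case "any non-null?" table from above, entered by hand
has_value = pd.DataFrame(
    {
        "Case":  [True, True, True, True, True],
        "Final": [True, True, True, True, True],
        "Pre":   [True, False, True, True, True],
        "Post":  [True, False, True, False, True],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Case"),
)

# Summing a boolean column counts its True values
counts = has_value.sum()
print(counts)
```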
How about:
grouped = df.groupby('Case', as_index=False)\
            .agg(lambda col: col.notnull().any())\
            .astype(bool)\
            .sum(axis='rows')
We group by 'Case' and work out whether there is any non-null value in each column. So
df.groupby('Case', as_index=False)\
  .agg(lambda col: col.notnull().any())
gives us:
Case Final Pre Post
0 1.0 True True True
1 2.0 True False False
2 3.0 True True True
3 4.0 True True False
4 5.0 True True True
Using .astype(bool) sets every value in the 'Case' column to True, as they are all non-zero, and then summing with axis='rows' gives us the total for each column (where True becomes 1 and False 0), giving us:
Case 5
Final 5
Pre 4
Post 3
dtype: int64
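Another way to express the same idea, sketched here as an alternative rather than as part of the answer above: GroupBy.count returns the number of non-null values per column within each group, so comparing with 0 and summing gives the per-column totals. The grouping column itself becomes the index, so its count is just the number of groups. The helper name non_null_case_counts is hypothetical, and the frame is the forward-filled data with None for missing cells.

```python
import pandas as pd

# Forward-filled data from the question (None marks a missing cell)
df = pd.DataFrame({
    "Case":  [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Final": ["A", "Z", "B", "Y", "A", "Z", "W", "C", "Z", "C", "X", "Z"],
    "Pre":   ["Z", None, None, None, "Z", None, None, "W", None, "X", None, None],
    "Post":  ["X", "V", None, None, "Y", "U", None, None, None, "Z", None, None],
})


def non_null_case_counts(frame: pd.DataFrame, key: str = "Case") -> pd.Series:
    # Non-null values per column within each group; the key column becomes the index
    per_group = frame.groupby(key).count()
    # A group counts for a column if it has at least one non-null value there
    counts = per_group.gt(0).sum()
    # Every group has a non-null key, so the key column's count is the number of groups
    key_count = pd.Series({key: len(per_group)})
    return pd.concat([key_count, counts])


result = non_null_case_counts(df)
print(result)
```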
Try this:
df.index = df['Case']
df.apply(lambda x: x.dropna().index.nunique())
Out:
Case 5
Final 5
Pre 4
Post 3
dtype: int64
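The same idea can be sketched without overwriting df.index: for each column, keep the Case values on rows where that column is non-null and count how many distinct cases remain. This is a variant of the answer above, not the answer itself, and it assumes the forward-filled data with None for missing cells.

```python
import pandas as pd

# Forward-filled data from the question (None marks a missing cell)
df = pd.DataFrame({
    "Case":  [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Final": ["A", "Z", "B", "Y", "A", "Z", "W", "C", "Z", "C", "X", "Z"],
    "Pre":   ["Z", None, None, None, "Z", None, None, "W", None, "X", None, None],
    "Post":  ["X", "V", None, None, "Y", "U", None, None, None, "Z", None, None],
})

# For each column, select the Case values on its non-null rows,
# then count the distinct cases left
counts = pd.Series(
    {col: df.loc[df[col].notna(), "Case"].nunique() for col in df.columns}
)
print(counts)
```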
IIUC (if I understand correctly):
df.groupby(df['Case'], as_index=False).any().astype(bool).sum()
Case 5
Final 5
Pre 4
Post 3