How to count the non-null columns for each group?

Question

I started from raw data that looks like this:

  Case  Final    Pre    Post   
  1     A        Z      X  
        Z               V  
  2     B                  
        Y                  
  3     A        Z      Y  
        Z               U  
        W                  
  4     C        W        
        Z                  
  5     C        X      Z  
        X                  
        Z

Then I did a forward fill:

df['Case'] = df['Case'].ffill()

like so:

  Case  Final    Pre    Post   
  1     A        Z      X  
  1     Z        NaN    V  
  2     B        NaN    NaN
  2     Y        NaN    NaN
  3     A        Z      Y  
  3     Z        NaN    U  
  3     W        NaN    NaN
  4     C        W      NaN
  4     Z        NaN    NaN
  5     C        X      Z  
  5     X        NaN    NaN
  5     Z        NaN    NaN
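
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table and applies the forward fill. It assumes the blank cells in the raw data are read in as NaN; the column values are copied from the tables above.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Raw data from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Case":  [1, nan, 2, nan, 3, nan, nan, 4, nan, 5, nan, nan],
    "Final": list("AZBYAZWCZCXZ"),
    "Pre":   ["Z", nan, nan, nan, "Z", nan, nan, "W", nan, "X", nan, nan],
    "Post":  ["X", "V", nan, nan, "Y", "U", nan, nan, nan, "Z", nan, nan],
})

# Forward fill only the grouping column so every row carries its case number.
df["Case"] = df["Case"].ffill()
print(df)
```

Because the column held NaN before the fill, Case ends up as a float column (1.0, 2.0, ...).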

What I want is to count, for each column, the number of cases in which that column has at least one non-null value:

  Case: 5
  Final: 5
  Pre: 4
  Post: 3

Output explanation:

1- Group by the first column, Case.

2- If at least one value of a column within a group is not null (including the Case column itself), then count++ (increment that column's not-null count by 1).
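
The two steps above can be spelled out as a plain loop. This is only an illustrative sketch of the rule, not an efficient solution; the frame below is the forward-filled table retyped by hand.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Forward-filled table from the question.
df = pd.DataFrame({
    "Case":  [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Final": list("AZBYAZWCZCXZ"),
    "Pre":   ["Z", nan, nan, nan, "Z", nan, nan, "W", nan, "X", nan, nan],
    "Post":  ["X", "V", nan, nan, "Y", "U", nan, nan, nan, "Z", nan, nan],
})

counts = {}
for col in df.columns:
    n = 0
    # Step 1: visit each group of rows that share a Case value.
    for _, group in df.groupby("Case"):
        # Step 2: one non-null value in the group is enough to count it.
        if group[col].notna().any():
            n += 1
    counts[col] = n

print(counts)  # {'Case': 5, 'Final': 5, 'Pre': 4, 'Post': 3}
```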

Answer 1

Use:

s = df.notna().groupby(df['Case']).any().sum()
# for older pandas versions
s = df.notnull().groupby(df['Case']).any().sum()
print(s)
Case     5
Final    5
Pre      4
Post     3
dtype: int64

Details:

First, check for non-missing values with DataFrame.notna:

print(df.notna())
    Case  Final    Pre   Post
0   True   True   True   True
1   True   True  False   True
2   True   True  False  False
3   True   True  False  False
4   True   True   True   True
5   True   True  False   True
6   True   True  False  False
7   True   True   True  False
8   True   True  False  False
9   True   True   True   True
10  True   True  False  False
11  True   True  False  False

Then aggregate by the Case column with GroupBy.any:

print(df.notnull().groupby(df['Case']).any())
      Case  Final    Pre   Post
Case                           
1     True   True   True   True
2     True   True  False  False
3     True   True   True   True
4     True   True   True  False
5     True   True   True   True

Finally, sum the values to count the Trues, since each True is treated as 1.
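
As a small illustration of that last step, summing a boolean Series counts its True entries. The flags below are the Pre column from the GroupBy.any output above, retyped by hand.

```python
import pandas as pd

# Per-case flags for the Pre column (cases 1 to 5).
pre_flags = pd.Series([True, False, True, True, True], index=[1, 2, 3, 4, 5])

# True is treated as 1 and False as 0, so the sum is the number of True values.
print(pre_flags.sum())  # 4
```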

Answer 2

How about:

grouped = df.groupby('Case', as_index=False)\
            .agg(lambda col: col.notnull().any())\
            .astype(bool)\
            .sum(axis='rows')

We group by 'Case' and work out whether each column has any non-null value within each group. So

df.groupby('Case', as_index=False)\
  .agg(lambda col: col.notnull().any())

gives us:

   Case  Final    Pre   Post
0   1.0   True   True   True
1   2.0   True  False  False
2   3.0   True   True   True
3   4.0   True   True  False
4   5.0   True   True   True

Using .astype(bool) turns every value in the 'Case' column into True because the values are non-zero. Summing with axis='rows' then gives the total for each column (True counts as 1 and False as 0):

Case     5
Final    5
Pre      4
Post     3
dtype: int64
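
One caveat that follows from the "non-zero" reasoning: if a case were labelled 0, .astype(bool) would turn it into False and the Case count would come out one short. The labels below are hypothetical and do not appear in the question's data.

```python
import pandas as pd

# Hypothetical case labels that include 0 (assumption, not from the question).
labels = pd.Series([0.0, 1.0, 2.0])

as_flags = labels.astype(bool)
print(as_flags.tolist())  # [False, True, True]
print(as_flags.sum())     # 2, although there are 3 distinct cases
```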

Answer 3

Try this:

df.index = df.Case

df.apply(lambda x: len(x[x.notna()].index.unique()))

Out:

Case     5
Final    5
Pre      4
Post     3
dtype: int64
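
If you would rather not overwrite the index, the same idea can probably be written by looking up the Case values directly. This is a sketch of a variant, not the answer's original code; the frame is the forward-filled table retyped by hand.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Forward-filled table from the question.
df = pd.DataFrame({
    "Case":  [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Final": list("AZBYAZWCZCXZ"),
    "Pre":   ["Z", nan, nan, nan, "Z", nan, nan, "W", nan, "X", nan, nan],
    "Post":  ["X", "V", nan, nan, "Y", "U", nan, nan, nan, "Z", nan, nan],
})

# For each column, keep the rows where it is non-null and count distinct cases.
counts = df.apply(lambda col: df.loc[col.notna(), "Case"].nunique())
print(counts.to_dict())  # {'Case': 5, 'Final': 5, 'Pre': 4, 'Post': 3}
```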

Answer 4

IIUC (if I understand correctly):

df.groupby(df['Case'], as_index=False).any().astype(bool).sum()

Output:

Case     5
Final    5
Pre      4
Post     3

Solution 1
4 Accepted 2018-10-04 11:45:32

Solution 2
3 2018-10-04 12:06:28

Solution 3
2 2018-10-04 11:54:34

Solution 4
1 2018-10-04 11:53:18
