Suppose I have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})
For each distinct element in col1, I want to count how many null and non-null values there are in col2 and summarise the result in a new dataframe. So far I have used

df1 = df[df['col1'] == 'x']
print(df1[df1['col2'].isna()].shape[0],
      df1[df1['col2'].notna()].shape[0])

and then manually changed the filter to df1 = df[df['col1'] == 'y'] and df1 = df[df['col1'] == 'z']. This method is not efficient at all. The table I want should look like the following:
col1 value no value
0 x 2 2
1 y 2 2
2 z 0 2
I have also tried df.groupby('col1').col2.nunique(), yet that only counts the distinct non-null values in col2.
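For reference, the manual filtering above can be generalized into a loop over the unique values of col1 (a sketch of the same inefficient approach, included for comparison with the vectorized answers below):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# One pass per distinct value in col1, collecting both counts
rows = []
for v in df['col1'].unique():
    sub = df.loc[df['col1'] == v, 'col2']
    rows.append({'col1': v,
                 'value': sub.notna().sum(),     # non-null count
                 'no value': sub.isna().sum()})  # null count
result = pd.DataFrame(rows)
```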
Let us try crosstab to create a frequency table where the index holds the unique values of col1 and the columns hold the counts of non-NaN and NaN values in col2:
out = pd.crosstab(df['col1'], df['col2'].isna())
out.columns = ['value', 'no value']
>>> out
value no value
col1
x 2 2
y 2 2
z 0 2
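To match the desired table exactly, with col1 as a regular column and a default integer index, a reset_index can be appended to the crosstab result (a minor variation on the code above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# False column = non-null count, True column = null count
out = pd.crosstab(df['col1'], df['col2'].isna())
out.columns = ['value', 'no value']
out = out.reset_index()  # turn the col1 index into a regular column
```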
Use Series.isna with SeriesGroupBy.value_counts for the counts, reshape with Series.unstack, and finish with some data cleaning:
df = (df['col2'].isna()
.groupby(df['col1'])
.value_counts()
.unstack(fill_value=0)
.reset_index()
.rename_axis(None, axis=1)
.rename(columns={False:'value', True:'no value'}))
print(df)
col1 value no value
0 x 2 2
1 y 2 2
2 z 0 2
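As an alternative to the two answers above, the same table can also be built with named aggregation in GroupBy.agg, using 'count' (which counts non-null values) alongside a null-counting lambda. A sketch, assuming the same df as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# 'count' skips NaN, so it yields the non-null count directly;
# the dict unpacking lets the output column name contain a space
out = (df.groupby('col1')['col2']
         .agg(**{'value': 'count',
                 'no value': lambda s: s.isna().sum()})
         .reset_index())
```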