Suppose I have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})
For each distinct element in col1, I want to count how many null and non-null values there are in col2 and summarise the result in a new dataframe. So far I have used

df1 = df[df['col1'] == 'x']
print(df1[df1['col2'].isna()].shape[0],
      df1[df1['col2'].notna()].shape[0])

and then manually changed the filter to df1 = df[df['col1'] == 'y'] and df1 = df[df['col1'] == 'z']. This method is not efficient at all. The table I want should look like the following:
col1 value no value
0 x 2 2
1 y 2 2
2 z 0 2
I have also tried df.groupby('col1').col2.nunique(), yet that only counts the distinct non-null values in col2.
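For reference, the manual filtering above can be generalized into a loop over the unique values of col1 (a sketch of the same inefficient approach, included for comparison with the vectorized answers below):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# One pass per distinct value in col1, collecting both counts
rows = []
for v in df['col1'].unique():
    sub = df.loc[df['col1'] == v, 'col2']
    rows.append({'col1': v,
                 'value': sub.notna().sum(),     # non-null count
                 'no value': sub.isna().sum()})  # null count
result = pd.DataFrame(rows)
```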
Let us try crosstab to create a frequency table where the index holds the unique values of col1 and the columns hold the counts of non-NaN and NaN values in col2:
out = pd.crosstab(df['col1'], df['col2'].isna())
out.columns = ['value', 'no value']
>>> out
value no value
col1
x 2 2
y 2 2
z 0 2
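To match the desired table exactly, with col1 as a regular column and a default integer index, a reset_index can be appended to the crosstab result (a minor variation on the code above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# False column = non-null count, True column = null count
out = pd.crosstab(df['col1'], df['col2'].isna())
out.columns = ['value', 'no value']
out = out.reset_index()  # turn the col1 index into a regular column
```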
Use Series.isna with SeriesGroupBy.value_counts for the counts, reshape with Series.unstack, and finish with some data cleaning:
df = (df['col2'].isna()
.groupby(df['col1'])
.value_counts()
.unstack(fill_value=0)
.reset_index()
.rename_axis(None, axis=1)
.rename(columns={False:'value', True:'no value'}))
print(df)
col1 value no value
0 x 2 2
1 y 2 2
2 z 0 2
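As an alternative to the two answers above, the same table can also be built with named aggregation in GroupBy.agg, using 'count' (which counts non-null values) alongside a null-counting lambda. A sketch, assuming the same df as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x','y','z','x','x','x','y','z','y','y'],
                   'col2': [np.nan,'n1',np.nan,np.nan,'n3','n2','n5',
                            np.nan,np.nan,np.nan]})

# 'count' skips NaN, so it yields the non-null count directly;
# the dict unpacking lets the output column name contain a space
out = (df.groupby('col1')['col2']
         .agg(**{'value': 'count',
                 'no value': lambda s: s.isna().sum()})
         .reset_index())
```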