I have a df as follows:
Name | Gender | Age | Apple | Banana | Mango | Watermelon | Kiwi
----------------------------------------------------------------
Jack | Male | 20 | 2 | 3 | 10 | |
Jen | Female | 25 | 5 | | | 5 | 1
Jill | Female | 22 | 5 | 3 | | 5 |
John | Male | 21 | 6 | | | |
Joe | Male | 28 | 2 | 3 | | 5 |
Jim | Male | 26 | 2 | 3 | | |
I want to find the count of non-empty cells in every column, grouped by, say, "Gender".
In other words, the desired output is to have:
Fruits | Total | Male | Female |
------------------------------------
Apple | 6 | 4 | 2 |
Banana | 4 | 3 | 1 |
Mango | 1 | 1 | 0 |
Watermelon | 3 | 1 | 2 |
Kiwi | 1 | 0 | 1 |
-------------------------------------
Total | 15 | 9 | 6 |
Please note:
>>> print(type(df.iloc[1, 4]))
<class 'str'>
So the blank cells hold empty strings, which I cannot fill with the fillna() method?
Use drop + replace + count + T + insert:
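import numpy as np

# drop the columns that should not be counted
df1 = df.drop(['Name', 'Age'], axis=1)
# turn empty strings (and zeros) into NaN so that count() skips them
df = df1.replace({'': np.nan, 0: np.nan}).groupby('Gender').count().T
df.insert(0, 'Total', df.sum(axis=1))
df.loc['Total'] = df.sum()
print(df)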
Gender Total Female Male
Apple 6 2 4
Banana 4 1 3
Mango 1 0 1
Watermelon 3 2 1
Kiwi 1 1 0
Total 15 6 9
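To try the steps above end to end, here is a minimal self-contained sketch. It assumes the blank cells in the question's table are empty strings and rebuilds the sample frame by hand; the names fruits and counts are only illustrative, and unlike the snippet above it replaces only empty strings, since the sample data contains no zeros.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; blank cells are assumed to be empty strings.
df = pd.DataFrame({
    'Name': ['Jack', 'Jen', 'Jill', 'John', 'Joe', 'Jim'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Male'],
    'Age': [20, 25, 22, 21, 28, 26],
    'Apple': [2, 5, 5, 6, 2, 2],
    'Banana': [3, '', 3, '', 3, 3],
    'Mango': [10, '', '', '', '', ''],
    'Watermelon': ['', 5, 5, '', 5, ''],
    'Kiwi': ['', 1, '', '', '', ''],
})

# Keep only the grouping column and the fruit columns.
fruits = df.drop(['Name', 'Age'], axis=1)

# Empty strings become NaN, so count() ignores them; .T puts fruits on the rows.
counts = fruits.replace('', np.nan).groupby('Gender').count().T

# Row totals go into a new first column, column totals into a new last row.
counts.insert(0, 'Total', counts.sum(axis=1))
counts.loc['Total'] = counts.sum()
print(counts)
```

Because groupby sorts the group keys, the columns come out as Total, Female, Male, which is why the ordering differs from the one requested in the question.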
If you also need to change the column order, add reindex (reindex_axis was removed in pandas 1.0):
df1 = df.drop(['Name', 'Age'], axis=1)
df = df1.replace({'': np.nan, 0: np.nan}).groupby('Gender').count().T
df['Total'] = df.sum(axis=1)
df.loc['Total'] = df.sum()
# reindex(columns=...) replaces the removed reindex_axis(..., 1)
df = df.reindex(columns=['Total', 'Male', 'Female'])
print(df)
Gender Total Male Female
Apple 6 4 2
Banana 4 3 1
Mango 1 1 0
Watermelon 3 1 2
Kiwi 1 0 1
Total 15 9 6
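On the question's remark about fillna(): an empty string is an ordinary value rather than a missing one, so fillna() leaves it alone, which is why the replace step is needed before count(). A small sketch on a made-up Series (s and cleaned are illustrative names, not part of the original data):

```python
import numpy as np
import pandas as pd

# A column like 'Banana' in the question: numbers mixed with empty strings.
s = pd.Series([3, '', 3, ''], dtype=object)

# Empty strings are not considered missing, so nothing is flagged here.
print(s.isna().sum())              # 0

# fillna() only touches NaN/None, so both empty strings survive.
print(s.fillna(0).eq('').sum())    # 2

# After replacing '' with NaN, count() skips those cells.
cleaned = s.replace('', np.nan)
print(cleaned.count())             # 2
```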