Suppose I have a data frame like:
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"])})
This looks like:
   A          B  C  D      E
0  1 2013-01-02  1  3   test
1  1 2013-01-02  1  3  train
2  1 2013-01-02  1  3   test
3  1 2013-01-02  1  3  train
I want to append a "Total" row that holds the sums of the numeric columns and puts the label "Total" in column E.
So what I have is:
totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
which yields
totals
A        4
C        4
D       12
E    Total
dtype: object
So if I try
df2.append(totals, ignore_index=True)
I get
   A                    B  C   D      E
0  1  2013-01-02 00:00:00  1   3   test
1  1  2013-01-02 00:00:00  1   3  train
2  1  2013-01-02 00:00:00  1   3   test
3  1  2013-01-02 00:00:00  1   3  train
4  4                  NaN  4  12    NaN
My question is: why does column 'E' contain NaN in the last row instead of "Total"?
I'm not sure why, but a slight change works:
total = df2.sum()
total = total.append(pd.Series('Total', index=['E']))
df2.append(total, ignore_index=True)
Hope that helps!
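On pandas 2.0 and later, Series.append and DataFrame.append no longer exist, so the snippet above has to be written with pd.concat. Here is a minimal sketch under that assumption. The name result is mine, not from the question. Because the total row is a plain object row, column E will most likely come back as object dtype rather than categorical.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# Sum only the numeric columns, then attach the label for column E.
total = pd.concat([df2.sum(numeric_only=True),
                   pd.Series("Total", index=["E"])])

# Turn the Series into a one-row frame and stack it under the data.
result = pd.concat([df2, total.to_frame().T], ignore_index=True)
print(result)
```

Column B has no entry in the total row, so it is filled with a missing value there.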
You have to include Total among the categories, by passing categories=["test","train","Total"].
I think you get NaN because this category does not exist.
import pandas as pd
import numpy as np
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"],
                                        categories=["test", "train", "Total"])})
totals = pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
print(df2.append(totals, ignore_index=True))
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
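The NaN can be reproduced in isolation, without any data frame: a Categorical only stores values that appear in its categories, and anything else is treated as missing. A small sketch (the names without_total and with_total are just for illustration):

```python
import pandas as pd

# "Total" is not among the declared categories, so it is stored as missing.
without_total = pd.Categorical(["test", "train", "Total"],
                               categories=["test", "train"])
print(without_total.isna().tolist())   # [False, False, True]

# Declaring "Total" up front keeps the value.
with_total = pd.Categorical(["test", "train", "Total"],
                            categories=["test", "train", "Total"])
print(with_total.isna().tolist())      # [False, False, False]
```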
First of all, you will get NaN in column E unless the value is an existing category (i.e. 'test' or 'train'). So first we must add your new value Total to the categories and assign the result back to the column.
After doing this, your original method will work. However, I believe this is a more straightforward approach:
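# Register 'Total' as a category and assign the result back to the column.
df2['E'] = df2.E.cat.add_categories('Total')
# Assigning to a new label with .loc appends the sums as a new row.
df2.loc[len(df2)] = df2.sum(numeric_only=True)
df2.iat[-1, -1] = 'Total'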
>>> df2
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
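The "reassign the result back" step matters because add_categories returns a new object rather than modifying the column in place. A short sketch on a standalone Series (the variable s is hypothetical, standing in for df2.E):

```python
import pandas as pd

s = pd.Series(pd.Categorical(["test", "train", "test", "train"]))

# add_categories returns a new Series; without reassignment s is unchanged.
s.cat.add_categories("Total")
print("Total" in s.cat.categories)   # False

# After assigning the result back, "Total" is a valid value for the column.
s = s.cat.add_categories("Total")
s.iloc[-1] = "Total"
print(s.tolist())                    # ['test', 'train', 'test', 'Total']
```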