Appending a Series as a Row to a DataFrame in Pandas (Python 3.4)

Question

Suppose I have a data frame like:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]), })

This looks like

    A   B           C   D   E        
0   1   2013-01-02  1   3   test    
1   1   2013-01-02  1   3   train   
2   1   2013-01-02  1   3   test    
3   1   2013-01-02  1   3   train

I want to append a totals row that sums the numeric columns and puts the string "Total" in column E.

So far I have:

totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)

which yields

totals
A        4
C        4
D       12
E    Total
dtype: object
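`Series.append` was removed in pandas 2.0, so on current releases the `totals` Series above has to be built another way. A minimal sketch, assuming pandas 2.x, that joins the same two Series with `pd.concat` (the frame is recreated so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# Series.append no longer exists in pandas 2.x; pd.concat joins the two Series instead.
# The numeric sums (A, C, D) come first, followed by the "Total" label for E.
totals = pd.concat([df2.sum(numeric_only=True), pd.Series("Total", index=["E"])])
print(totals)
```

Because the combined Series mixes floats and a string, its dtype is `object`, just like the output shown in the question.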

So if I try

df2.append(totals, ignore_index=True)

I get

    A   B                    C   D   E
0   1   2013-01-02 00:00:00  1   3   test
1   1   2013-01-02 00:00:00  1   3   train
2   1   2013-01-02 00:00:00  1   3   test
3   1   2013-01-02 00:00:00  1   3   train
4   4   NaN                  4   12  NaN

My question is: why doesn't column 'E' contain "Total" in the new row, and why is it NaN instead?

Answer 1

I'm not sure why, but a slight change works:

total = df2.sum()
total = total.append(pd.Series('Total', index=['E']))
df2.append(total, ignore_index=True)

Hope that helps!

Answer 2

You have to include Total in the categories of column E, e.g. with categories=["test","train","Total"].

I think you get NaN because this category does not exist.

import pandas as pd
import numpy as np


df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"], 
                                           categories=["test","train","Total"])})


totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
print(df2.append(totals, ignore_index=True))
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
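To illustrate why the missing category produces NaN, here is a minimal sketch, assuming a reasonably recent pandas (1.x or 2.x); the names `cat`, `cat_ok`, `unchanged` and `s` are illustrative only. Values given to `pd.Categorical` that are not among the declared categories are stored as NaN, and `cat.add_categories` returns a new object instead of changing the column in place, so its result has to be assigned back:

```python
import pandas as pd

# "Total" is not one of the declared categories, so it is stored as NaN.
cat = pd.Categorical(["test", "Total"], categories=["test", "train"])
print(cat)

# Declaring "Total" as a category up front (as in this answer) keeps the value.
cat_ok = pd.Categorical(["test", "Total"], categories=["test", "train", "Total"])
print(cat_ok)

# add_categories returns a new object; calling it without assigning changes nothing.
unchanged = pd.Series(pd.Categorical(["test", "train"]))
unchanged.cat.add_categories("Total")
print("Total" in unchanged.cat.categories)  # False

# Assigning the result back makes "Total" a valid value for the column.
s = pd.Series(pd.Categorical(["test", "train"]))
s = s.cat.add_categories("Total")
s.iloc[0] = "Total"
print(s.tolist())  # ['Total', 'train']
```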

Answer 3

First of all, you will get a NaN in column E unless the value is an existing category (i.e. 'test' or 'train'). So we must first add your new value, Total, to the categories and assign the result back to the column.

After doing this, your original method will work. However, I believe this is a more straightforward approach:

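df2['E'] = df2.E.cat.add_categories('Total')
# .ix was removed in pandas 1.0; .loc with a new label appends a row
df2.loc[len(df2)] = df2.sum(numeric_only=True)  # sums A, C and D; B is left as NaT
df2.iat[-1, -1] = 'Total'  # E is the last column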

>>> df2
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
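`DataFrame.append` and `Series.append` were removed in pandas 2.0 (and `.ix` in pandas 1.0), so the snippets above do not run on current releases. The following is a sketch of the same idea using `pd.concat`, assuming pandas 2.x; `numeric_totals`, `totals_row` and `result` are illustrative names, not part of any answer above:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# "Total" must be a valid category, otherwise it would be stored as NaN.
df2["E"] = df2["E"].cat.add_categories("Total")

# Sum the numeric columns only (A, C and D); B and E are skipped.
numeric_totals = df2.sum(numeric_only=True)

# Turn the totals Series into a one-row DataFrame.
totals_row = numeric_totals.to_frame().T

# Give the totals row an E value with exactly the same categories as df2["E"].
totals_row["E"] = pd.Categorical(["Total"], categories=df2["E"].cat.categories)

# Stack the rows; the column missing from totals_row (B) is filled with a missing value.
result = pd.concat([df2, totals_row], ignore_index=True)
print(result)
```

Building the totals row with the same categorical dtype as `E` keeps that column categorical after the concatenation, while B, which the totals row lacks, should come out as NaT.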

3 answers

solution1
0 2016-03-03 22:45:47

solution2
0 2016-03-03 23:07:53

solution3
0 2016-03-04 02:45:55