
Appending a Series as a row to a pandas DataFrame (Python 3.4)

Suppose I have a data frame like:

df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]), })

This looks like

    A   B           C   D   E        
0   1   2013-01-02  1   3   test    
1   1   2013-01-02  1   3   train   
2   1   2013-01-02  1   3   test    
3   1   2013-01-02  1   3   train   

I want to append a totals row that sums the numeric columns and puts "Total" in column E.

So what I have is:

totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)

which yields

totals
A        4
C        4
D       12
E    Total
dtype: object

So if I try

df2.append(totals, ignore_index=True)

I get

    A   B                    C   D   E
0   1   2013-01-02 00:00:00  1   3   test
1   1   2013-01-02 00:00:00  1   3   train
2   1   2013-01-02 00:00:00  1   3   test
3   1   2013-01-02 00:00:00  1   3   train
4   4   NaN                  4   12  NaN

My question is: why does column 'E' in the new row contain NaN instead of "Total"?

I'm not sure why, but a slight change works:

total = df2.sum()
total = total.append(pd.Series('Total', index=['E']))
df2.append(total, True)

Hope that helps!

You have to include Total among the categories of column E, for example with categories=["test","train","Total"].

I think you get NaN because this category does not exist.

import pandas as pd
import numpy as np


df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"], 
                                           categories=["test","train","Total"])})


totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
print(df2.append(totals, True))
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
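
The NaN behaviour described above can be reproduced in isolation. The following is a minimal sketch, not part of the original answer; the names without_total and with_total are made up for illustration. It builds a Categorical directly and checks which entries end up missing.

```python
import pandas as pd

# "Total" is not among the declared categories, so it is stored as missing.
without_total = pd.Categorical(["test", "Total"], categories=["test", "train"])
print(without_total.isna())  # the second entry is reported as missing

# Once "Total" is declared as a category, the value is kept.
with_total = pd.Categorical(["test", "Total"], categories=["test", "train", "Total"])
print(with_total.isna())  # no entry is reported as missing
```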

First of all, you will get a NaN in column E unless the value is an existing category (i.e. 'test' or 'train'). So first we must add your new value Total to the categories and assign the result back to the column.

After doing this, your original method will work. However, I believe this is a more straightforward approach:
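
As a small sketch of why the result has to be assigned back (the Series s and the variables unchanged and updated are illustrative names, not from the question): in current pandas versions cat.add_categories returns a new object and leaves the original untouched.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["test", "train"]))

# The return value is discarded here, so s keeps its old categories.
s.cat.add_categories("Total")
unchanged = list(s.cat.categories)

# Assigning the result back is what actually extends the categories.
s = s.cat.add_categories("Total")
updated = list(s.cat.categories)

print(unchanged)
print(updated)
```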

df2['E'] = df2.E.cat.add_categories('Total')
df2.loc[len(df2)] = df2.sum(numeric_only=True)
df2.iat[-1, -1] = 'Total'

>>> df2
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
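
A note for readers on newer pandas: DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in 2.0, so the append-based snippets on this page only run on older versions. The sketch below is one possible equivalent using pd.concat, not the method from any of the answers; totals_row and result are illustrative names. It assumes it is acceptable for column E to come back as plain object dtype, which is what concatenating a categorical column with a string column produces.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# Build a one-row frame holding the numeric sums plus the "Total" label.
totals_row = df2.sum(numeric_only=True).to_frame().T.assign(E="Total")

# Column B is absent from totals_row, so it is filled with a missing value.
result = pd.concat([df2, totals_row], ignore_index=True)
print(result)
```

With this approach the categories do not need to be extended first, because the concatenated column E is no longer categorical. If the categorical dtype matters, it can be restored afterwards with astype("category").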

