Appending a Series as a Row to a Data Frame in Pandas (Python 3.4)
Suppose I have a data frame like:
df2 = pd.DataFrame({ 'A' : 1.,
'B' : pd.Timestamp('20130102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : np.array([3] * 4,dtype='int32'),
'E' : pd.Categorical(["test","train","test","train"]), })
This looks like:
   A          B  C  D      E
0  1 2013-01-02  1  3   test
1  1 2013-01-02  1  3  train
2  1 2013-01-02  1  3   test
3  1 2013-01-02  1  3  train
I want to append a "Totals" row that sums the numeric columns and puts "Total" in column E.
So what I have is:
totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
which yields
totals
A 4
C 4
D 12
E Total
dtype: object
So if I try
df2.append(totals, ignore_index=True)
I get
   A                    B  C   D      E
0  1  2013-01-02 00:00:00  1   3   test
1  1  2013-01-02 00:00:00  1   3  train
2  1  2013-01-02 00:00:00  1   3   test
3  1  2013-01-02 00:00:00  1   3  train
4  4                  NaN  4  12    NaN
My question is: why doesn't column 'E' contain "Total" in the new row, and why is it NaN instead?
Not sure why, but a slight change works.
total = df2.sum()
total = total.append(pd.Series('Total', index=['E']))
df2.append(total, True)
Hope that helps!
You have to include Total among the categories by setting categories=["test","train","Total"].
I think you get NaN because this category does not exist.
import pandas as pd
import numpy as np
df2 = pd.DataFrame({ 'A' : 1.,
'B' : pd.Timestamp('20130102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : np.array([3] * 4,dtype='int32'),
'E' : pd.Categorical(["test","train","test","train"],
categories=["test","train","Total"])})
totals=pd.Series('Total', index=['E'])
totals = df2.sum(numeric_only=True).append(totals)
print(df2.append(totals, ignore_index=True))
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
First of all, you will get a NaN in column E unless the value is an existing category (i.e. 'test' or 'train'). So first we must add your new value Total to the categories and reassign the result back to the column.
After doing this, your original method will work. However, I believe this is a more straightforward approach:
df2['E'] = df2.E.cat.add_categories('Total')
df2.loc[len(df2)] = df2.sum(numeric_only=True)  # .ix is deprecated; .loc adds the new row by label
df2.iat[-1, -1] = 'Total'
>>> df2
   A          B  C   D      E
0  1 2013-01-02  1   3   test
1  1 2013-01-02  1   3  train
2  1 2013-01-02  1   3   test
3  1 2013-01-02  1   3  train
4  4        NaT  4  12  Total
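A note for newer pandas versions: Series.append and DataFrame.append were removed in pandas 2.0, so the snippets above only run on older releases. The following is a rough sketch of the same idea using pd.concat, not a drop-in replacement from any of the answers. The names totals_row and result are made up for this example, and the dtypes of the resulting columns (for instance whether E stays categorical) may differ from the outputs shown above.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"],
                        categories=["test", "train", "Total"]),
})

# Sum the numeric columns, then attach the label for column E.
totals = pd.concat([df2.sum(numeric_only=True), pd.Series({"E": "Total"})])

# Turn the Series into a one-row frame and stack it under the original.
totals_row = totals.to_frame().T
result = pd.concat([df2, totals_row], ignore_index=True)

print(result)
```

Column B has no total, so it is left missing in the new row.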