Summing a Pandas Dataframe
I am running Python 3.6 and Pandas 0.19.2 in PyCharm Community Edition 2016.3.2 and am trying to ensure that each row in a set of columns of my dataframe adds up to 1.
Initially my dataframe looks as follows:
hello world label0 label1 label2
abc def 1.0 0.0 0.0
why not 0.33 0.34 0.33
hello you 0.33 0.38 0.15
I proceed as follows:
# get list of label columns (all column headers that contain the string 'label')
label_list = df.filter(like='label').columns
# ensure every row adds to 1
if (df[label_list].sum(axis=1) != 1).any():
    print('ERROR')
Unfortunately this code does not work for me. What seems to be happening is that instead of summing across each row, I just get the value of the first column in my filtered data. In other words:
df[label_list].sum(axis=1)
returns:
0 1.0
1 0.33
2 0.33
This should be trivial, but I just can't figure out what I'm doing wrong. Thanks in advance for the help!
UPDATE:
This is an excerpt from my original data after I have filtered for label columns:
label0 label1 label2 label3 label4 label5 label6 label7 label8
1 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
2 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
3 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
4 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
5 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
6 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
7 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
8 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
9 0.34 0.1 0.1 0.1 0.2 0.4 0.1 0.1 1.2
My code from above still does not work, and I still have absolutely no idea why. When I run my code in the Python console everything works perfectly fine, but when I run it in PyCharm 2016.3.2,
label_data.sum(axis=1)
just returns the values of the first column.
With your sample data, it works for me. Try reproducing your sample and adding a new column, check, to verify the sum:
In [3]: df
Out[3]:
hello world label0 label1 label2
0 abc def 1.00 0.00 0.00
1 why not 0.33 0.34 0.33
2 hello you 0.33 0.38 0.15
In [4]: df['check'] = df.sum(axis=1)
In [5]: df
Out[5]:
hello world label0 label1 label2 check
0 abc def 1.00 0.00 0.00 1.00
1 why not 0.33 0.34 0.33 1.00
2 hello you 0.33 0.38 0.15 0.86
In [6]: label_list = df.filter(like='label').columns
In [7]: label_list
Out[7]: Index([u'label0', u'label1', u'label2'], dtype='object')
In [8]: df[label_list].sum(axis=1)
Out[8]:
0 1.00
1 1.00
2 0.86
dtype: float64
In [9]: if (df[label_list].sum(axis=1) != 1).any():
   ...:     print('ERROR')
...:
ERROR
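A side note on the comparison itself, separate from the cause of the behaviour described in the question: testing float sums with != 1 can flag rows that differ from 1 only by rounding error. The following is a minimal sketch, assuming the three-row sample from the question and using numpy.isclose with its default tolerances; the variable names row_sums and sums_to_one are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample label columns copied from the question
df = pd.DataFrame({
    "label0": [1.0, 0.33, 0.33],
    "label1": [0.0, 0.34, 0.38],
    "label2": [0.0, 0.33, 0.15],
})

label_list = df.filter(like="label").columns
row_sums = df[label_list].sum(axis=1)

# Boolean array: True where the row sum is within tolerance of 1
sums_to_one = np.isclose(row_sums, 1.0)

if not sums_to_one.all():
    # Report the index labels of the rows that fail the check
    print("ERROR: rows not summing to 1:", list(row_sums.index[~sums_to_one]))
```

Whether the default tolerances are appropriate depends on how the values were produced; they can be adjusted through the rtol and atol arguments.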
It turns out my data types were not consistent. I used astype(float) and things worked out.
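The original post does not show the dtypes involved, so the following is only a sketch of one plausible scenario: some label columns were loaded as strings, and an older pandas version skipped the non-numeric columns when summing, leaving only the first (float) column. The sample frame and the names labels and row_sums are assumptions made for illustration. The sketch inspects the dtypes, converts with astype(float), and then sums.

```python
import numpy as np
import pandas as pd

# Hypothetical data: label1 and label2 hold numbers stored as strings
df = pd.DataFrame({
    "hello": ["abc", "why", "hello"],
    "world": ["def", "not", "you"],
    "label0": [1.0, 0.33, 0.33],
    "label1": ["0.0", "0.34", "0.38"],
    "label2": ["0.0", "0.33", "0.15"],
})

label_list = df.filter(like="label").columns

# Inspect the dtypes first: non-float entries here point to the problem
print(df[label_list].dtypes)

# Convert every label column to float in a new frame, then sum across columns
labels = df[label_list].astype(float)
row_sums = labels.sum(axis=1)
print(row_sums)
```

Note that astype(float) raises a ValueError if a value cannot be parsed as a number, which makes malformed entries visible instead of silently ignoring them.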