
Summing a Pandas Dataframe

I am running Python 3.6 and Pandas 0.19.2 in PyCharm Community Edition 2016.3.2 and am trying to ensure that the label columns in each row of my dataframe add up to 1.

Initially my dataframe looks as follows:

 hello     world     label0    label1    label2
 abc       def       1.0       0.0       0.0
 why       not       0.33      0.34      0.33
 hello     you       0.33      0.38      0.15

I proceed as follows:

# get list of label columns (all column headers that contain the string 'label')
label_list = df.filter(like='label').columns

# ensure every row adds to 1
if (df[label_list].sum(axis=1) != 1).any():
    print('ERROR')

Unfortunately this code does not work for me. Instead of summing across each row, I just get the value of the first column in my filtered data. In other words, df[label_list].sum(axis=1) returns:

0     1.0
1     0.33
2     0.33

This should be trivial, but I just can't figure out what I'm doing wrong. Thanks in advance for the help!

UPDATE:

This is an excerpt from my original data after filtering for the label columns:

    label0 label1 label2 label3 label4 label5 label6 label7 label8
1    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
2    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
3    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
4    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
5    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
6    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
7    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
8    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2
9    0.34    0.1    0.1    0.1    0.2    0.4    0.1    0.1    1.2

My code from above still does not work, and I still have no idea why. When I run it in the Python console everything works fine, but when I run it in PyCharm 2016.3.2, label_data.sum(axis=1) just returns the values of the first column.

With your sample data it works for me. Try reproducing your sample, adding a new column named check to verify the sum:

In [3]: df
Out[3]: 
   hello world  label0  label1  label2
0    abc   def    1.00    0.00    0.00
1    why   not    0.33    0.34    0.33
2  hello   you    0.33    0.38    0.15

In [4]: df['check'] = df.sum(axis=1)

In [5]: df
Out[5]: 
   hello world  label0  label1  label2  check
0    abc   def    1.00    0.00    0.00   1.00
1    why   not    0.33    0.34    0.33   1.00
2  hello   you    0.33    0.38    0.15   0.86

In [6]: label_list = df.filter(like='label').columns

In [7]: label_list
Out[7]: Index([u'label0', u'label1', u'label2'], dtype='object')

In [8]: df[label_list].sum(axis=1)
Out[8]: 
0    1.00
1    1.00
2    0.86
dtype: float64

In [9]: if (df[label_list].sum(axis=1) != 1).any():
   ...:     print('ERROR')
   ...:     
ERROR
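
A side note on the exact != 1 comparison used above: row sums of floats can differ from 1 by a tiny rounding error, so a tolerance-based check is often safer. The following is a minimal sketch, not part of the original post, assuming NumPy is available and that the default tolerances of np.isclose are acceptable for this data. The name sums_to_one is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'hello': ['abc', 'why', 'hello'],
    'world': ['def', 'not', 'you'],
    'label0': [1.0, 0.33, 0.33],
    'label1': [0.0, 0.34, 0.38],
    'label2': [0.0, 0.33, 0.15],
})

# Select the label columns and sum across each row
label_list = df.filter(like='label').columns
row_sums = df[label_list].sum(axis=1)

# Boolean array: True where the row sum is within tolerance of 1
# (assumption: np.isclose's default rtol/atol suit this data)
sums_to_one = np.isclose(row_sums, 1.0)

# Report the index labels of any rows that fail the check
if not sums_to_one.all():
    print('ERROR in rows:', df.index[~sums_to_one].tolist())
```

Reporting the offending index labels, rather than a bare ERROR, makes it easier to find which rows need attention.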

It turns out the data types of my label columns were not consistent. I converted them with astype(float) and the row sums came out correctly.
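
To illustrate that fix: if some label columns were loaded as strings (object dtype), checking the dtypes reveals the mismatch, and converting them to float lets the row sum include every column. A plausible explanation for the original symptom is that older pandas releases silently skipped non-numeric columns when summing, though the post does not confirm this. The sketch below uses a hypothetical frame, since the original raw data types are not shown, and assumes the string cells hold clean numeric text.

```python
import pandas as pd

# Hypothetical frame: label1 and label2 were read in as strings
df = pd.DataFrame({
    'label0': [1.0, 0.33, 0.33],
    'label1': ['0.0', '0.34', '0.38'],
    'label2': ['0.0', '0.33', '0.15'],
})

label_list = df.filter(like='label').columns

# Inspect the dtypes; object columns are not numeric
print(df[label_list].dtypes)

# Convert every label column to float before summing
df = df.astype({col: float for col in label_list})

# Now all three label columns contribute to each row sum
row_sums = df[label_list].sum(axis=1)
print(row_sums)
```

If some cells might not parse as numbers, pd.to_numeric with errors='coerce' is an alternative that turns unparseable values into NaN instead of raising.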
