
Set columns of DataFrame to sum of columns of another in pandas

I have a DataFrame that looks like the below, call this "values":

[image: the "values" DataFrame]

I would like to create another, call it "sums", in which each column contains the sum of the DataFrame "values" from that column to the end. It would look like the below:

[image: the desired "sums" DataFrame]

I would like to create this without looping through the entire DataFrame, data point by data point. I have been trying with .apply() as seen below, but I keep getting the error: unsupported operand type(s) for +: 'int' and 'datetime.date'

In [26]: values = pandas.DataFrame({0:[96,54,27,28],
              1:[55,75,32,37],2:[54,99,36,46],3:[35,77,0,10],4:[62,25,0,25],
              5:[0,66,0,89],6:[0,66,0,89],7:[0,0,0,0],8:[0,0,0,0]})

In [28]: sums = values.copy()

In [29]: sums.iloc[:,:] = ''         

In [31]: for column in sums:
    ...:     sums[column].apply(sum(values.loc[:,column:]))
    ...:     
Traceback (most recent call last):

  File "<ipython-input-31-030442e5005e>", line 2, in <module>
    sums[column].apply(sum(values.loc[:,column:]))
  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\series.py", line 2220, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1088, in pandas.lib.map_infer (pandas\lib.c:63043)

TypeError: 'numpy.int64' object is not callable


In [32]: for column in sums:
    ...:     sums[column] = sum(values.loc[:,column:])

In [33]: sums
Out[33]: 
    0   1   2   3   4   5   6   7  8
0  36  36  35  33  30  26  21  15  8
1  36  36  35  33  30  26  21  15  8
2  36  36  35  33  30  26  21  15  8
3  36  36  35  33  30  26  21  15  8

Is there a way to do this without looping over each point individually?

Without looping, you can reverse the column order of your DataFrame, take the cumulative sum along each row, and then reverse the columns back:

>>> values.iloc[:,::-1].cumsum(axis=1).iloc[:,::-1]
     0    1    2    3    4    5   6  7  8
0  302  206  151   97   62    0   0  0  0
1  462  408  333  234  157  132  66  0  0
2   95   68   36    0    0    0   0  0  0
3  324  296  259  213  203  178  89  0  0
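
As a sanity check, the one-liner above can be compared with the slice-based sum the question was aiming for. This is only a sketch: the names `suffix_sums` and `looped` are illustrative and not from the original post. Note that Python's built-in `sum` applied to a DataFrame iterates over its column labels rather than its values, which appears to be why the loop in the question produced 36, 36, 35, ...; `DataFrame.sum(axis=1)` adds up the values in each row instead.

```python
import pandas as pd

values = pd.DataFrame({0: [96, 54, 27, 28],
                       1: [55, 75, 32, 37], 2: [54, 99, 36, 46],
                       3: [35, 77, 0, 10], 4: [62, 25, 0, 25],
                       5: [0, 66, 0, 89], 6: [0, 66, 0, 89],
                       7: [0, 0, 0, 0], 8: [0, 0, 0, 0]})

# Vectorised: reverse the columns, cumulative-sum along each row, reverse back.
suffix_sums = values.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]

# For comparison: one pass per column (not per data point), summing the
# values from that column to the last one in every row.
looped = pd.DataFrame({col: values.loc[:, col:].sum(axis=1)
                       for col in values.columns})

# Element-wise comparison of the two results.
print((suffix_sums == looped).all().all())
```

The loop version still avoids touching individual cells, but it builds one slice per column, so the reversed `cumsum` is likely the better choice for wide frames.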

You can use the .cumsum() method to get the cumulative sum. The problem is that it operates from left to right, whereas you need it from right to left.
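
To make the direction issue concrete, here is a small sketch on a single row. The Series `row` holds the first five entries of row 0 of `values` (the remaining entries are zero); the names `left_to_right` and `needed` are illustrative only.

```python
import pandas as pd

# First five entries of row 0 of `values`.
row = pd.Series([96, 55, 54, 35, 62])

# cumsum() accumulates from the first element towards the last.
left_to_right = row.cumsum()
print(left_to_right.tolist())   # [96, 151, 205, 240, 302]

# What the question asks for: at each position, the sum from there to the end.
needed = [int(row.iloc[i:].sum()) for i in range(len(row))]
print(needed)                   # [302, 206, 151, 97, 62]
```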

So we will reverse the column order of your data frame, use cumsum(), then put the columns back into their original order.

import pandas as pd

values = pd.DataFrame({0:[96,54,27,28],
          1:[55,75,32,37],2:[54,99,36,46],3:[35,77,0,10],4:[62,25,0,25],
          5:[0,66,0,89],6:[0,66,0,89],7:[0,0,0,0],8:[0,0,0,0]})

# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
values[values.columns[::-1]].cumsum(axis=1).reindex(columns=values.columns)

# returns:
     0    1    2    3    4    5   6  7  8
0  302  206  151   97   62    0   0  0  0
1  462  408  333  234  157  132  66  0  0
2   95   68   36    0    0    0   0  0  0
3  324  296  259  213  203  178  89  0  0
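
The same result can also be reached without reversing anything, using the identity "sum from column j to the end = row total - cumulative sum up to j + value at j". This is a different technique from the one in the answers, shown only as a sketch; the names `total`, `prefix` and `suffix` are illustrative.

```python
import pandas as pd

values = pd.DataFrame({0: [96, 54, 27, 28],
                       1: [55, 75, 32, 37], 2: [54, 99, 36, 46],
                       3: [35, 77, 0, 10], 4: [62, 25, 0, 25],
                       5: [0, 66, 0, 89], 6: [0, 66, 0, 89],
                       7: [0, 0, 0, 0], 8: [0, 0, 0, 0]})

total = values.sum(axis=1)       # one total per row
prefix = values.cumsum(axis=1)   # left-to-right cumulative sum, inclusive

# Add the row total to every column of that row, then subtract the
# inclusive prefix sum: value + total - prefix = sum from this column onward.
suffix = values.add(total, axis=0).sub(prefix)
print(suffix)
```

Because the column order never changes, this variant needs no reindexing step, at the cost of a few more intermediate frames.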
