
Subtract consecutive columns in a pandas or PySpark DataFrame

I would like to perform the following operation on a pandas or PySpark dataframe, but I still haven't found a solution.

I want to subtract the values in consecutive columns of a dataframe, so that each output column is the input column minus the column before it.

The operation I am describing can be seen in the image below.

[Image: input and output dataframes]

Bear in mind that the first column of the output dataframe won't have any values, because the first column of the input table has no previous column to subtract from it.

diff has an axis parameter, so you can do this in one step:

In [63]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
df

Out[63]:
             A         B         C         D
row1  0.146855  0.250781  0.766990  0.756016
row2  0.528201  0.446637  0.576045  0.576907
row3  0.308577  0.592271  0.553752  0.512420

In [64]:
df.diff(axis=1)

Out[64]:
       A         B         C         D
row1 NaN  0.103926  0.516209 -0.010975
row2 NaN -0.081564  0.129408  0.000862
row3 NaN  0.283694 -0.038520 -0.041331
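
Because the example above uses random values, the result is hard to check by hand. The following is a minimal sketch with fixed numbers; the frame `fixed` and its values are made up for illustration and are not from the original post. It also compares `diff(axis=1)` with subtracting a copy of the frame shifted one column to the right, which should give the same result.

```python
import pandas as pd

# Hypothetical fixed data (not from the original post) so the result can be checked by hand.
fixed = pd.DataFrame(
    [[1.0, 3.0, 6.0, 10.0], [2.0, 2.0, 5.0, 4.0]],
    index=["row1", "row2"],
    columns=["A", "B", "C", "D"],
)

# Each column minus the column to its left; column A has no left neighbour, so it becomes NaN.
by_diff = fixed.diff(axis=1)

# The same computation written out: subtract the frame shifted one column to the right.
by_shift = fixed - fixed.shift(axis=1)

print(by_diff)
print(by_diff.equals(by_shift))
```

For row1 this gives B = 3 - 1 = 2, C = 6 - 3 = 3 and D = 10 - 6 = 4, with NaN in column A.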
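
Another option is to transpose the frame, take the default row-wise diff, and transpose back:

df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
df.T.diff().T  # same result as df.diff(axis=1)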

