Python Pandas: How to subtract values in two non-consecutive rows in a specific column of a dataframe from one another

I am trying to populate a new column in a Pandas df by subtracting the values of two non-consecutive rows in a different column of the same df. I can do it, so long as the df does not have a column with dates in it. But if it does have a column with dates, then pandas throws an error.

Assume the following dataframe.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
                   columns=['a', 'b', 'c'])
df['Date'] = ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
df['Date'] = pd.to_datetime(df['Date'])

df['d'] = ''
df = df[['Date', 'a', 'b', 'c', 'd']]

This gives me a df that looks like this:

    Date        a   b   c   d
0   2020-01-02  1   2   3   
1   2020-01-05  4   5   6   
2   2020-06-10  7   55  9   
3   2020-08-05  10  99  19  
4   2020-09-01  27  38  29  
5   2020-10-29  39  10  72  

I am trying to create a new column 'd' that, for each row, takes the value in column 'b' two rows below and subtracts the value in column 'b' of the row in question. For instance, the value in row [0], column ['d'] would be calculated as df.loc[2]['b'] - df.loc[0]['b'].

What I'm trying (which doesn't work) is:

for i in range(len(df)-2):
    df.loc[i]['d'] = df.loc[i+2]['b'] - df.loc[i]['b']

I can get this to work if I have no date column in the df. But when I add a column with dates, it throws an error message saying:

A value is trying to be set on a copy of a slice from a DataFrame

I can't figure out why a date column causes the df to be unable to do math on columns with only int64 data. I've tried searching this site and just can't seem to solve the problem. Any help would be greatly appreciated.

You can do it in vectorized form using shift, which is considerably faster than looping over the rows. shift(-2) moves each value of 'b' up by two rows, so every row is lined up with the value two rows below it:

df['d'] = df['b'].shift(-2) - df['b']
df

Output:

        Date   a   b   c     d
0 2020-01-02   1   2   3  53.0
1 2020-01-05   4   5   6  94.0
2 2020-06-10   7  55   9 -17.0
3 2020-08-05  10  99  19 -89.0
4 2020-09-01  27  38  29   NaN
5 2020-10-29  39  10  72   NaN
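
As for the warning itself, it most likely comes from the chained indexing in the loop rather than from the arithmetic. df.loc[i]['d'] = ... first builds row i as a separate Series and then assigns into that temporary object. When all columns share one dtype, older pandas versions may return a view, so the write can happen to reach the frame; with a datetime column next to the integer columns, the row has to be built as a copy, so the write is lost. If you do want to keep a loop, a minimal sketch is to pass the row and column labels to .loc in a single call. The column name d_loop below is only an illustrative name, not something from the question:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
    columns=['a', 'b', 'c'],
)
df['Date'] = pd.to_datetime(
    ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
)

# Vectorized result, kept for comparison
vectorized = df['b'].shift(-2) - df['b']

# Loop version: start with NaN so the last two rows stay empty,
# then write with one .loc[row, column] call instead of .loc[row][column]
df['d_loop'] = np.nan  # 'd_loop' is a hypothetical column name
for i in range(len(df) - 2):
    df.loc[i, 'd_loop'] = df.loc[i + 2, 'b'] - df.loc[i, 'b']

# Raises an AssertionError if the two approaches disagree
pd.testing.assert_series_equal(vectorized, df['d_loop'], check_names=False)
```

Both versions give the same column here, but the shift version avoids the Python-level loop and is the one I would use.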
