Python Pandas: How to subtract values in two non-consecutive rows in a specific column of a dataframe from one another
I am trying to populate a new column in a Pandas df by subtracting the values of two non-consecutive rows in a different column of the same df. I can do it so long as the df does not have a column with dates in it, but if it does have a column with dates, pandas throws an error.
Assume the following dataframe.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
                  columns=['a', 'b', 'c'])
df['Date'] = ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
df['Date'] = pd.to_datetime(df['Date'])
df['d'] = ''
df = df[['Date', 'a', 'b', 'c', 'd']]
This gives me a df that looks like this:
        Date   a   b   c d
0 2020-01-02   1   2   3
1 2020-01-05   4   5   6
2 2020-06-10   7  55   9
3 2020-08-05  10  99  19
4 2020-09-01  27  38  29
5 2020-10-29  39  10  72
I am trying to create a new column 'd' that, for each row, takes the value in column 'b' two rows below and subtracts the value of 'b' in the row in question. For instance, the value in row [0], column ['d'] would be calculated as df.loc[2]['b'] - df.loc[0]['b'].
What I'm trying (which doesn't work) is:
for i in range(len(df)-2):
    df.loc[i]['d'] = df.loc[i+2]['b'] - df.loc[i]['b']
I can get this to work if I have no date in the df. But when I add a column with dates, it throws an error message saying
A value is trying to be set on a copy of a slice from a DataFrame
I can't figure out why a date column causes the df to be unable to do math on columns with only int64 data. I've tried searching this site and just can't seem to solve the problem. Any help would be greatly appreciated.
You can do it in vectorized form using shift (which is considerably faster than using loops):
df['d'] = df['b'].shift(-2) - df['b']
df
Output:
        Date   a   b   c     d
0 2020-01-02   1   2   3  53.0
1 2020-01-05   4   5   6  94.0
2 2020-06-10   7  55   9 -17.0
3 2020-08-05  10  99  19 -89.0
4 2020-09-01  27  38  29   NaN
5 2020-10-29  39  10  72   NaN
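As for why the original loop fails: df.loc[i]['d'] = ... is chained indexing. df.loc[i] first builds a row Series, and for a frame whose columns have different dtypes (here datetime64, int and object) that row is most likely a copy, so the assignment lands on the copy and pandas emits the "set on a copy of a slice" message. The sketch below is not part of the original answer; it assumes the same sample data and uses a hypothetical helper column 'd_loop' to show that a single .loc[row, column] call writes to the frame itself and gives the same numbers as shift.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
    columns=['a', 'b', 'c'],
)
df['Date'] = ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'a', 'b', 'c']]

# Vectorized version from the answer: value two rows below minus current value.
df['d'] = df['b'].shift(-2) - df['b']

# Loop version for comparison. 'd_loop' is a hypothetical column name.
# One .loc[row, column] call indexes df directly instead of a temporary row.
df['d_loop'] = np.nan
for i in range(len(df) - 2):
    df.loc[i, 'd_loop'] = df.loc[i + 2, 'b'] - df.loc[i, 'b']

print(df[['b', 'd', 'd_loop']])
```

The last two rows stay NaN in both columns because there is no row two positions below them. The loop is only there to illustrate the indexing fix; shift remains the better choice for anything beyond a handful of rows.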