Create a new column in Pandas by subtracting two variables based on values at different indexes

Question

I have a DataFrame df which contains Open, High, Low, Close, Volume and Date data for every minute of the past ten days.

open    high      low       close     volume  date
436.9   436.94    436.32    436.69    567139  4/22/2022 6:30
436.68  436.92    436.48    436.7     296374  4/22/2022 6:31
436.72  436.92    436.5     436.65    221020  4/22/2022 6:32
436.64  437.0865  436.59    437.04    178162  4/22/2022 6:33
437.03  437.23    436.63    436.7064  202557  4/22/2022 6:34
436.69  436.7     435.87    435.901   302312  4/22/2022 6:35
435.9   436.57    435.7814  436.31    259633  4/22/2022 6:36
436.29  436.67    435.9     435.9     231914  4/22/2022 6:37
435.9   436.19    435.7     436.0587  190760  4/22/2022 6:38
436.03  436.28    435.15    435.16    314455  4/22/2022 6:39
435.15  435.35    434.79    434.8368  417990  4/22/2022 6:40
434.82  435.06    434.67    434.98    267492  4/22/2022 6:41
435     435.13    434.68    434.84    198426  4/22/2022 6:42
434.84  434.86    434.25    434.29    330436  4/22/2022 6:43
434.31  434.45    433.84    434.13    382888  4/22/2022 6:44
434.15  434.82    433.96    434.45    456809  4/22/2022 6:45
434.47  435.05    434.04    435.02    303350  4/22/2022 6:46
435.03  435.03    434.38    434.39    222626  4/22/2022 6:47

My goal is to have a new column that shows each row's distance from that day's open. For example, for the second row I want it to equal -0.2, because that row's close is 436.7 and that day's opening price was 436.9.

This is what I have thought of so far:

import datetime
import numpy as np
start_time = datetime.time(hour=6, minute=30)
df['opens'] = np.where(df.time == start_time, df.open, ' ')

I think there could be a way to check the value of the 'opens' column and, if it is blank, look back through the previous index values until one has a value, then subtract that earlier opens value from the current row's close. Alternatively, because I know there are only going to be 10 data points for opening prices, and I know there are only 800 tradable minutes (during market hours), I know that the open is going to be at index 0, 780 and so on. My question is: what is the best way of accomplishing this?
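The "look back to the previous non-blank opens value" idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the two-day sample frame is made up, the timestamp is assumed to live in the `date` column, and the names `is_open_row` and `distance` are hypothetical. The main change from the attempt above is marking non-opening rows with NaN instead of `' '`, so the column stays numeric and can be forward-filled.

```python
import datetime

import pandas as pd

# Hypothetical two-day sample; the values are made up for illustration.
df = pd.DataFrame({
    "open":  [436.90, 436.68, 436.72, 430.00, 430.50],
    "close": [436.69, 436.70, 436.65, 430.25, 429.75],
    "date":  pd.to_datetime([
        "2022-04-22 06:30", "2022-04-22 06:31", "2022-04-22 06:32",
        "2022-04-25 06:30", "2022-04-25 06:31",
    ]),
})

start_time = datetime.time(hour=6, minute=30)

# True only on rows stamped exactly 06:30 (assumed to be the market open).
is_open_row = df["date"].dt.time == start_time

# Keep the open on those rows, NaN elsewhere, then carry the last
# seen opening price forward to the following rows.
df["opens"] = df["open"].where(is_open_row).ffill()

# Distance of each row's close from the most recent opening price.
df["distance"] = df["close"] - df["opens"]
```

One caveat with this sketch: it assumes every day actually has a 06:30 row and that the rows are in chronological order. If a day is missing its 06:30 row, the previous day's open would be carried into it.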

Answer 1

You could group by the dates and transform the first open value for each group (this creates a column holding the first open value of each day); then subtract these values from close:

df['date'] = pd.to_datetime(df['date'])
df['new'] = df['close'] - df.groupby(df['date'].dt.date)['open'].transform('first')

Output:

      open      high       low     close  volume                date     new
0   436.90  436.9400  436.3200  436.6900  567139 2022-04-22 06:30:00 -0.2100
1   436.68  436.9200  436.4800  436.7000  296374 2022-04-22 06:31:00 -0.2000
2   436.72  436.9200  436.5000  436.6500  221020 2022-04-22 06:32:00 -0.2500
3   436.64  437.0865  436.5900  437.0400  178162 2022-04-22 06:33:00  0.1400
4   437.03  437.2300  436.6300  436.7064  202557 2022-04-22 06:34:00 -0.1936
5   436.69  436.7000  435.8700  435.9010  302312 2022-04-22 06:35:00 -0.9990
6   435.90  436.5700  435.7814  436.3100  259633 2022-04-22 06:36:00 -0.5900
7   436.29  436.6700  435.9000  435.9000  231914 2022-04-22 06:37:00 -1.0000
8   435.90  436.1900  435.7000  436.0587  190760 2022-04-22 06:38:00 -0.8413
9   436.03  436.2800  435.1500  435.1600  314455 2022-04-22 06:39:00 -1.7400
10  435.15  435.3500  434.7900  434.8368  417990 2022-04-22 06:40:00 -2.0632
11  434.82  435.0600  434.6700  434.9800  267492 2022-04-22 06:41:00 -1.9200
12  435.00  435.1300  434.6800  434.8400  198426 2022-04-22 06:42:00 -2.0600
13  434.84  434.8600  434.2500  434.2900  330436 2022-04-22 06:43:00 -2.6100
14  434.31  434.4500  433.8400  434.1300  382888 2022-04-22 06:44:00 -2.7700
15  434.15  434.8200  433.9600  434.4500  456809 2022-04-22 06:45:00 -2.4500
16  434.47  435.0500  434.0400  435.0200  303350 2022-04-22 06:46:00 -1.8800
17  435.03  435.0300  434.3800  434.3900  222626 2022-04-22 06:47:00 -2.5100
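The output above covers a single day, so it does not show the per-day reset. A small self-contained sketch with two days may make that clearer. The second day's prices are made up for illustration, and the explicit sort and the `day_open` name are additions that are not part of the answer; the sort only guards against rows arriving out of order, since `'first'` takes whichever row comes first within each group.

```python
import pandas as pd

# Hypothetical sample spanning two days; the second day's values are made up.
df = pd.DataFrame({
    "open":  [436.90, 436.68, 430.00, 430.50],
    "close": [436.69, 436.70, 430.25, 429.75],
    "date":  ["4/22/2022 6:30", "4/22/2022 6:31",
              "4/25/2022 6:30", "4/25/2022 6:31"],
})

df["date"] = pd.to_datetime(df["date"])

# Sort so that "first" within each day really is the earliest minute.
df = df.sort_values("date").reset_index(drop=True)

# One opening price per calendar day, broadcast back to every row of that day.
day_open = df.groupby(df["date"].dt.date)["open"].transform("first")

df["new"] = df["close"] - day_open
print(df)
```

With this sample, the rows for 4/22 are measured against 436.90 and the rows for 4/25 against 430.00, so the second row comes out at roughly -0.2, as in the question.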
