简体   繁体   English

根据不同索引的值减去两个变量,在 Pandas 中创建新列

[英]Creating New Columns in Pandas based on subtracting two variables based on value from different indexes

I have a DateFrame df which contains Open High Low Close Volume and Date data for every minute for the past ten days.我有一个 DateFrame df ,其中包含过去十天每分钟的开盘高低收盘量和日期数据。

**open**    high    low **close**   volume  date
**436.9**   436.94  436.32  436.69  567139  4/22/2022 6:30
436.68  436.92  436.48  **436.7**   296374  4/22/2022 6:31
436.72  436.92  436.5   436.65  221020  4/22/2022 6:32
436.64  437.0865    436.59  437.04  178162  4/22/2022 6:33
437.03  437.23  436.63  436.7064    202557  4/22/2022 6:34
436.69  436.7   435.87  435.901 302312  4/22/2022 6:35
435.9   436.57  435.7814    436.31  259633  4/22/2022 6:36
436.29  436.67  435.9   435.9   231914  4/22/2022 6:37
435.9   436.19  435.7   436.0587    190760  4/22/2022 6:38
436.03  436.28  435.15  435.16  314455  4/22/2022 6:39
435.15  435.35  434.79  434.8368    417990  4/22/2022 6:40
434.82  435.06  434.67  434.98  267492  4/22/2022 6:41
435 435.13  434.68  434.84  198426  4/22/2022 6:42
434.84  434.86  434.25  434.29  330436  4/22/2022 6:43
434.31  434.45  433.84  434.13  382888  4/22/2022 6:44
434.15  434.82  433.96  434.45  456809  4/22/2022 6:45
434.47  435.05  434.04  435.02  303350  4/22/2022 6:46
435.03  435.03  434.38  434.39  222626  4/22/2022 6:47

My goal is to have a new column that displays the distance for that days open.我的目标是有一个新列显示打开的那几天的距离。 I want it to be for example for the second row I want it to equal -0.2 Because that rows close is 436.7 and that days Opening price was 436.9我希望它是例如第二行我希望它等于-0.2因为那行收盘价是436.7而那几天开盘价是436.9

This is what I have thought of so far这是我到目前为止的想法

start_time =  datetime.time(hour = 6, minute = 30 )
df['opens'] = np.where(df.time == start_time, df.open, ' ')

I think that there could be a way for it to check the value of 'opens' column and if it is blank to look at the previous indexes values to see if there is a value and to subtract the close of the current to the previous opens value.我认为可能有一种方法可以检查“打开”列的值,如果它是空白的,可以查看以前的索引值以查看是否有值,并将当前的收盘价减去以前的开盘价价值。 Or because I know that there are going to be only 10 data points for opening prices and I know that there are only 800 tradable minutes (during market hours) So I know that the open is going to be at 0 and 780 and so on.或者因为我知道开盘价只有 10 个数据点,而且我知道只有 800 分钟的可交易时间(在市场交易时间内)所以我知道开盘价将在 0 和 780 等等。 My question is what is the best way of accomplishing this?我的问题是完成此任务的最佳方法是什么?

You could groupby the dates and transform first open values for each group (this creates a column of the first open values for each day);您可以groupby日期分组并转换每个组的first开放值(这会为每一天创建一个第一个开放值的列); then subtract these values from close :然后从close中减去这些值:

df['date'] = pd.to_datetime(df['date'])
df['new'] = df['close'] - df.groupby(df['date'].dt.date)['open'].transform('first')

Output: Output:

      open      high       low     close  volume                date     new
0   436.90  436.9400  436.3200  436.6900  567139 2022-04-22 06:30:00 -0.2100
1   436.68  436.9200  436.4800  436.7000  296374 2022-04-22 06:31:00 -0.2000
2   436.72  436.9200  436.5000  436.6500  221020 2022-04-22 06:32:00 -0.2500
3   436.64  437.0865  436.5900  437.0400  178162 2022-04-22 06:33:00  0.1400
4   437.03  437.2300  436.6300  436.7064  202557 2022-04-22 06:34:00 -0.1936
5   436.69  436.7000  435.8700  435.9010  302312 2022-04-22 06:35:00 -0.9990
6   435.90  436.5700  435.7814  436.3100  259633 2022-04-22 06:36:00 -0.5900
7   436.29  436.6700  435.9000  435.9000  231914 2022-04-22 06:37:00 -1.0000
8   435.90  436.1900  435.7000  436.0587  190760 2022-04-22 06:38:00 -0.8413
9   436.03  436.2800  435.1500  435.1600  314455 2022-04-22 06:39:00 -1.7400
10  435.15  435.3500  434.7900  434.8368  417990 2022-04-22 06:40:00 -2.0632
11  434.82  435.0600  434.6700  434.9800  267492 2022-04-22 06:41:00 -1.9200
12  435.00  435.1300  434.6800  434.8400  198426 2022-04-22 06:42:00 -2.0600
13  434.84  434.8600  434.2500  434.2900  330436 2022-04-22 06:43:00 -2.6100
14  434.31  434.4500  433.8400  434.1300  382888 2022-04-22 06:44:00 -2.7700
15  434.15  434.8200  433.9600  434.4500  456809 2022-04-22 06:45:00 -2.4500
16  434.47  435.0500  434.0400  435.0200  303350 2022-04-22 06:46:00 -1.8800
17  435.03  435.0300  434.3800  434.3900  222626 2022-04-22 06:47:00 -2.5100

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 基于两列创建新变量作为索引一列作为新变量名称python pandas或R. - Creating new variables based on two columns as index one column as new variable names python pandas or R 根据单元格内容组合来自不同行的数据并基于单元格值创建新列 pandas 和 python - Combining data from different rows based on the cell content and creating new columns based on the cell values with pandas and python Pandas基于两个现有变量创建一个新变量 - Pandas creating a new variable based on two existing variables 根据不同行值的差异在 Pandas 中创建新列 - Creating new columns in Pandas based on difference of different row values Pandas:基于两个不同的列创建唯一值的索引 - Pandas: Creating an index of unique values based off of two different columns 根据 pandas dataframe 中的两列值创建新的 dataframe - Create a new dataframe based on two columns of value in pandas dataframe 跟进 - 根据 Pandas 中另一列的值创建新列 - Follow up - Creating new columns based on value from another column in pandas 根据多个列中至少有一个是否包含列表中的值这一事实创建新列(Python,熊猫) - creating new column based on the fact whether at least 1 of multiple columns contains value from the list (Python, pandas) 基于python pandas中其他列的值创建新列 - Creating a new column based on values from other columns in python pandas Pandas dataframe:根据其他列的数据创建新列 - Pandas dataframe: Creating a new column based on data from other columns
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM