
Pandas/yfinance - Insert max/min from one column in a new column on the corresponding row

I can't find anything that covers what I'm trying to accomplish. I have a table of 5-minute stock data and I'm trying to mark the day's high in a new column on the corresponding row, as in the image below.

Note: the rows shown are just a selection. We don't know how many rows each trading day has, because some trading days may have been shorter, so we can't assume a fixed number of minutes per day.

[image]

I want to see historically, at what time of the day the high of the day usually happens, for a certain stock.

The programming logic would be

// Find the max High of a day
start = i
maxOfDay = High[i]
while(dateTime[i] is part of dayFromDatetime) {
   maxOfDay = max(High[i++], maxOfDay)
}

// Insert 1 in the corresponding cell for the High of day column
i = start   // rewind to the first row of the same day
while(dateTime[i] is part of dayFromDatetime) {
   if(High[i] == maxOfDay) {
       HighOfDay[i] = 1
   }
   i++
}

So, how can I accomplish this in pandas? I even tried taking the high of the day from the daily chart and matching it against the minute charts, but I get all sorts of DatetimeIndex errors when I try to convert the index to a date.
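One possible cause of those index errors (an assumption, since the exact messages are not shown) is that the `.dt` accessor exists only on a Series, while a DatetimeIndex exposes `.date` and `.normalize()` directly. Below is a minimal sketch of the "daily high matched back to the minute rows" idea under that assumption. The frame `intraday` and its values are made up for illustration, and the daily highs are derived from the same data rather than downloaded from a daily chart.

```python
import pandas as pd

# Hypothetical intraday data indexed by timestamp (values are made up)
idx = pd.to_datetime([
    "2021-02-01 09:30", "2021-02-01 09:35", "2021-02-01 09:40",
    "2021-02-02 09:30", "2021-02-02 09:35",
])
intraday = pd.DataFrame({"High": [10.0, 12.0, 11.0, 20.0, 19.0]}, index=idx)

# A DatetimeIndex has date helpers directly, with no .dt accessor
day = intraday.index.normalize()  # midnight of each row's day

# Stand-in for the daily chart: one High per day
daily_high = intraday.groupby(day)["High"].max()

# Look up each row's daily high by its day, then compare row by row
row_daily_high = daily_high.reindex(day).to_numpy()
intraday["High of Day"] = (
    intraday["High"].to_numpy() == row_daily_high
).astype(int)

print(intraday)
```

Because the number of rows per day is never used, shortened trading days need no special handling.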

This is one way of doing it:

import pandas as pd
import numpy as np
np.random.seed(42)

df = pd.DataFrame({'Datetime': pd.to_datetime(['04-01-2021 00:00', '04-01-2021 00:01', '04-01-2021 00:02', '05-01-2021 00:00', '05-01-2021 00:01', '05-01-2021 00:02', '05-01-2021 00:03', '05-01-2021 00:04', '05-01-2021 00:05', '06-01-2021 00:00', '06-01-2021 00:01', '06-01-2021 00:02', '06-01-2021 00:03', '06-01-2021 00:04', '06-01-2021 00:05']),
                   'High': np.random.uniform(0,1, size=15) + 131
                   })

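# Broadcast each calendar day's maximum High back onto every row of that day
day_max = df.groupby(df['Datetime'].dt.date)['High'].transform('max')

# Mark rows whose High equals the day's maximum with 1, leave the rest blank
df['High of day'] = (df['High'] == day_max).apply(lambda x: 1 if x else '')

print(df)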

Output:

              Datetime        High High of day
0  2021-04-01 00:00:00  131.374540            
1  2021-04-01 00:01:00  131.950714           1
2  2021-04-01 00:02:00  131.731994            
3  2021-05-01 00:00:00  131.598658            
4  2021-05-01 00:01:00  131.156019            
5  2021-05-01 00:02:00  131.155995            
6  2021-05-01 00:03:00  131.058084            
7  2021-05-01 00:04:00  131.866176           1
8  2021-05-01 00:05:00  131.601115            
9  2021-06-01 00:00:00  131.708073            
10 2021-06-01 00:01:00  131.020584            
11 2021-06-01 00:02:00  131.969910           1
12 2021-06-01 00:03:00  131.832443            
13 2021-06-01 00:04:00  131.212339            
14 2021-06-01 00:05:00  131.181825  
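To get from the flag column to the original goal (at what time of day the high usually happens), one option is to take the row of each day's maximum with `idxmax` and count the times of day. This is only a sketch: the data below is invented, and note that `idxmax` keeps just the first row when a day's high is tied, whereas the equality comparison above flags every tied row.

```python
import pandas as pd

# Hypothetical data: three days, with made-up High values
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-02-01 09:30", "2021-02-01 09:35", "2021-02-01 09:40",
        "2021-02-02 09:30", "2021-02-02 09:35", "2021-02-02 09:40",
        "2021-02-03 09:30", "2021-02-03 09:35",
    ]),
    "High": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0, 33.0, 30.0],
})

# Row label of the first maximum High within each calendar day
idx_of_high = df.groupby(df["Datetime"].dt.date)["High"].idxmax()

# Time of day at which each daily high occurred, and how often
time_of_high = df.loc[idx_of_high, "Datetime"].dt.time
counts = time_of_high.value_counts()

print(counts)
```

With real 5-minute data, `counts` would give one entry per 5-minute slot, sorted from the most to the least frequent time of the daily high.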

So, I managed to put together something like this, and it seems to do what I wanted:

from pandas_datareader import data as pdr
import pandas as pd
import yfinance as yf

yf.pdr_override()

# download dataframe
df1m = pdr.get_data_yahoo("AAPL", start="2021-02-01", end="2021-02-06", interval="1m")

days = pd.date_range(start="2021-02-01", end="2021-02-06", freq='1d')
df1m['High of Day'] = 0

for i in range(0, len(days)-1):
    if not df1m[days[i]:days[i+1]].empty:
        df1m.loc[df1m[days[i]:days[i+1]]['High'].idxmax(), 'High of Day'] = 1

print(df1m)

I'm getting this warning, which needs to be addressed somehow:

FutureWarning: Indexing a timezone-aware DatetimeIndex with a timezone-naive 
datetime is deprecated and will raise KeyError in a future version.  Use a 
timezone-aware object instead.
start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)

But please, help me with a better solution, as I'm not a pandas developer and don't know the best practices.
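The warning itself says the index is timezone-aware while the slice bounds from `pd.date_range` are timezone-naive. One way to sidestep it is to avoid slicing by day altogether and group on the index's own normalized dates, which keep the timezone. The sketch below uses a small synthetic frame instead of a download; the `America/New_York` timezone and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded frame: tz-aware index, made-up values
tz = "America/New_York"
idx = pd.date_range("2021-02-01 09:30", periods=3, freq="1min", tz=tz).append(
    pd.date_range("2021-02-02 09:30", periods=3, freq="1min", tz=tz)
)
df1m = pd.DataFrame(
    {"High": [135.0, 136.5, 136.0, 134.0, 133.5, 134.8]}, index=idx
)

# Group rows by local calendar day; normalize() preserves the timezone
day = df1m.index.normalize()

# Timestamp of the (first) highest High in each day
high_idx = df1m.groupby(day)["High"].idxmax()

# Flag those rows; days with no rows never appear as groups
df1m["High of Day"] = 0
df1m.loc[high_idx.tolist(), "High of Day"] = 1

print(df1m)
```

If the day-by-day loop is preferred, passing `tz=df1m.index.tz` to `pd.date_range` should make the slice bounds timezone-aware as the warning asks.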


 