Python pandas: apply a function in groupby and add the results as a column in the DataFrame

I'm practicing with sample data to learn pandas. I have sample data like the following:

symbol  date_time            close   volume
XOM     2021-04-13 13:00:00  56.5    10000
XOM     2021-04-13 13:01:00  57.5    10000
XOM     2021-04-13 13:02:00  56.25   10000
XOM     2021-04-13 13:03:00  58.5    10000
AAPL    2021-04-13 13:00:00  135.6   10000
AAPL    2021-04-13 13:01:00  137.5   10000
AAPL    2021-04-13 13:02:00  136.25  10000
AAPL    2021-04-13 13:03:00  138.5   10000

I grouped by symbol and used pandas' rolling(...).mean() on the close price to add some simple moving averages.
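
For reference, the moving-average step might look roughly like the sketch below. The frame is a small hand-built copy of the sample rows, and the window of 2 and the column name sma_2 are assumptions chosen only for illustration, not the values actually used.

```python
import pandas as pd

# Hand-built copy of the sample rows (date_time and volume omitted for brevity).
quote_data = pd.DataFrame({
    "symbol": ["XOM"] * 4 + ["AAPL"] * 4,
    "close": [56.5, 57.5, 56.25, 58.5, 135.6, 137.5, 136.25, 138.5],
})

# Rolling mean within each symbol. The result is indexed by (symbol, row label),
# so drop the symbol level to get back to the original row labels.
sma = (
    quote_data.groupby("symbol")["close"]
    .rolling(window=2)
    .mean()
    .reset_index(level=0, drop=True)
)

# Assignment aligns on the row labels, not on position.
quote_data["sma_2"] = sma
print(quote_data)
```

The first row of each symbol is NaN because a window of 2 needs two observations.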

Now I'd like to use talib to calculate the RSI for each symbol. I thought I could use apply and call a function. I see the output when I print the NumPy array; however, I'm not seeing the column added.

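import numpy as np
import talib

def calc_rsi(series):
    # talib works on plain NumPy arrays
    rsi_arr = np.array(series)
    RSI = talib.RSI(rsi_arr, timeperiod=14)
    # print(RSI)  --> produces valid output
    return RSI

# Apply per symbol; the new column does not show up in quote_data
quote_data.groupby("sym")["close"].apply(calc_rsi).reset_index(name='rsi_test')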

A sample of the NumPy array output is below. The first 14 values are NaN, which is expected with timeperiod=14.

         nan         nan         nan         nan         nan         nan
         nan         nan 17.10526316 30.8277027  38.64107884 36.42559842
 35.98126419 49.82352931 51.12420941 56.4889558  53.50561034 57.38372096
 63.24414699 65.34066328 65.70388628 60.26289822 61.54881365 61.54881365

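It was index related: calc_rsi returned a bare NumPy array, which carries no row index, so pandas had nothing to align the values with the rows of quote_data.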
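
A small sketch can make the difference visible. It does not use talib; change_as_array and change_as_series are hypothetical stand-ins for calc_rsi that compute a simple price difference, and group_keys=False is passed explicitly here so the behaviour does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

quote_data = pd.DataFrame({
    "sym": ["XOM"] * 4 + ["AAPL"] * 4,
    "close": [56.5, 57.5, 56.25, 58.5, 135.6, 137.5, 136.25, 138.5],
})

def change_as_array(series):
    # Hypothetical stand-in for calc_rsi: returns a bare NumPy array.
    return np.diff(series.to_numpy(), prepend=np.nan)

def change_as_series(series):
    # Same values, but carrying the original row labels.
    return pd.Series(change_as_array(series), index=series.index)

grouped = quote_data.groupby("sym", group_keys=False)["close"]

per_group = grouped.apply(change_as_array)   # one entry per symbol
per_row = grouped.apply(change_as_series)    # one entry per original row

print(len(per_group), len(per_row))

# Only the row-labelled result can be aligned with the frame.
quote_data["change"] = per_row
```

The array version collapses to one entry per symbol, each holding a whole array, which is why it cannot become a per-row column.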

Wrapping the result in a Series that reuses the incoming series' index before returning it, and assigning the result to a new column, works:

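import numpy as np
import pandas as pd
import talib

def calc_rsi(series):
    rsi_arr = np.array(series)
    RSI = talib.RSI(rsi_arr, timeperiod=14)
    # Reattach the original row index so pandas can align the values
    rsi_series = pd.Series(RSI, index=series.index)
    # print(rsi_series.size)
    return rsi_series

quote_data['rsi'] = quote_data.groupby("sym")["close"].apply(calc_rsi)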
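
One caveat, as far as I know: depending on the pandas version, apply may prepend the group key to the index of the returned Series (pandas 2.0 and later do this by default), and then the direct assignment no longer lines up with the frame. A possible alternative is transform, which places each group's values back on that group's rows even when the function returns a plain array. The sketch below uses calc_change, a hypothetical stand-in for calc_rsi, so that it runs without talib.

```python
import numpy as np
import pandas as pd

# Rows for the two symbols are interleaved on purpose.
quote_data = pd.DataFrame({
    "sym": ["XOM", "AAPL", "XOM", "AAPL", "XOM", "AAPL"],
    "close": [56.5, 135.6, 57.5, 137.5, 56.25, 136.25],
})

def calc_change(series):
    # Hypothetical stand-in for calc_rsi; returns a plain NumPy array
    # with one value per input row, like talib.RSI does.
    return np.diff(series.to_numpy(), prepend=np.nan)

# transform puts each group's values back on that group's rows.
quote_data["change"] = quote_data.groupby("sym")["close"].transform(calc_change)
print(quote_data)
```

With transform the function only has to return one value per input row; the index handling is done by pandas.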
