
Python/Pandas: Search for date in one dataframe and return value in column of another dataframe with matching date

I have two dataframes: one with earnings dates and a code indicating whether the announcement was before or after market hours, and the other with daily OHLC data. First dataframe df:

     earnDate    anncTod
103  2015-11-18  0900
104  2016-02-24  0900
105  2016-05-18  0900
...  ..........  .......
128  2022-03-01  0900
129  2022-05-18  0900
130  2022-08-17  0900

Second dataframe af:

Datetime     Open       High       Low        Close      Volume
2005-01-03   36.3458    36.6770    35.5522    35.6833    3343500
...........  .........  .........  .........  ........   ........
2022-04-22   246.5500   247.2000   241.4300   241.9100   1817977

I want to take each date from the first dataframe and find the open and/or close price in the second dataframe. Depending on the anncTod value, I want either the close price of the previous day (if anncTod is 0900) or the open and close price of the following day (otherwise). I'll use these numbers to calculate the overnight, intraday and close-to-close moves, which will be stored in new columns on df.

I'm not sure how to search for a matching value and then fetch a value from a different column of that row. I'm trying to do this with df.iloc and a for loop.

Here's the full code:

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.iloc[date,1] == '0900':
        priorday = af.loc[af.index.get_loc(date)-1,0]
        priorclose = af.loc[priorday,4]
        open = af.loc[date,1]
        close = af.loc[date,4]
        df.iloc[date,2] = close/priorclose
        df.iloc[date,3] = open/priorclose
        df.iloc[date,4] = close/open
    else:
        print('afternoon')

I get an error:

if df.iloc[date,1] == '0900':
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Converting the date columns to integers creates another error. Is there a better way I should go about doing this?

Ideal output would look like (made up numbers, abbreviated output):

earnDate     anncTod  Total Move  Overnight Move  Intraday Move
2015-11-18   0900     9%          7.2%            1.8%

But would include all the dates given in the first dataframe.

UPDATE

I swapped df.iloc for df.loc and that seems to have solved that problem. The new issue is looking up the variable 'date' in the second dataframe af. I have simplified the code to just print the value in the 'Open' column while I troubleshoot.

Here is updated and simplified code (all else remains the same):

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## set index to earnDate
df = df.set_index(pd.DatetimeIndex(df['earnDate']))

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(af.loc[date,'Open']) ##this is line generating error
    else:
        print('afternoon')

I now get KeyError: '2015-11-18'

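loc looks rows up by index label, so the label you search for has to be in the index. In the updated code, df is indexed by earnDate, but af still has its default integer index, which is why af.loc[date, 'Open'] raises the KeyError. Set the date column as the index of the frame you are searching. Example: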

import pandas as pd

df = pd.DataFrame({'earnDate': ['2015-11-18', '2015-11-19', '2015-11-20'],
                   'anncTod': ['0900', '1000', '0800'],
                   'Open': [111, 222, 333]})

df = df.set_index(df["earnDate"])

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(df.loc[date, 'Open'])

# prints
# 111
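
Applied to the two frames in the question, the same idea means indexing af by date as well. Below is a rough sketch of one way that could look without a loop. The sample prices are made up, the helper columns PrevClose, NextOpen and NextClose are names introduced here for illustration, and the after-market branch (measuring from the earnings-day close to the next day's open and close) is my reading of the question rather than something it spells out. Moves are expressed as ratio minus 1.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's two frames (values are illustrative only).
df = pd.DataFrame({
    "earnDate": ["2022-03-01", "2022-03-03"],
    "anncTod": ["0900", "1630"],
})
af = pd.DataFrame({
    "Datetime": ["2022-02-28", "2022-03-01", "2022-03-02", "2022-03-03", "2022-03-04"],
    "Open":  [100.0, 110.0, 120.0, 125.0, 130.0],
    "Close": [100.0, 121.0, 118.0, 125.0, 143.0],
})

# Parse dates on both sides so the labels have the same type, then index af by date.
df["earnDate"] = pd.to_datetime(df["earnDate"])
af["Datetime"] = pd.to_datetime(af["Datetime"])
af = af.set_index("Datetime").sort_index()

# Prices of the neighbouring trading days, found by position within af
# (hypothetical helper columns, not part of the original data).
prices = pd.DataFrame({
    "Open": af["Open"],
    "Close": af["Close"],
    "PrevClose": af["Close"].shift(1),
    "NextOpen": af["Open"].shift(-1),
    "NextClose": af["Close"].shift(-1),
})

# Attach the price row for each earnings date (NaN where the date is missing from af).
out = df.join(prices, on="earnDate")

# Assumption: '0900' means before the open; anything else is treated as after the close.
before = out["anncTod"] == "0900"
base = np.where(before, out["PrevClose"], out["Close"])
open_ = np.where(before, out["Open"], out["NextOpen"])
close = np.where(before, out["Close"], out["NextClose"])

out["Total Move"] = close / base - 1
out["Overnight"] = open_ / base - 1
out["Intraday"] = close / open_ - 1

result = out[["earnDate", "anncTod", "Total Move", "Overnight", "Intraday"]]
print(result)
```

Using shift on the date-sorted af picks the previous or next trading day rather than the previous calendar day, so weekends and holidays should not need special handling. Earnings dates that fall outside af (for example 2022-05-18 when the price file ends on 2022-04-22) come out as NaN instead of raising a KeyError.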
