Python/Pandas: Search for date in one dataframe and return value in column of another dataframe with matching date
I have two dataframes: one with earnings dates and a code for before-market/after-market announcements, and the other with daily OHLC data. First dataframe df:
| | earnDate | anncTod |
|---|---|---|
| 103 | 2015-11-18 | 0900 |
| 104 | 2016-02-24 | 0900 |
| 105 | 2016-05-18 | 0900 |
| ... | ... | ... |
| 128 | 2022-03-01 | 0900 |
| 129 | 2022-05-18 | 0900 |
| 130 | 2022-08-17 | 0900 |
Second dataframe af:
| Datetime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2005-01-03 | 36.3458 | 36.6770 | 35.5522 | 35.6833 | 3343500 |
| ... | ... | ... | ... | ... | ... |
| 2022-04-22 | 246.5500 | 247.2000 | 241.4300 | 241.9100 | 1817977 |
I want to take a date from the first dataframe and find the open and/or close price in the second dataframe. Depending on the anncTod value, I want to find the close price of the previous day (if it is 0900) or the open and close price on the following day (otherwise). I'll use these numbers to calculate the overnight, intraday and close-to-close moves, which will be stored in new columns on df.

I'm not sure how to search for matching values and fetch values from the same row but a different column. I'm trying to do this with df.iloc and a for loop.

Here's the full code:

    import pandas as pd
    import requests
    import datetime as dt

    ticker = 'TGT'

    ## pull orats earnings dates and store in pandas dataframe
    url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
    response = requests.get(url, allow_redirects=True)
    data = response.json()
    df = pd.DataFrame(data['data'])

    ## reduce number of dates to last 28 quarters and remove updatedAt column
    n = len(df.index)-28
    df.drop(index=df.index[:n], inplace=True)
    df = df.iloc[: , 1:-1]

    ## import daily OHLC stock data file
    loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
    af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])

    ## create total return, overnight and intraday columns in df
    df['Total Move'] = '' ##col #2
    df['Overnight'] = '' ##col #3
    df['Intraday'] = '' ##col #4

    for date in df['earnDate']:
        if df.iloc[date,1] == '0900':
            priorday = af.loc[af.index.get_loc(date)-1,0]
            priorclose = af.loc[priorday,4]
            open = af.loc[date,1]
            close = af.loc[date,4]
            df.iloc[date,2] = close/priorclose
            df.iloc[date,3] = open/priorclose
            df.iloc[date,4] = close/open
        else:
            print('afternoon')

I get an error:

    if df.iloc[date,1] == '0900':
    ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Converting the date columns to integers creates another error. Is there a better way to go about doing this?
The ideal output would look like this (made-up numbers, abbreviated output):
| earnDate | anncTod | Total Move | Overnight Move | Intraday Move |
|---|---|---|---|---|
| 2015-11-18 | 0900 | 9% | 7.2% | 1.8% |
But it would include all the dates given in the first dataframe.

UPDATE

I swapped df.iloc for df.loc and that seems to have solved that problem. The new issue is searching for the variable 'date' in the second dataframe af. I have simplified the code to just print the value in the 'Open' column while I troubleshoot.

Here is the updated and simplified code (all else remains the same):

    import pandas as pd
    import requests
    import datetime as dt

    ticker = 'TGT'

    ## pull orats earnings dates and store in pandas dataframe
    url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
    response = requests.get(url, allow_redirects=True)
    data = response.json()
    df = pd.DataFrame(data['data'])

    ## reduce number of dates to last 28 quarters and remove updatedAt column
    n = len(df.index)-28
    df.drop(index=df.index[:n], inplace=True)
    df = df.iloc[: , 1:-1]

    ## set index to earnDate
    df = df.set_index(pd.DatetimeIndex(df['earnDate']))

    ## import daily OHLC stock data file
    loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
    af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])

    ## create total return, overnight and intraday columns in df
    df['Total Move'] = '' ##col #2
    df['Overnight'] = '' ##col #3
    df['Intraday'] = '' ##col #4

    for date in df['earnDate']:
        if df.loc[date, 'anncTod'] == '0900':
            print(af.loc[date,'Open']) ##this is line generating error
        else:
            print('afternoon')

I now get KeyError: '2015-11-18'

Using loc to access a row assumes that the label you search for is in the index. Specifically, that means you'll need to set the date column as the index. Example:

    import pandas as pd

    df = pd.DataFrame({'earnDate': ['2015-11-18', '2015-11-19', '2015-11-20'],
                       'anncTod': ['0900', '1000', '0800'],
                       'Open': [111, 222, 333]})
    df = df.set_index(df["earnDate"])

    for date in df['earnDate']:
        if df.loc[date, 'anncTod'] == '0900':
            print(df.loc[date, 'Open'])

    # prints
    # 111
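
The same idea carries over to the question's second dataframe. In the updated code, af is read without an index column, so it keeps its default integer index and af.loc[date, 'Open'] has no date label to match. Below is a minimal sketch of one way around that, not a drop-in fix: the prices are made up, the second anncTod value ('1630') is a hypothetical after-market code, and I'm assuming the Datetime column parses cleanly with pd.to_datetime. It indexes af by date, uses shift(1) to line up the prior session's close, and joins on earnDate instead of looping.

```python
import pandas as pd

# Made-up stand-ins for the question's two frames (illustrative values only).
df = pd.DataFrame({
    "earnDate": ["2015-11-18", "2016-02-24"],
    "anncTod": ["0900", "1630"],  # '1630' is a hypothetical after-market code
})
af = pd.DataFrame({
    "Datetime": ["2015-11-17", "2015-11-18", "2015-11-19",
                 "2016-02-23", "2016-02-24", "2016-02-25"],
    "Open":  [100.0, 104.0, 105.0, 50.0, 51.0, 55.0],
    "Close": [100.0, 110.0, 106.0, 50.0, 52.0, 56.0],
})

# Index af by real dates so date labels can be matched.
af["Datetime"] = pd.to_datetime(af["Datetime"])
af = af.set_index("Datetime").sort_index()

# Previous trading session's close, aligned to each row.
af["PriorClose"] = af["Close"].shift(1)

# Look up Open/Close/PriorClose for every earnings date in one left join.
df["earnDate"] = pd.to_datetime(df["earnDate"])
out = df.join(af[["Open", "Close", "PriorClose"]], on="earnDate")

# Only the pre-market (0900) branch is handled here, as in the question's loop;
# other rows are left as NaN.
pre = out["anncTod"] == "0900"
out["Total Move"] = (out["Close"] / out["PriorClose"]).where(pre)
out["Overnight"] = (out["Open"] / out["PriorClose"]).where(pre)
out["Intraday"] = (out["Close"] / out["Open"]).where(pre)

print(out[["earnDate", "anncTod", "Total Move", "Overnight", "Intraday"]])
```

The ratios follow the question's own formulas (close/priorclose, open/priorclose, close/open). For after-market rows, the following session's prices could probably be lined up the same way with shift(-1), but I have not worked that branch through here.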