Python/Pandas: Search for a date in one dataframe and return the value in a column of another dataframe with the matching date

Question

I have two dataframes, one with earnings dates and a code for before market/after market, and the other with daily OHLC data. First dataframe df:

	earnDate	anncTod
103	2015-11-18	0900
104	2016-02-24	0900
105	2016-05-18	0900
...	..........	.......
128	2022-03-01	0900
129	2022-05-18	0900
130	2022-08-17	0900

Second dataframe af:

Datetime	Open	High	Low	Close	Volume
2005-01-03	36.3458	36.6770	35.5522	35.6833	3343500
...........	.........	.........	.........	........	........
2022-04-22	246.5500	247.2000	241.4300	241.9100	1817977

I want to take a date from the first dataframe and find the open and/or close price in the second dataframe. Depending on the anncTod value, I want to find the close price of the previous day (if it is 0900) or the open and close price on the following day (otherwise). I'll use these numbers to calculate the overnight, intraday and close-to-close moves, which will be stored in new columns on df.

I'm not sure how to search for matching values and fetch values from that row but a different column. I'm trying to do this with df.iloc and a for loop.

Here's the full code:

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.iloc[date,1] == '0900':
        priorday = af.loc[af.index.get_loc(date)-1,0]
        priorclose = af.loc[priorday,4]
        open = af.loc[date,1]
        close = af.loc[date,4]
        df.iloc[date,2] = close/priorclose
        df.iloc[date,3] = open/priorclose
        df.iloc[date,4] = close/open
    else:
        print('afternoon')

I get an error:

if df.iloc[date,1] == '0900':
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Converting the date columns to integers creates another error. Is there a better way to go about this?

Ideal output would look like this (made-up numbers, abbreviated):

earnDate	anncTod	Total Move	Overnight Move	Intraday Move
2015-11-18	0900	9%	7.2%	1.8%

But it would include all the dates given in the first dataframe.

UPDATE

I swapped df.iloc for df.loc and that seems to have solved that problem. The new issue is searching for the variable 'date' in the second dataframe af. I have simplified the code to just print the value in the 'Open' column while I troubleshoot.

Here is the updated and simplified code (all else remains the same):

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## set index to earnDate
df = df.set_index(pd.DatetimeIndex(df['earnDate']))

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(af.loc[date,'Open']) ##this is line generating error
    else:
        print('afternoon')

I now get KeyError: '2015-11-18'

Answer 1

Accessing a row with loc assumes that the label you search for is in the index. In your updated code, df is indexed by earnDate, but af still has its default integer index, so af.loc[date, 'Open'] cannot find the date and raises the KeyError. Specifically, that means you'll need to set the date column as the index. Example:

import pandas as pd

df = pd.DataFrame({'earnDate': ['2015-11-18', '2015-11-19', '2015-11-20'],
                   'anncTod': ['0900', '1000', '0800'],
                   'Open': [111, 222, 333]})

df = df.set_index(df["earnDate"])

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(df.loc[date, 'Open'])

# prints
# 111
