Python/Pandas: Search for date in one dataframe and return value in column of another dataframe with matching date

I have two dataframes: one with earnings dates and a before-market/after-market code, and the other with daily OHLC data. First dataframe, df:

       earnDate  anncTod
103  2015-11-18     0900
104  2016-02-24     0900
105  2016-05-18     0900
...         ...      ...
128  2022-03-01     0900
129  2022-05-18     0900
130  2022-08-17     0900

Second dataframe, af:

  Datetime      Open      High       Low     Close   Volume
2005-01-03   36.3458   36.6770   35.5522   35.6833  3343500
       ...       ...       ...       ...       ...      ...
2022-04-22  246.5500  247.2000  241.4300  241.9100  1817977

I want to take a date from the first dataframe and find the open and/or close price in the second dataframe. Depending on the anncTod value, I want the close price of the previous day (if it is 0900) or the open and close prices of the following day (otherwise). I'll use these numbers to calculate the overnight, intraday and close-to-close moves, which will be stored in new columns in df.
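The three moves can be computed for every trading day at once with `shift`, which lines up each row with the previous day's close. A minimal sketch, assuming the moves are expressed as simple returns (ratio minus 1) rather than raw ratios; the prices are made up:

```python
import pandas as pd

# Made-up daily OHLC rows in the same layout as af (High/Low/Volume omitted).
af = pd.DataFrame({
    "Datetime": ["2022-04-20", "2022-04-21", "2022-04-22"],
    "Open":  [240.00, 250.00, 246.55],
    "Close": [245.00, 248.00, 241.91],
})

# Previous trading day's close, aligned with each row (NaN on the first row).
prior_close = af["Close"].shift(1)

# Simple returns: overnight gap, intraday move, and close-to-close move.
af["Overnight"] = af["Open"] / prior_close - 1
af["Intraday"] = af["Close"] / af["Open"] - 1
af["Total Move"] = af["Close"] / prior_close - 1

print(af)
```

The first row has no prior close, so its overnight and total moves are NaN. The total move compounds the other two: (1 + overnight) × (1 + intraday) − 1.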

I'm not sure how to search for matching values and fetch a value from that row but a different column. I'm trying to do this with df.iloc and a for loop.
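One way to fetch a value from a different column of a row found by matching is a boolean mask inside `.loc`: the mask selects the matching rows and the second argument picks the column. A minimal sketch with made-up rows, assuming the dates in both frames are stored as strings in the same `YYYY-MM-DD` format:

```python
import pandas as pd

# Made-up rows in the same layout as af.
af = pd.DataFrame({
    "Datetime": ["2015-11-17", "2015-11-18", "2015-11-19"],
    "Open":  [70.10, 71.25, 72.00],
    "Close": [70.90, 71.80, 71.40],
})

date = "2015-11-18"

# Boolean mask: True only on rows whose Datetime equals the search date.
mask = af["Datetime"] == date

# .loc[mask, column] returns a Series (it may be empty or hold several rows).
opens = af.loc[mask, "Open"]

if opens.empty:
    print(f"{date} not found")
else:
    print(opens.iloc[0])
```

Checking `opens.empty` first avoids an IndexError when a date is missing from the price data.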

Here's the full code:

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.iloc[date,1] == '0900':
        priorday = af.loc[af.index.get_loc(date)-1,0]
        priorclose = af.loc[priorday,4]
        open = af.loc[date,1]
        close = af.loc[date,4]
        df.iloc[date,2] = close/priorclose
        df.iloc[date,3] = open/priorclose
        df.iloc[date,4] = close/open
    else:
        print('afternoon')

I get an error:

if df.iloc[date,1] == '0900':
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Converting the date columns to integers creates another error. Is there a better way to go about this?
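`iloc` accepts only integer positions (or slices, lists or boolean arrays of them), so passing a date string such as '2015-11-18' raises the ValueError above; `loc` is the label-based counterpart. When the loop only needs each row's own values, iterating with `itertuples` avoids indexing back into the frame at all. A minimal sketch with made-up rows; the after-market code `1630` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"earnDate": ["2015-11-18", "2016-02-24"],
     "anncTod": ["0900", "1630"]},  # "1630" is a hypothetical after-market code
    index=[103, 104],  # index labels like the asker's, not positions 0 and 1
)

# Position-based: first row, second column.
print(df.iloc[0, 1])
# Label-based: row label 103, column name "anncTod".
print(df.loc[103, "anncTod"])

# Each tuple carries the row's values as attributes, so no lookup is needed.
for row in df.itertuples():
    if row.anncTod == "0900":
        print(row.earnDate, "pre-market")
    else:
        print(row.earnDate, "after-market")
```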

The ideal output would look like this (made-up numbers, abbreviated output):

  earnDate  anncTod  Total Move  Overnight Move  Intraday Move
2015-11-18     0900          9%            7.2%           1.8%

But it would include all the dates given in the first dataframe.
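A vectorized alternative to the loop: compute the moves for every trading day in af first, then attach the matching reaction day to each earnings row with `pd.merge_asof`. This is a sketch under stated assumptions, not the asker's method: every code other than `0900` means after-market (the `1615` below is hypothetical), an after-market report reacts on the next trading day, and moves are simple returns. The data is made up.

```python
import pandas as pd

# Made-up stand-ins for df (earnings) and af (daily prices).
df = pd.DataFrame({
    "earnDate": ["2022-03-01", "2022-03-02"],
    "anncTod": ["0900", "1615"],  # "1615" is a hypothetical after-market code
})
af = pd.DataFrame({
    "Datetime": ["2022-02-28", "2022-03-01", "2022-03-02", "2022-03-03"],
    "Open":  [100.0, 104.0, 110.0, 99.0],
    "Close": [102.0, 108.0, 100.0, 101.0],
})

# Per-day moves on the price data, sorted by date.
af["Datetime"] = pd.to_datetime(af["Datetime"])
af = af.sort_values("Datetime")
prior_close = af["Close"].shift(1)
af["Total Move"] = af["Close"] / prior_close - 1
af["Overnight"] = af["Open"] / prior_close - 1
af["Intraday"] = af["Close"] / af["Open"] - 1

# Reaction day: the earnings date itself for pre-market (0900) reports,
# otherwise the day after, rolled forward to a trading day by merge_asof.
df["earnDate"] = pd.to_datetime(df["earnDate"])
df["reactDate"] = df["earnDate"].where(
    df["anncTod"] == "0900", df["earnDate"] + pd.Timedelta(days=1)
)

# direction="forward" picks the first trading day on or after reactDate,
# which also skips weekends and holidays. Both sides must be sorted by key.
out = pd.merge_asof(
    df.sort_values("reactDate"),
    af[["Datetime", "Total Move", "Overnight", "Intraday"]],
    left_on="reactDate",
    right_on="Datetime",
    direction="forward",
)
print(out[["earnDate", "anncTod", "Total Move", "Overnight", "Intraday"]])
```

Because the lookup is a single join, it covers every date in the first dataframe without a Python loop.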

UPDATE

I swapped df.iloc for df.loc and that seems to have solved that problem. The new issue is searching for the variable date in the second dataframe, af. I have simplified the code to just print the value in the 'Open' column while I troubleshoot.

Here is the updated and simplified code (everything else remains the same):

import pandas as pd
import requests 
import datetime as dt

ticker = 'TGT'

## pull orats earnings dates and store in pandas dataframe
url = f'https://api.orats.io/datav2/hist/earnings.json?token=keyhere={ticker}'
response = requests.get(url, allow_redirects=True)

data = response.json()
df = pd.DataFrame(data['data'])

## reduce number of dates to last 28 quarters and remove updatedAt column
n = len(df.index)-28
df.drop(index=df.index[:n], inplace=True)
df = df.iloc[: , 1:-1]

## set index to earnDate
df = df.set_index(pd.DatetimeIndex(df['earnDate']))

## import daily OHLC stock data file
loc = f"C:\\Users\\anon\\Historical Stock Data\\us3000_tickers_daily\\{ticker}_daily.txt"
af = pd.read_csv(loc, delimiter=',', names=['Datetime','Open','High','Low','Close','Volume'])


## create total return, overnight and intraday columns in df
df['Total Move'] = '' ##col #2
df['Overnight'] = ''  ##col #3
df['Intraday'] = ''   ##col #4

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(af.loc[date,'Open']) ## this is the line generating the error
    else:
        print('afternoon')

I now get KeyError: '2015-11-18'

loc looks rows up by index label, so the label you search for must be in the index. Specifically, that means you'll need to set the date column as the index of the dataframe you are searching. For example:

import pandas as pd

df = pd.DataFrame({'earnDate': ['2015-11-18', '2015-11-19', '2015-11-20'],
                   'anncTod': ['0900', '1000', '0800'],
                   'Open': [111, 222, 333]})

df = df.set_index(df["earnDate"])

for date in df['earnDate']:
    if df.loc[date, 'anncTod'] == '0900':
        print(df.loc[date, 'Open'])

# prints
# 111
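The same fix applies to af, which is where the KeyError comes from: af is read with read_csv without index_col, so it is indexed 0, 1, 2, … and af.loc['2015-11-18', 'Open'] finds no such row label. A sketch of the asker's loop with af indexed by Datetime, using `Index.get_loc` to step to neighbouring trading days. Assumptions: both frames hold dates as strings in the same format, af is sorted by date with one row per trading day, every code other than `0900` is after-market (`1615` is hypothetical), and every earnings date appears in af. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"earnDate": ["2022-03-01", "2022-03-02"],
                   "anncTod": ["0900", "1615"]})  # "1615" is a hypothetical code
af = pd.DataFrame({
    "Datetime": ["2022-02-28", "2022-03-01", "2022-03-02", "2022-03-03"],
    "Open":  [100.0, 104.0, 110.0, 99.0],
    "Close": [102.0, 108.0, 100.0, 101.0],
}).set_index("Datetime")  # dates become row labels, so get_loc can find them

totals, overnights, intradays = [], [], []
for row in df.itertuples():
    pos = af.index.get_loc(row.earnDate)  # integer position of the earnings date
    if row.anncTod != "0900":
        pos += 1                          # after-market: react on the next trading day
    prior_close = af["Close"].iloc[pos - 1]
    day_open = af["Open"].iloc[pos]
    day_close = af["Close"].iloc[pos]
    totals.append(day_close / prior_close - 1)
    overnights.append(day_open / prior_close - 1)
    intradays.append(day_close / day_open - 1)

df["Total Move"] = totals
df["Overnight"] = overnights
df["Intraday"] = intradays
print(df)
```

get_loc raises KeyError for an earnings date that is not a trading day in af, and the `pos + 1` step raises IndexError if an after-market report falls on af's last row; a real run would need to handle both.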
