Moving data from one Numpy array to another returning incorrect data
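I am using pandas_market_calendars to get the market calendar for the New York Stock Exchange. I assigned the calendar to the variable nyse_calendar and noticed that its type is <class 'pandas_market_calendars.exchange_calendar_nyse.NYSEExchangeCalendar'>. That is not helpful, because my main data file is stored in a numpy array whose elements are of type <class 'numpy.str_'>. So I built a schedule from nyse_calendar and used .to_numpy() to convert it to a numpy array. When I print the type of an element now, it returns <class 'pandas._libs.tslibs.timestamps.Timestamp'>, which is not useful either, because I want to compare the dates and times in my main data file against this calendar.
When I print a value from the trading_days array after converting it to a string, it returns 2011-09-20 13:30:00+00:00.
So what I want to do is loop through trading_days (the numpy array) and use a combination of .split() calls to turn each value into separate string values. Here is the code for better context: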
import numpy as np
import pandas_market_calendars as mkt_cal
from datetime import datetime
import pandas as pd
#set up the NYSE trading calendar
#create new market calendar
nyse_calendar = mkt_cal.get_calendar('NYSE')
#create a dataframe with only trading days - includes early closes
#needs to be from beginning of testing to end of testing data
nyse_schedule = nyse_calendar.schedule(start_date='2011-09-18', end_date='2019-12-05')
#convert dataframe to a numpy array
#reference: trading_days[0,0], trading_days[1,0] etc.
#open date & time in col 0, close date & time in col 1
trading_days = nyse_schedule.to_numpy()
print(trading_days)
>>>[[Timestamp('2011-09-19 13:30:00+0000', tz='UTC')
Timestamp('2011-09-19 20:00:00+0000', tz='UTC')]
[Timestamp('2011-09-20 13:30:00+0000', tz='UTC')
Timestamp('2011-09-20 20:00:00+0000', tz='UTC')]
[Timestamp('2011-09-21 13:30:00+0000', tz='UTC')
Timestamp('2011-09-21 20:00:00+0000', tz='UTC')]
...
[Timestamp('2019-12-03 14:30:00+0000', tz='UTC')
Timestamp('2019-12-03 21:00:00+0000', tz='UTC')]
[Timestamp('2019-12-04 14:30:00+0000', tz='UTC')
Timestamp('2019-12-04 21:00:00+0000', tz='UTC')]
[Timestamp('2019-12-05 14:30:00+0000', tz='UTC')
Timestamp('2019-12-05 21:00:00+0000', tz='UTC')]]
print("trading data type: ",type(trading_days[1,0]))
print("trading data: ", trading_days[1,0])
>>>trading data type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
trading data: 2011-09-20 13:30:00+00:00
#now going to loop through the nyse calendar, convert to string and return in new numpy array
#date, open time, close time
exchng_cal = np.empty((trading_days.shape[0],3),dtype=str)
for i in range(trading_days.shape[0]-1):
    temp_str_open = str(trading_days[i,0])
    print(temp_str_open)
    temp_str_close = str(trading_days[i,1])
    print(temp_str_close)
    #date
    exchng_cal[i,0] = temp_str_open.split()[0]
    print(temp_str_open.split()[0])
    #open time
    exchng_cal[i,1] = temp_str_open.split()[1].split('+')[0]
    print(temp_str_open.split()[1].split('+')[0])
    #close time
    exchng_cal[i,2] = temp_str_close.split()[1].split('+')[0]
    print(temp_str_close.split()[1].split('+')[0])
print(exchng_cal)
>>>2019-12-04 14:30:00+00:00
2019-12-04 21:00:00+00:00
2019-12-04
14:30:00
21:00:00
[['2' '1' '2']
['2' '1' '2']
['2' '1' '2']
...
['2' '1' '2']
['2' '1' '2']
['' '' '']]
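I shortened the final printout, but as you can see, the individual elements print with the correct values when printed one at a time, yet when I print exchng_cal it returns ['2' '1' '2'] for every row.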
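The symptom can be reproduced without the calendar library at all, which helps narrow it down to the target array rather than the string splitting. The following is a minimal sketch, assuming only numpy; the array name cal and the hard-coded values are placeholders taken from the printout above.

```python
import numpy as np

# Same allocation pattern as in the question: dtype=str with no length given.
cal = np.empty((2, 3), dtype=str)

# The resulting dtype is a unicode string of width 1.
print(cal.dtype)  # <U1

# Assigning longer strings does not raise; each value is cut to one character.
cal[0, 0] = "2019-12-04"
cal[0, 1] = "14:30:00"
cal[0, 2] = "21:00:00"

print(cal[0])  # ['2' '1' '2']
```

The values are correct right up to the assignment, so the loss happens when they are stored in the array.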
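In numpy you have to specify the string length for a string dtype (see np.chararray). With dtype=str the length defaults to 1, so your values are truncated to their first character. Because your data structure needs strings of different lengths, this could be a solution: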
exchng_cal = np.empty((trading_days.shape[0],3),dtype='object')
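As a rough sketch of how the whole loop might look with that change, the example below fills an object array and then, as an alternative, converts it to a fixed-width string array. Several things here are assumptions rather than part of the original post: the two hand-built rows stand in for the real nyse_schedule.to_numpy() output, strftime replaces the chained .split() calls, and the loop uses range(n_days) instead of range(n_days - 1), since the latter leaves the last row unfilled (which is why the final row in the printout above is empty).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nyse_schedule.to_numpy(): open in col 0, close in col 1.
trading_days = np.array(
    [
        [pd.Timestamp("2019-12-04 14:30", tz="UTC"), pd.Timestamp("2019-12-04 21:00", tz="UTC")],
        [pd.Timestamp("2019-12-05 14:30", tz="UTC"), pd.Timestamp("2019-12-05 21:00", tz="UTC")],
    ],
    dtype=object,
)

n_days = trading_days.shape[0]

# dtype=object stores references to full Python strings, so nothing is truncated.
exchng_cal = np.empty((n_days, 3), dtype=object)

for i in range(n_days):  # range(n_days), not n_days - 1, so the last row is filled too
    open_ts, close_ts = trading_days[i]
    exchng_cal[i, 0] = open_ts.strftime("%Y-%m-%d")   # date
    exchng_cal[i, 1] = open_ts.strftime("%H:%M:%S")   # open time
    exchng_cal[i, 2] = close_ts.strftime("%H:%M:%S")  # close time

print(exchng_cal)

# Alternative: a fixed-width unicode array; 10 characters fits "YYYY-MM-DD".
exchng_cal_fixed = exchng_cal.astype("U10")
print(exchng_cal_fixed.dtype)
```

An object array is the more flexible choice when string lengths vary; a fixed-width dtype such as 'U10' is more compact when the maximum length is known in advance, as it is for these date and time formats.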