
Moving data from one Numpy array to another returning incorrect data

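I am using pandas_market_calendars to get the market calendar for the New York Stock Exchange. I assigned the market calendar to the variable nyse_calendar and noticed that its type is <class 'pandas_market_calendars.exchange_calendar_nyse.NYSEExchangeCalendar'>. That isn't helpful, because my main data file is stored in a numpy array whose elements are of type <class 'numpy.str_'>. So I generated a schedule from nyse_calendar and used .to_numpy() to convert it to a numpy array. When I now print the type of an element, it returns <class 'pandas._libs.tslibs.timestamps.Timestamp'>, which again isn't useful, because I want to compare the dates and times in my main data file against this calendar.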

So when I print one element of the trading_days array after converting it to a string, it returns 2011-09-20 13:30:00+00:00
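For reference, the date and time pieces of that string can also be taken directly from a pandas Timestamp with strftime, without splitting the string form. This is a minimal sketch; the Timestamp is built by hand as a stand-in for trading_days[1,0] rather than read from the calendar.

```python
import pandas as pd

# Hand-built stand-in for trading_days[1, 0]
ts = pd.Timestamp("2011-09-20 13:30:00", tz="UTC")

as_string = str(ts)                  # full string, including the UTC offset
date_part = ts.strftime("%Y-%m-%d")  # date only
time_part = ts.strftime("%H:%M:%S")  # time only, without the offset

print(as_string)  # 2011-09-20 13:30:00+00:00
print(date_part)  # 2011-09-20
print(time_part)  # 13:30:00
```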

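So what I want to do is loop through trading_days (the numpy array) and convert the values into strings using str() combined with .split(). Here is the code for better context: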

import numpy as np
import pandas_market_calendars as mkt_cal
from datetime import datetime
import pandas as pd

#set up the NYSE trading calendar
#create new market calendar
nyse_calendar = mkt_cal.get_calendar('NYSE')

#create a dataframe with only trading days - includes early closes
#needs to be from beginning of testing to end of testing data
nyse_schedule = nyse_calendar.schedule(start_date='2011-09-18', end_date='2019-12-05')

#convert dataframe to a numpy array
#reference: trading_days[0,0], trading_days[1,0] etc.
#open date & time in col 0, close date & time in col 1
trading_days = nyse_schedule.to_numpy()

print(trading_days)
>>>[[Timestamp('2011-09-19 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-19 20:00:00+0000', tz='UTC')]
 [Timestamp('2011-09-20 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-20 20:00:00+0000', tz='UTC')]
 [Timestamp('2011-09-21 13:30:00+0000', tz='UTC')
  Timestamp('2011-09-21 20:00:00+0000', tz='UTC')]
 ...
 [Timestamp('2019-12-03 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-03 21:00:00+0000', tz='UTC')]
 [Timestamp('2019-12-04 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-04 21:00:00+0000', tz='UTC')]
 [Timestamp('2019-12-05 14:30:00+0000', tz='UTC')
  Timestamp('2019-12-05 21:00:00+0000', tz='UTC')]]

print("trading data type: ",type(trading_days[1,0]))
print("trading data: ", trading_days[1,0])
>>>trading data type:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>
trading data:  2011-09-20 13:30:00+00:00

#now going to loop through the nyse calendar, convert to string and return in new numpy array
#date, open time, close time
exchng_cal = np.empty((trading_days.shape[0],3),dtype=str)

for i in range(trading_days.shape[0]-1):
    temp_str_open = str(trading_days[i,0])
    print(temp_str_open)
    temp_str_close = str(trading_days[i,1])
    print(temp_str_close)
    #date
    exchng_cal[i,0] = temp_str_open.split()[0]
    print(temp_str_open.split()[0])
    #open time
    exchng_cal[i,1] = temp_str_open.split()[1].split('+')[0]
    print(temp_str_open.split()[1].split('+')[0])
    #close time
    exchng_cal[i,2] = temp_str_close.split()[1].split('+')[0]
    print(temp_str_close.split()[1].split('+')[0])

print(exchng_cal)
>>>2019-12-04 14:30:00+00:00
2019-12-04 21:00:00+00:00
2019-12-04
14:30:00
21:00:00
[['2' '1' '2']
 ['2' '1' '2']
 ['2' '1' '2']
 ...
 ['2' '1' '2']
 ['2' '1' '2']
 ['' '' '']]

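I shortened the final printout, but as you can see from the individual prints, each element is printed with the correct value; yet when I print exchng_cal, the rows come back as ['2','1','2'].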
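The same symptom can be reproduced with NumPy alone, without pandas_market_calendars. This minimal sketch creates a one-row array the same way exchng_cal is created in the code above and assigns full strings to it:

```python
import numpy as np

# Same construction as exchng_cal, but with a single row
demo = np.empty((1, 3), dtype=str)
demo[0, 0] = "2011-09-20"
demo[0, 1] = "13:30:00"
demo[0, 2] = "20:00:00"

print(demo.dtype)  # <U1
print(demo)        # [['2' '1' '2']]
```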

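In numpy, string arrays have a fixed width, so you have to specify the string length (see np.chararray). With dtype=str and no length, np.empty creates an array of dtype '<U1' (one character per element), so every value you assign is truncated to its first character. Because your data structure needs strings of different lengths, this could be a solution: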

exchng_cal = np.empty((trading_days.shape[0],3),dtype='object')
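Putting this fix into the original loop could look like the sketch below. Two further changes are my own assumptions rather than part of the answer: the loop runs over every row (the original range(trading_days.shape[0]-1) stops one row early, which is why the last row of the printed exchng_cal is ['' '' '']), and strftime replaces the string splitting. The three-row trading_days array is built by hand as a stand-in for nyse_schedule.to_numpy().

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for nyse_schedule.to_numpy(): open in column 0, close in column 1
trading_days = np.array([
    [pd.Timestamp("2019-12-03 14:30:00", tz="UTC"), pd.Timestamp("2019-12-03 21:00:00", tz="UTC")],
    [pd.Timestamp("2019-12-04 14:30:00", tz="UTC"), pd.Timestamp("2019-12-04 21:00:00", tz="UTC")],
    [pd.Timestamp("2019-12-05 14:30:00", tz="UTC"), pd.Timestamp("2019-12-05 21:00:00", tz="UTC")],
], dtype=object)

# dtype=object stores full Python strings, so nothing is truncated
exchng_cal = np.empty((trading_days.shape[0], 3), dtype=object)

# Loop over every row, including the last one
for i in range(trading_days.shape[0]):
    market_open, market_close = trading_days[i, 0], trading_days[i, 1]
    exchng_cal[i, 0] = market_open.strftime("%Y-%m-%d")   # date
    exchng_cal[i, 1] = market_open.strftime("%H:%M:%S")   # open time
    exchng_cal[i, 2] = market_close.strftime("%H:%M:%S")  # close time

print(exchng_cal)
```

If you prefer a real string dtype over object, an explicit width such as dtype='U10' also works here, because the longest value, a date like '2019-12-05', is 10 characters long.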


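Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.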

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM