Add numpy array to pandas df

I'm experimenting with time series predictions, something like this:

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(data.values, 
                order=order, 
                seasonal_order=seasonal_order)

result = model.fit()

train = data.sample(frac=0.8,random_state=0)
test = data.drop(train.index)
start = len(train)
end = len(train) + len(test) - 1
  
# Predictions for one-year against the test set
predictions = result.predict(start, end,
                             typ='levels')

where predictions is a numpy array. How do I add this to my test pandas df? If I try this: test['predicted'] = predictions.tolist()

This won't concatenate properly. I was hoping to add the predictions as another column in my df, but it looks like this:

hour
2021-06-07 17:00:00                                          75726.57143
2021-06-07 20:00:00                                          62670.06667
2021-06-08 00:00:00                                             16521.65
2021-06-08 14:00:00                                              71628.1
2021-06-08 17:00:00                                          62437.16667
                                             ...                        
2021-09-23 22:00:00                                          7108.533333
2021-09-24 02:00:00                                              13325.2
2021-09-24 04:00:00                                          13322.31667
2021-09-24 13:00:00                                             37941.65
predicted              [13605.31231433516, 12597.907337725523, 13484....  <--- not coming in as another df column

Would anyone have any advice? I'm hoping to ultimately plot the predicted values against the test values, as well as calculate the RMSE, maybe something like:

from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse

# Calculate root mean squared error
rmse(test, predictions)
  
# Calculate mean squared error
mean_squared_error(test, predictions)

EDIT

train = data.sample(frac=0.8,random_state=0)
test = data.drop(train.index)

start = len(train)
end = len(train) + len(test) - 1

You should be able to add it as a column directly, without any additional conversion. The output from result.predict() should be a pandas Series. If it is not, you should still be able to add it directly to the DataFrame, so long as it has the same length and order as the DataFrame's rows.
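The "same length and order" condition matters because pandas handles the two cases differently: a numpy array is assigned by position, while a Series is aligned on its index first. The following is a minimal sketch with made-up values; the names pred_series, aligned and positional are hypothetical and only illustrate the difference.

```python
import numpy as np
import pandas as pd

# Made-up frame with a DatetimeIndex
test = pd.DataFrame(
    {"value": [15, 25, 35]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Hypothetical predictions held in a Series with a default 0..2 index,
# which shares no labels with the DatetimeIndex above
pred_series = pd.Series([10.0, 20.0, 30.0])

# Assigning a Series aligns on index labels, so nothing matches here
test["aligned"] = pred_series

# Converting to a numpy array first assigns the values by position
test["positional"] = pred_series.to_numpy()

print(test)
```

If the predictions come back as a Series whose index does not match the test index, converting with to_numpy() before assigning avoids a column full of NaN.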

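import numpy as np
import pandas as pd

test = pd.DataFrame({'date': ['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020', '01-05-2020'],
                     'value': [15, 25, 35, 45, 55]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')

# Same length and order as the rows of test
predictions = np.array([10, 20, 30, 40, 50])

# A numpy array is assigned positionally as a new column
test['predictions'] = predictions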

Output:

            value  predictions
date                          
2020-01-01     15           10
2020-01-02     25           20
2020-01-03     35           30
2020-01-04     45           40
2020-01-05     55           50
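The output shown in the question (a single "hour" index with one unnamed column of values, and "predicted" appearing as a final entry) suggests that test there may be a Series rather than a DataFrame. If so, test['predicted'] = ... sets one element labelled 'predicted' instead of creating a column. The sketch below assumes that is the situation; the values and the names test_df and rmse_value are made up for illustration. It converts the Series with to_frame() before adding the column, then computes the RMSE from the two columns with numpy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's test data: a Series indexed by hour
idx = pd.DatetimeIndex(
    ["2021-06-07 17:00:00", "2021-06-07 20:00:00", "2021-06-08 00:00:00"],
    name="hour",
)
test = pd.Series([15.0, 25.0, 35.0], index=idx, name="value")

# Made-up predictions, same length and order as test
predictions = np.array([10.0, 20.0, 30.0])

# A Series has no columns, so turn it into a one-column DataFrame first
test_df = test.to_frame()
test_df["predicted"] = predictions

# Root mean squared error between the actual and predicted columns
errors = test_df["value"] - test_df["predicted"]
rmse_value = float(np.sqrt((errors ** 2).mean()))

print(test_df)
print(rmse_value)
```

With both columns in one DataFrame, test_df.plot() would draw the actual and predicted values on the same axes, provided matplotlib is installed.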
