Using Linear Regression for Yearly Distributed Time Series Data to Get Predictions After N Years
I am stuck on an unusual problem. I have time series data covering the years 2009 to 2018, and I have to answer a rather odd question using it.
The data sheet contains the energy generation statistics of each Australian state/territory in GWh (gigawatt hours) for the years 2009 to 2018.
It has the following fields:
State: Names of different Australian states.
Fuel_Type: The type of fuel which is consumed.
Category: Whether a fuel is considered renewable or non-renewable.
Years: The years for which the energy figures are recorded.
Problem:
How can I use a linear regression model to predict what percentage of a state X's (say, Victoria's) energy generation will come from source Y (say, renewable energy sources) in year Z (say, 2100)?
How am I supposed to use a linear regression model to solve this problem? It is beyond my reach.
I think you first need to decide what your model should look like in the end: you probably want something that relates the dependent variable y (the fraction of renewable energy) to your input features. One of those features should probably be the year, since you are interested in predicting how y changes as the year varies. So a very basic linear model could be y = beta1 * x + beta0, with x being the year, beta1 and beta0 being the parameters you want to fit, and y being the fraction of renewable energy. This of course ignores the state component, but I think a simple start could be to fit such a model to the state you are interested in. The code for such an approach could look like this:
import matplotlib
matplotlib.use("agg")
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
from scipy.stats import linregress
import numpy as np
def fracRenewable(df):
    # share of the group's total amount that comes from renewable fuels
    renewable = df.loc[df["Category"] == "Renewable fuels", "amount"].sum()
    return renewable / df["amount"].sum()
# load in data
data = pd.read_csv("./energy_data.csv")
# convert data to tidy format and rename columns
molten = (
    pd.melt(data, id_vars=["State", "Fuel_Type", "Category"])
    .rename(columns={"variable": "year", "value": "amount"})
)
# calculate fraction of renewable fuel per year
# (all states pooled; filter `molten` by "State" first to model a single state)
grouped = (
    molten.groupby(["year"]).apply(fracRenewable)
    .reset_index()
    .rename(columns={0: "amount"})
)
grouped["year"] = grouped["year"].astype(int)
# >>> grouped
# year amount
# 0 2009 0.029338
# 1 2010 0.029207
# 2 2011 0.032219
# 3 2012 0.053738
# 4 2013 0.061332
# 5 2014 0.066198
# 6 2015 0.069404
# 7 2016 0.066531
# 8 2017 0.074625
# 9 2018 0.077445
# fit linear model
slope, intercept, r_value, p_value, std_err = linregress(grouped["year"], grouped["amount"])
# plot result
f, ax = plt.subplots()
sbn.scatterplot(x="year", y="amount", ax=ax, data=grouped)
ax.plot(range(2009, 2030), [i*slope + intercept for i in range(2009, 2030)], color="red")
ax.set_title("Renewable fuels (simple prediction)")
ax.set(ylabel="Fraction renewable fuel")
f.savefig("test11.png", bbox_inches="tight")
This gives you a (very simple) model to predict the fraction of renewable fuels in a given year.
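To answer the "year Z" part of the question, the fitted line only has to be evaluated at that year. The following is a minimal sketch, not part of the original answer: it copies the pooled yearly fractions from the `grouped` table above, uses `np.polyfit` in place of `linregress`, and clips the result to [0, 1] because a straight line extrapolated far enough will eventually leave the range of valid fractions. The helper name `predict_fraction` is made up for illustration.

```python
import numpy as np

# Yearly renewable fractions copied from the `grouped` table above
# (all states pooled, 2009 to 2018).
years = np.arange(2009, 2019)
fraction = np.array([0.029338, 0.029207, 0.032219, 0.053738, 0.061332,
                     0.066198, 0.069404, 0.066531, 0.074625, 0.077445])

# Center the years before fitting so the least-squares problem stays well conditioned.
year_mean = years.mean()
slope, intercept = np.polyfit(years - year_mean, fraction, 1)

def predict_fraction(year):
    """Extrapolate the fitted line to `year` and clip to the valid range [0, 1]."""
    raw = slope * (year - year_mean) + intercept
    return float(np.clip(raw, 0.0, 1.0))

pred_2100 = predict_fraction(2100)
print(f"Predicted renewable share in 2100: {pred_2100:.1%}")
```

With these ten points the extrapolated share for 2100 comes out somewhat above one half. Keep in mind that this stretches a ten-year trend over more than eighty years, so the number is an illustration of the mechanics rather than a forecast to rely on.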
If you want to refine the model further, I think a good start could be to group states together based on how similar they are (either based on prior knowledge or a clustering approach) and then do the predictions on those groups.
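The code above pools all states into one series. Fitting one line per state (or per group of states) could look roughly like the sketch below. The toy DataFrame is hypothetical: it only mimics the wide layout described in the question (one column per year), and its numbers, fuel names and the helper `fit_state` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the wide layout described in the question;
# the values are invented purely to make the example runnable.
data = pd.DataFrame({
    "State": ["Victoria", "Victoria", "Tasmania", "Tasmania"],
    "Fuel_Type": ["Brown coal", "Wind", "Natural gas", "Hydro"],
    "Category": ["Non-renewable fuels", "Renewable fuels",
                 "Non-renewable fuels", "Renewable fuels"],
    "2016": [90.0, 10.0, 20.0, 80.0],
    "2017": [88.0, 12.0, 18.0, 82.0],
    "2018": [85.0, 15.0, 15.0, 85.0],
})

# Reshape to one row per (state, fuel, year).
tidy = data.melt(id_vars=["State", "Fuel_Type", "Category"],
                 var_name="year", value_name="amount")
tidy["year"] = tidy["year"].astype(int)

def fit_state(df, state, category="Renewable fuels"):
    """Fit share = slope * year + intercept for a single state."""
    sub = df[df["State"] == state]
    total = sub.groupby("year")["amount"].sum()
    part = sub[sub["Category"] == category].groupby("year")["amount"].sum()
    share = (part / total).fillna(0.0)  # years without that category count as 0
    slope, intercept = np.polyfit(share.index.to_numpy(), share.to_numpy(), 1)
    return slope, intercept

# One (slope, intercept) pair per state.
fits = {state: fit_state(tidy, state) for state in tidy["State"].unique()}
print(fits)
```

To model groups of similar states instead, the same function could be applied after replacing the "State" column with a group label.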
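Yes, you can use linear regression for forecasting, and there are different ways to do it. You can:
(1) fit a line to the series as a function of time and extrapolate it into the future, or
(2) reduce the forecasting problem to a tabular regression problem by sliding a fixed-length window over the series, and fit a linear regression to the resulting table.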
Here's what (1) and (2) look like with sktime (disclaimer: I'm one of the developers):
import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.utils.plotting.forecasting import plot_ys
from sktime.forecasting.compose import ReducedRegressionForecaster
from sklearn.linear_model import LinearRegression
y = load_airline() # load 1-dimensional time series
y_train, y_test = temporal_train_test_split(y)
# here I forecast all observations of the test series,
# in your case you could only select the years you're interested in
fh = np.arange(1, len(y_test) + 1)
# option 1
forecaster = PolynomialTrendForecaster(degree=1)
forecaster.fit(y_train)
y_pred_1 = forecaster.predict(fh)
# option 2
forecaster = ReducedRegressionForecaster(LinearRegression(), window_length=10)
forecaster.fit(y_train)
y_pred_2 = forecaster.predict(fh)
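For readers who want to see what the two options amount to without installing sktime, here is a rough NumPy-only sketch. It is not sktime's implementation: the function names `trend_forecast` and `windowed_forecast`, the short window length and the made-up series are all assumptions chosen for illustration, and the windowed variant assumes a recursive strategy in which each prediction is fed back in as an input for the next step.

```python
import numpy as np

def trend_forecast(y, steps):
    """Option 1: fit a straight line to (time index, value) and extrapolate it."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    future_t = np.arange(len(y), len(y) + steps)
    return slope * future_t + intercept

def windowed_forecast(y, steps, window_length=3):
    """Option 2: regress each value on the previous `window_length` values,
    then forecast recursively by feeding predictions back in."""
    y = np.asarray(y, dtype=float)
    # Each row of X is a window of past values; the target is the value that follows.
    X = np.array([y[i:i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    X1 = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    history = list(y)
    preds = []
    for _ in range(steps):
        window = np.append(history[-window_length:], 1.0)
        nxt = float(window @ coef)
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

# Made-up, perfectly linear series: 1, 3, 5, ..., 19
series = np.arange(10, dtype=float) * 2.0 + 1.0
print(trend_forecast(series, 3))
print(windowed_forecast(series, 3))
```

On a perfectly linear series both approaches continue the line (approximately 21, 23, 25 here). On real data they generally differ: option 1 always produces a straight line, while option 2 can follow short-term patterns in the recent values.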