Is the 1 + constant in the formula "y ~ 1 + " + " + ".join("I(x**{})".format(i) for i in range(1, degree+1)) needed and, if so, what should the constant value be?
import numpy
import pandas
import matplotlib
import matplotlib.offsetbox
import matplotlib.pyplot
import matplotlib.ticker
import statsmodels.tools
import statsmodels.formula.api
data = numpy.array([
[1999, 197.0],
[2000, 196.5],
[2001, 194.3],
[2002, 193.7],
[2003, 192.0],
[2004, 189.2],
[2005, 189.3],
[2006, 187.6],
[2007, 186.9],
[2008, 186.0],
[2009, 185.0],
[2010, 186.2],
[2011, 185.1],
[2012, 185.6],
[2013, 185.0],
[2014, 185.6],
[2015, 185.4],
[2016, 185.1],
[2017, 183.9],
])
df = pandas.DataFrame(data, columns=["Year", "CrudeRate"])
cause = "Malignant neoplasms"
x = df["Year"].values
y = df["CrudeRate"].values
degree = 2
predict_future_years = 5
# https://stackoverflow.com/a/34617603/4135310
olsdata = {"x": x, "y": y}
formula = "y ~ 1 + " + " + ".join("I(x**{})".format(i) for i in range(1, degree+1))
model = statsmodels.formula.api.ols(formula, olsdata).fit()
print(model.summary())
ax = df.plot("Year", "CrudeRate", kind="scatter", grid=True, title="Deaths from {}".format(cause))
# https://stackoverflow.com/a/37294651/4135310
func = numpy.poly1d(model.params.values[::-1])
matplotlib.pyplot.plot(df["Year"], func(df["Year"]))
predicted = func(df.Year.values[-1] + predict_future_years)
print("Predicted in {} years: {}".format(predict_future_years, predicted))
ax.add_artist(matplotlib.offsetbox.AnchoredText("$\\bar{{R}}^2$ = {:0.2f}".format(model.rsquared_adj), loc="upper center"))
ax.add_artist(matplotlib.offsetbox.AnchoredText("Predicted in +{} = {:0.2f}".format(predict_future_years, predicted), loc="upper right"))
ax.xaxis.set_major_formatter(matplotlib.ticker.FormatStrFormatter("%d"))
fig = matplotlib.pyplot.gcf()
fig.autofmt_xdate(bottom=0.2, rotation=30, ha="right", which="both")
matplotlib.pyplot.tight_layout()
cleaned_title = cause.replace(" ", "_").replace("(", "").replace(")", "")
#matplotlib.pyplot.savefig("{}_{}.png".format(cleaned_title, degree), dpi=100)
matplotlib.pyplot.show()
Based on comments from @ALollz: when using Patsy notation (e.g. statsmodels.formula.api.ols("y ~ x")), you don't need to include 1 + in the formula, because the constant is added to the model by default. Note that 1 + does not specify a constant that takes on the value 1. It specifies that the model has a constant term whose magnitude is given by the intercept coefficient. That constant is determined by OLS, so it's the one you want.
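The point above can be checked with a small sketch. It fits the first five observations from the question three ways: with the implicit constant, with an explicit 1 +, and with the constant removed via - 1. Measuring x as years since 1999 and fitting a straight line are assumptions made only to keep the example small; they are not part of the original code.

```python
import numpy
import statsmodels.formula.api

# First five observations from the question. x is years since 1999;
# this re-centring is an assumption made only to keep the numbers small.
data = {
    "x": numpy.array([0.0, 1.0, 2.0, 3.0, 4.0]),
    "y": numpy.array([197.0, 196.5, 194.3, 193.7, 192.0]),
}

# Patsy adds the intercept by default, so these two formulas describe the same model.
implicit = statsmodels.formula.api.ols("y ~ x", data).fit()
explicit = statsmodels.formula.api.ols("y ~ 1 + x", data).fit()

# "- 1" removes the intercept, forcing the fitted line through the origin.
no_const = statsmodels.formula.api.ols("y ~ x - 1", data).fit()

print(list(implicit.params.index))   # names of the fitted terms
print(list(explicit.params.index))
print(list(no_const.params.index))

# The coefficients of the implicit and explicit fits agree.
print(numpy.allclose(implicit.params.values, explicit.params.values))

# The intercept is estimated by OLS; it is not fixed at 1.
print(implicit.params["Intercept"])
```

The first two fits should report the same Intercept and x coefficients, while the third has only an x term. The printed intercept is whatever value OLS estimates from the data, which is why 1 + is best read as "include a constant term" and not as "add the number 1".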