
Extracting Model from GBM in R

Is anyone familiar with how to figure out what's going on inside a gbm model in R?

Let's say we wanted to see how to predict Petal.Length in iris. Just to keep it simple, I ran:

library(gbm)
tg = gbm(Petal.Length ~ ., data = iris)

This works, and when you run:

summary(tg)

Then you get:

Hit <Return> to see next plot: 
                      var rel.inf
Petal.Width   Petal.Width   67.39
Species           Species   32.61
Sepal.Length Sepal.Length    0.00
Sepal.Width   Sepal.Width    0.00

This makes sense intuitively. When you run pretty.gbm.tree(tg) you get:

  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0        2  0.8000000000        1         2           3       184.6764     75  0.0001366667
1       -1 -0.0022989091       -1        -1          -1         0.0000     22 -0.0022989091
2       -1  0.0011476604       -1        -1          -1         0.0000     53  0.0011476604
3       -1  0.0001366667       -1        -1          -1         0.0000     75  0.0001366667

So clearly gbm thinks that you split by Variable #2 and get back three separate regressions. I assume that SplitVar==2 is Petal.Width, since the order you see in str(iris) makes sense.

But what data is missing? iris has no missing data. And then how do we see what is going on in each of the three nodes that were created?

Let's say I wanted to code this up in C++ for production. I don't see how one would know what to code, beyond knowing that you should do something different depending on whether Petal.Width > 0.8.

Thanks,

Josh

See the function gbm2sas in the package mlmeta, which uses metaprogramming to convert the R object to SAS format.

The SAS format is similar to C++, so it is both easy to read and easy to adapt to C++.
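For a hand-written port, the table printed by pretty.gbm.tree can be read as a small array of nodes. The following is a minimal sketch rather than gbm's own code, and it rests on several assumptions that should be checked against the gbm documentation: SplitVar is a 0-based index into the predictors and -1 marks a terminal node; a row goes to LeftNode when its value is below the split point and to RightNode otherwise; MissingNode is only used when the split variable is NA (which is why it appears even though iris has no missing data); and the terminal values are increments that are summed over all trees on top of an initial value. The names Node, score_tree, predict, question_tree and init_f are hypothetical, and categorical splits such as Species are not handled. Note also that pretty.gbm.tree(tg) prints only one tree, so a full port would need every tree in the model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of the pretty.gbm.tree() table (hypothetical struct, numeric splits only).
struct Node {
    int split_var;           // SplitVar: assumed 0-based predictor index, -1 if terminal
    double split_code_pred;  // SplitCodePred: split point, or the node value if terminal
    int left;                // LeftNode row index
    int right;               // RightNode row index
    int missing;             // MissingNode row index, used when the value is NaN
};

// Walk a single tree for one observation; NaN stands in for R's NA.
double score_tree(const std::vector<Node>& tree, const std::vector<double>& x) {
    std::size_t i = 0;
    while (tree[i].split_var != -1) {
        const Node& n = tree[i];
        assert(static_cast<std::size_t>(n.split_var) < x.size());
        const double v = x[static_cast<std::size_t>(n.split_var)];
        if (std::isnan(v)) {
            i = static_cast<std::size_t>(n.missing);
        } else if (v < n.split_code_pred) {  // assumed convention: left means "less than"
            i = static_cast<std::size_t>(n.left);
        } else {
            i = static_cast<std::size_t>(n.right);
        }
    }
    return tree[i].split_code_pred;
}

// Add every tree's contribution to the initial value init_f (an assumed input).
double predict(double init_f,
               const std::vector<std::vector<Node>>& trees,
               const std::vector<double>& x) {
    double f = init_f;
    for (const auto& t : trees) {
        f += score_tree(t, x);
    }
    return f;
}

// The tree printed in the question, copied row by row.
std::vector<Node> question_tree() {
    return {
        { 2,  0.8000000000,  1,  2,  3},
        {-1, -0.0022989091, -1, -1, -1},
        {-1,  0.0011476604, -1, -1, -1},
        {-1,  0.0001366667, -1, -1, -1},
    };
}
```

With predictors ordered Sepal.Length, Sepal.Width, Petal.Width, calling score_tree(question_tree(), {5.1, 3.5, 0.2}) follows the left branch because 0.2 is below 0.8 and returns -0.0022989091. Keeping the trees as data rather than nested if statements means the C++ code does not change when the model is retrained.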

