How do I extract feature_importances from my model in SparklyR?
I want to extract feature_importances from my model in SparklyR. So far, I have the following reproducible code, which runs:
library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")

# Toy training data: four short documents with a binary class label.
# tibble() replaces the deprecated data_frame().
dtrain <- tibble(
  text = c("Chinese Beijing Chinese",
           "Chinese Chinese Shanghai",
           "Chinese Macao",
           "Tokyo Japan Chinese"),
  doc_id = 1:4,
  class = c(1, 1, 1, 0)
)
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)

# Pipeline: tokenize, count-vectorize, then fit a decision tree
pipeline <- ml_pipeline(
  ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
  ft_count_vectorizer(sc, input_col = "tokens", output_col = "myvocab"),
  ml_decision_tree_classifier(sc, label_col = "class",
                              features_col = "myvocab",
                              prediction_col = "pcol",
                              probability_col = "prcol",
                              raw_prediction_col = "rpcol")
)
model <- ml_fit(pipeline, dtrain_spark)
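When I run the ml_stage step below, I cannot extract feature_importances as a vector; I get a function instead. An earlier post (How to extract feature importances in Sparklyr?) shows it as a vector, which is what I would like to get. What might my mistake be? Do I need an additional step to unwrap the function and get the vector of values here?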
ml_stage(model, 3)$feature_importances
This is what my output from ml_stage looks like (instead of a vector of values):
function (...)
{
tryCatch(.f(...), error = function(e) {
if (!quiet)
message("Error: ", e$message)
otherwise
}, interrupt = function(e) {
stop("Terminated by user", call. = FALSE)
})
}
<bytecode: 0x559a0d438278>
<environment: 0x559a0ce8e840>
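I am not sure whether this is what you want, but you can combine the vocabulary of the vectorizer model with the feature_importances of the model, which produces a table showing the importance of each token. You can use the following code: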
library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")

# Toy training data: four short documents with a binary class label.
# tibble() replaces the deprecated data_frame().
dtrain <- tibble(
  text = c("Chinese Beijing Chinese",
           "Chinese Chinese Shanghai",
           "Chinese Macao",
           "Tokyo Japan Chinese"),
  doc_id = 1:4,
  class = c(1, 1, 1, 0)
)
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)

# Pipeline: tokenize, count-vectorize, then fit a decision tree
pipeline <- ml_pipeline(
  ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
  ft_count_vectorizer(sc, input_col = "tokens", output_col = "myvocab"),
  ml_decision_tree_classifier(sc, label_col = "class",
                              features_col = "myvocab",
                              prediction_col = "pcol",
                              probability_col = "prcol",
                              raw_prediction_col = "rpcol")
)
model <- ml_fit(pipeline, dtrain_spark)

# Pair each vocabulary token with its importance.
# If feature_importances is a function in your version (as in the question's
# output), it may need to be called: ...$feature_importances()
tibble(
  token = unlist(ml_stage(model, "count_vectorizer")$vocabulary),
  importance = ml_stage(model, "decision_tree_classifier")$feature_importances
)
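The function printed in the question looks like the kind of error-catching wrapper that purrr::possibly() returns, which suggests (this is an assumption, not something confirmed here) that in some sparklyr versions feature_importances is computed lazily and has to be called to get the vector. A minimal sketch that handles both cases is below. The helpers resolve_field and importance_table are hypothetical names, not part of sparklyr, and the stand-in values are made up so the snippet runs without a Spark connection.

```r
# Hypothetical helper (not part of sparklyr): return the value itself,
# or the result of calling it when it is a zero-argument function.
resolve_field <- function(x) {
  if (is.function(x)) x() else x
}

# Stand-ins for the two shapes the field might take; values are made up.
as_vector   <- c(0, 1, 0)
as_function <- function(...) c(0, 1, 0)

# Hypothetical helper: pair a vocabulary with importances, highest first.
importance_table <- function(vocabulary, importances) {
  vocabulary  <- unlist(vocabulary)
  importances <- resolve_field(importances)
  stopifnot(length(vocabulary) == length(importances))
  out <- data.frame(token = vocabulary, importance = importances,
                    stringsAsFactors = FALSE)
  out[order(-out$importance), , drop = FALSE]
}

# Both shapes resolve to the same numeric vector.
resolve_field(as_vector)
resolve_field(as_function)

# Table sorted by importance, built from the function-shaped stand-in.
importance_table(list("chinese", "tokyo", "japan"), as_function)

# With the fitted pipeline above, the same helper would be used as:
# importance_table(
#   ml_stage(model, "count_vectorizer")$vocabulary,
#   ml_stage(model, "decision_tree_classifier")$feature_importances
# )
```

Checking is.function() first keeps the code usable whether the field is already a vector or still a function, so it should not depend on which behaviour your installed version has.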