LSHModel for Spark Structured Streaming

Question

Apparently, the LSHModel in Spark MLlib supports Spark Structured Streaming as of Spark 2.4 (https://issues.apache.org/jira/browse/SPARK-24465).

However, it's not clear to me how. For instance, could an approxSimilarityJoin from the MinHashLSH transformation (https://spark.apache.org/docs/latest/ml-features#lsh-operations) be applied directly to a streaming DataFrame?

I can't find more information about it online. Could someone help me?

Answer 1

You need to:

Persist the trained model (e.g. modelFitted) somewhere your streaming job can access. This is done outside of your streaming job:

modelFitted.write.overwrite().save("/path/to/model/location")
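For concreteness, here is a minimal sketch of that separate batch job, loosely following the MinHashLSH example in the Spark ML documentation. The object name TrainLSHModel, the toy vectors, numHashTables = 5, the column names and the hypothetical /path/to/reference location are illustrative assumptions; only the save call comes from the answer. Saving the reference data as well is optional and only useful if the streaming job later joins against it.

```scala
import org.apache.spark.ml.feature.{MinHashLSH, MinHashLSHModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Batch job, run separately from (and before) the streaming job.
object TrainLSHModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("train-minhash-lsh").getOrCreate()

    // Toy training data: each row is a set of item indices encoded as a
    // binary sparse vector (MinHash needs at least one non-zero entry).
    val reference = spark.createDataFrame(Seq(
      (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
      (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0)))),
      (2, Vectors.sparse(6, Seq((0, 1.0), (2, 1.0), (4, 1.0))))
    )).toDF("id", "features")

    val modelFitted: MinHashLSHModel = new MinHashLSH()
      .setNumHashTables(5)
      .setInputCol("features")
      .setOutputCol("hashes")
      .fit(reference)

    // Persist the fitted model where the streaming job can read it.
    modelFitted.write.overwrite().save("/path/to/model/location")

    // Optional, hypothetical path: keep the reference set for later joins.
    reference.write.mode("overwrite").parquet("/path/to/reference")

    spark.stop()
  }
}
```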

Then load this model within your Structured Streaming job:

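import org.apache.spark.ml._
// PipelineModel applies if the saved model is a fitted Pipeline; for a bare
// MinHashLSHModel use org.apache.spark.ml.feature.MinHashLSHModel.load(...) instead.
val model = PipelineModel.read.load("/path/to/model/location")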
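If modelFitted came directly from MinHashLSH.fit, it is a MinHashLSHModel rather than a PipelineModel, and the streaming side could look like the sketch below. It assumes Spark 2.4 or later and a model whose input column is "features". The built-in rate source and the hypothetical toyFeatures encoding only stand in for a real input; they are not part of the answer.

```scala
import org.apache.spark.ml.feature.MinHashLSHModel
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Streaming job that reuses the model saved by the separate batch job.
object StreamingLSHTransform {
  // Hypothetical toy encoding: a Long becomes a 2-element binary set
  // in a 6-dimensional space. Replace with real feature preparation.
  def toyFeatures(v: Long): Vector = {
    val i = (v % 6).toInt
    Vectors.sparse(6, Seq((i, 1.0), ((i + 1) % 6, 1.0)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-minhash-lsh").getOrCreate()

    // A bare MinHashLSHModel is loaded through its own companion object.
    val model = MinHashLSHModel.load("/path/to/model/location")

    // The "rate" source emits (timestamp, value) rows; it stands in for
    // a real source such as Kafka or a directory of files.
    val toFeatures = udf((v: Long) => toyFeatures(v))
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("features", toFeatures(col("value")))

    // transform on a streaming DataFrame returns a streaming DataFrame
    // with the model's hash output column appended.
    val hashed = model.transform(df)

    hashed.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```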

Apply this model to your streaming DataFrame (e.g. df) with:

model.transform(df)

// In your case, approxSimilarityJoin takes two DataFrames (possibly two streaming ones);
// verify that your Spark version accepts it on streaming input before relying on it.
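Whether approxSimilarityJoin can be applied directly to streaming inputs is not something the answer confirms. One conservative option, sketched here under that uncertainty, is foreachBatch (available since Spark 2.4): each micro-batch arrives as an ordinary DataFrame and is joined against a static reference set. Note that this is a stream-to-static join, not a join of two streams. The paths, the 0.6 threshold (borrowed from the Spark docs example), the rate source and toyFeatures are placeholders.

```scala
import org.apache.spark.ml.feature.MinHashLSHModel
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object StreamingSimilarityJoin {
  // Hypothetical toy encoding standing in for real feature preparation.
  def toyFeatures(v: Long): Vector = {
    val i = (v % 6).toInt
    Vectors.sparse(6, Seq((i, 1.0), ((i + 1) % 6, 1.0)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-lsh-join").getOrCreate()

    val model = MinHashLSHModel.load("/path/to/model/location")
    // Static reference set with "id" and "features" columns (hypothetical path).
    val reference = spark.read.parquet("/path/to/reference").cache()

    val toFeatures = udf((v: Long) => toyFeatures(v))
    val incoming = spark.readStream
      .format("rate")
      .load()
      .select(col("value").as("id"), toFeatures(col("value")).as("features"))

    // Inside foreachBatch each micro-batch is a non-streaming DataFrame,
    // so approxSimilarityJoin runs exactly as it would in batch code.
    val processBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
      model
        .approxSimilarityJoin(batch, reference, 0.6, "JaccardDistance")
        .select(
          col("datasetA.id").as("incomingId"),
          col("datasetB.id").as("referenceId"),
          col("JaccardDistance"))
        .write
        .mode("append")
        .parquet("/path/to/matches")
    }

    incoming.writeStream
      .foreachBatch(processBatch)
      .option("checkpointLocation", "/path/to/checkpoint")
      .start()
      .awaitTermination()
  }
}
```

Assigning the batch function to an explicitly typed value, rather than passing a lambda inline, sidesteps the Scala/Java overload ambiguity that foreachBatch can hit with Scala 2.12.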

You may first need to convert the streaming DataFrame into the format the model expects; for LSH models this means a Vector column matching the model's input column.
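A minimal sketch of that conversion, assuming the raw stream is JSON lines with a hypothetical "items" array of item indices (e.g. {"id": 7, "items": [0, 2, 5]}) and a hypothetical vocabulary size of 6 that must match the dimension used at training time. Rows with empty or missing sets are dropped, since MinHash cannot hash an all-zero vector.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size, udf}
import org.apache.spark.sql.types._

object PrepareFeatures {
  // Hypothetical vocabulary size; must match the training data.
  val NumItems = 6

  // Encode a set of item indices as a binary sparse vector. Duplicates are
  // removed because Vectors.sparse rejects repeated indices.
  def toBinaryVector(items: Seq[Int], dim: Int): Vector =
    Vectors.sparse(dim, items.distinct.map(i => (i, 1.0)))

  private val toFeatures = udf((items: Seq[Int]) => toBinaryVector(items, NumItems))

  // Streaming file sources need an explicit schema by default.
  val inputSchema: StructType = new StructType()
    .add("id", LongType)
    .add("items", ArrayType(IntegerType))

  def prepare(raw: DataFrame): DataFrame =
    raw
      .where(size(col("items")) > 0) // drops empty and null sets
      .withColumn("features", toFeatures(col("items")))
      .select("id", "features")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("prepare-features").getOrCreate()
    val raw = spark.readStream.schema(inputSchema).json("/path/to/incoming")
    prepare(raw).writeStream.format("console").start().awaitTermination()
  }
}
```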

Answer 1, score 0, 2021-03-02 19:05:03

火花结构化流的 LSHModel

问题描述

1 个解决方案

解决方案1 0 2021-03-02 19:05:03

解决方案1
0 2021-03-02 19:05:03