
LSHModel on Spark Structured Streaming

Apparently, as of Spark 2.4 the LSHModel in MLlib supports Spark Structured Streaming (https://issues.apache.org/jira/browse/SPARK-24465).

However, it's not clear to me how. For instance, could an approxSimilarityJoin from the MinHashLSH transformation (https://spark.apache.org/docs/latest/ml-features#lsh-operations) be applied directly to a streaming DataFrame?

I can't find any more information about it online. Could someone help me?

You need to:

  1. Persist the trained model (e.g. modelFitted) somewhere accessible to your streaming job. This is done outside of the streaming job.
     modelFitted.write.overwrite().save("/path/to/model/location")
  2. Then load this model within your Structured Streaming job. The loader must match the class that was saved: PipelineModel for a fitted pipeline, or e.g. MinHashLSHModel if the LSH model was saved on its own.
     import org.apache.spark.ml._
     val model = PipelineModel.read.load("/path/to/model/location")
  3. Apply this model to your streaming DataFrame (e.g. df) with
     model.transform(df)

     // In your case you may work with two streaming DataFrames to apply `approxSimilarityJoin`.
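
Put together, steps 2 and 3 might look roughly like the sketch below for a MinHashLSH model. It is only an illustration under several assumptions that are not in the original answer: the model at `/path/to/model/location` was saved as a bare `MinHashLSHModel` (not inside a `Pipeline`), the input comes from Spark's built-in `rate` test source, and the `toFeatures` UDF is a made-up stand-in for real feature extraction.

```scala
import org.apache.spark.ml.feature.MinHashLSHModel
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object LshStreamingTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lsh-streaming").getOrCreate()

    // Step 2: load the model that the batch job saved earlier.
    // Assumption: a MinHashLSHModel was saved directly, so its own loader is used.
    val model = MinHashLSHModel.load("/path/to/model/location")

    // Hypothetical feature builder: turns the counter emitted by the "rate"
    // source into a sparse binary vector with two non-zero entries.
    val toFeatures = udf { (value: Long) =>
      Vectors.sparse(6, Seq(((value % 6).toInt, 1.0), (((value + 1) % 6).toInt, 1.0)))
    }

    // Streaming DataFrame whose vector column is named after the model's input column.
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()
      .withColumn(model.getInputCol, toFeatures(col("value")))

    // Step 3: transform() appends the hash column to every micro-batch.
    val hashed = model.transform(df)

    hashed.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

Using `model.getInputCol` avoids hard-coding the column name chosen at training time. The sketch only exercises `transform`; whether `approxSimilarityJoin` behaves as you need on streaming inputs is something to verify against your Spark version.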

You may need to get the streaming DataFrame into the correct format before using it for model prediction.
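
As one possible illustration of such a conversion, the sketch below turns raw text lines into the vector column that a MinHashLSH model expects. Everything specific here is an assumption rather than part of the original answer: the socket source on `localhost:9999`, the whitespace tokenization, the `tokens` column name, and the `HashingTF` settings, which would have to match whatever was used when the model was trained.

```scala
import org.apache.spark.ml.feature.{HashingTF, MinHashLSHModel}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, split, trim}

object LshStreamingFeatures {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lsh-streaming-features").getOrCreate()

    // Assumption: a MinHashLSHModel was saved directly at this path.
    val model = MinHashLSHModel.load("/path/to/model/location")

    // Hypothetical raw input: one text line per row in a column named "value".
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    // MinHash rejects all-zero vectors, so blank lines are dropped before tokenizing.
    val tokens = lines
      .filter(length(trim(col("value"))) > 0)
      .withColumn("tokens", split(trim(col("value")), "\\s+"))

    // HashingTF needs no fitting, so it can be applied to a stream as is.
    // Binary term presence gives the set-like vectors MinHash works on.
    val hashingTF = new HashingTF()
      .setInputCol("tokens")
      .setOutputCol(model.getInputCol)
      .setNumFeatures(1 << 10)
      .setBinary(true)

    val hashed = model.transform(hashingTF.transform(tokens))

    hashed.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

If the featurization was part of a fitted `Pipeline` at training time, saving and loading the whole `PipelineModel` keeps the two sides consistent without repeating these settings in the streaming job.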
