LSHModel on Spark Structured Streaming
Apparently, the LSHModel of MLlib supports Spark Structured Streaming as of Spark 2.4 (https://issues.apache.org/jira/browse/SPARK-24465).
However, it's not clear to me how. For instance, could the `approxSimilarityJoin` transformation from `MinHashLSH` (https://spark.apache.org/docs/latest/ml-features#lsh-operations) be applied directly to a streaming DataFrame?
I can't find any more information about it online. Could someone help me?
You need to save the fitted model (`modelFitted`) somewhere accessible to your streaming job. This is done outside of your streaming job:

```scala
// Run in a separate batch job: persist the fitted model, replacing any previous version.
modelFitted.write.overwrite().save("/path/to/model/location")
```
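A minimal sketch of that offline step, assuming the model is a `MinHashLSH` stage wrapped in a `Pipeline` (so that `PipelineModel.read.load` can read it back) and assuming the input has a `features` column of sparse binary vectors. The training rows, the `hashes` output column, `numHashTables = 5`, and the `FitLshOffline` / `fitLsh` names are illustrative, not part of the original answer.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object FitLshOffline {
  // Fits a MinHashLSH stage inside a Pipeline, so the saved artifact is a
  // PipelineModel that PipelineModel.read.load can open later.
  def fitLsh(training: DataFrame): PipelineModel = {
    val mh = new MinHashLSH()
      .setNumHashTables(5)     // assumed value; more tables trade cost for recall
      .setInputCol("features") // assumed column of sparse binary vectors
      .setOutputCol("hashes")  // hypothetical output column name
    new Pipeline().setStages(Array(mh)).fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("fit-lsh").master("local[*]").getOrCreate()

    // Hypothetical static (batch) training data.
    val training = spark.createDataFrame(Seq(
      (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
      (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0)))),
      (2, Vectors.sparse(6, Seq((0, 1.0), (2, 1.0), (4, 1.0))))
    )).toDF("id", "features")

    val modelFitted = fitLsh(training)
    // Placeholder path; any location the streaming job can read works.
    modelFitted.write.overwrite().save("/path/to/model/location")
    spark.stop()
  }
}
```

Wrapping the single stage in a `Pipeline` is only a convenience so the save/load calls match; a bare `MinHashLSHModel` could be persisted as well, but it would then be loaded with its own reader rather than `PipelineModel.read`.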
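Then, inside your streaming job, load the model and apply it to your streaming DataFrame (`df`) with `model.transform(df)`:

```scala
import org.apache.spark.ml._

// Load the PipelineModel saved by the offline job.
val model = PipelineModel.read.load("/path/to/model/location")
// df is your streaming DataFrame; transform adds the model's output columns.
val result = model.transform(df)
// In your case you may work with two streaming DataFrames to apply `approxSimilarityJoin`.
```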
You may need to convert the streaming DataFrame into the format the model expects before applying it.
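The streaming side could look roughly like this. It is a sketch under assumptions: the built-in `rate` source stands in for a real stream, and the hypothetical `prepare` helper and `toFeatures` UDF show one way of bringing the stream into the `id`/`features` shape the model was trained on. It only exercises `model.transform`; it does not run `approxSimilarityJoin`.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ApplyLshToStream {
  // Hypothetical feature builder: maps the rate source's Long "value" to a
  // 6-dimensional sparse binary vector with two distinct non-zero entries.
  private val toFeatures = udf { (v: Long) =>
    val i = (v % 6).toInt
    Vectors.sparse(6, Seq((i, 1.0), ((i + 1) % 6, 1.0)))
  }

  // Reshapes a (streaming or batch) DataFrame with a "value" column into
  // the "id" + "features" layout the fitted model expects.
  def prepare(stream: DataFrame): DataFrame =
    stream.select(col("value").as("id"), toFeatures(col("value")).as("features"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lsh-stream").master("local[*]").getOrCreate()
    val model = PipelineModel.read.load("/path/to/model/location")

    // The "rate" source stands in for a real stream such as Kafka.
    val raw = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
    val hashed = model.transform(prepare(raw))

    // Print each micro-batch with its computed hashes to the console.
    val query = hashed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
  }
}
```

Because `prepare` works on any DataFrame with a `value` column, it can be checked on a batch DataFrame before wiring it to a live stream; replacing it with your own feature extraction is the main part to adapt.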