How to integrate Spark Streaming with TensorFlow?
Objective: continuously feed sniffed network packets into a Kafka producer, connect it to Spark Streaming to process the packet data, and then use the preprocessed data in TensorFlow or Keras.
I'm processing continuous data from Kafka in Spark Streaming (PySpark), and now I want to send the processed data to TensorFlow. How can I use these transformed DStreams in TensorFlow with Python? Thanks.
No processing is applied in Spark Streaming yet, but it will be added later. Here's the Python code:
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.conf import SparkConf
from datetime import datetime

if __name__ == '__main__':
    sc = SparkContext(appName='Kafkas')
    ssc = StreamingContext(sc, 2)  # 2-second batch interval
    brokers, topic = sys.argv[1:]
    kvs = KafkaUtils.createDirectStream(ssc, [topic],
                                        {'metadata.broker.list': brokers})
    lines = kvs.map(lambda x: x[1])  # keep only the message value
    lines.pprint()
    ssc.start()
    ssc.awaitTermination()
I start the Spark Streaming job with:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
    spark-kafka.py localhost:9092 topic
You have two ways to solve your problem:
Once you have processed your data, you can save it and then run your model independently (in Keras?). Just create a Parquet file, or append to it if it already exists:
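import os

# `data` is the processed Spark DataFrame; DATA_TREATED_PATH is the output directory
if os.path.isdir(DATA_TREATED_PATH):
    data.write.mode('append').parquet(DATA_TREATED_PATH)
else:
    data.write.parquet(DATA_TREATED_PATH)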
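To apply this to a DStream, one option is to write each micro-batch from inside foreachRDD. The following is only a sketch under assumptions: the Kafka messages are taken to be JSON with hypothetical src, dst and length fields, and the output directory is a made-up path. Spark's append mode creates the target if it does not exist yet, so the sketch skips the directory check.

```python
import json

DATA_TREATED_PATH = "/tmp/packets_treated"  # hypothetical output directory


def parse_packet(line):
    """Turn one Kafka message value into a flat dict.

    Assumes JSON payloads with src, dst and length fields (hypothetical schema).
    """
    record = json.loads(line)
    return {
        "src": str(record.get("src", "")),
        "dst": str(record.get("dst", "")),
        "length": int(record.get("length", 0)),
    }


def save_batch(time, rdd):
    """Append one micro-batch to Parquet; meant to be passed to DStream.foreachRDD."""
    if rdd.isEmpty():
        return  # nothing arrived in this batch interval
    # Imported lazily so the parsing helper stays usable without Spark installed.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()
    rows = rdd.map(parse_packet).map(lambda d: Row(**d))
    # Append mode creates the directory on the first write.
    spark.createDataFrame(rows).write.mode("append").parquet(DATA_TREATED_PATH)


# In the streaming job, after building the DStream of message values:
# lines.foreachRDD(save_batch)
```

Whatever preprocessing you add later would replace parse_packet; the rest stays the same.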
Then you build your model with Keras / TensorFlow and run it on a schedule, maybe every hour, or as often as you want it updated. So the model is trained from scratch on every run.
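A rough sketch of what that separate training job could look like, assuming pandas (with pyarrow), NumPy and TensorFlow are installed on the machine that runs it. The path, the length column and the labelling rule are all placeholders I made up for illustration; swap in your real features and labels.

```python
DATA_TREATED_PATH = "/tmp/packets_treated"  # hypothetical; the directory the Spark job writes to


def to_features(records):
    """Map packet dicts to feature rows and labels.

    The single `length` feature and the size-based label are placeholders,
    not something taken from the original job.
    """
    features = [[float(r["length"])] for r in records]
    labels = [1.0 if r["length"] > 1000 else 0.0 for r in records]
    return features, labels


def retrain(path=DATA_TREATED_PATH, epochs=5):
    """Rebuild and fit a small Keras model on everything saved so far."""
    # Heavy dependencies are imported lazily, only when training actually runs.
    import numpy as np
    import pandas as pd
    import tensorflow as tf

    df = pd.read_parquet(path)  # reads the whole Parquet directory
    features, labels = to_features(df.to_dict("records"))
    x = np.asarray(features, dtype="float32")
    y = np.asarray(labels, dtype="float32")

    # A fresh model on each call, so every run starts from scratch.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(x, y, epochs=epochs, verbose=0)
    return model


# Call retrain() from whatever scheduler you use (cron, for example).
```

Since the model is rebuilt on each call, training time grows with the amount of saved data, so you may want to prune or partition the Parquet directory over time.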