Objective: continuously feed sniffed network packets into a Kafka producer, connect that to Spark Streaming so the packet data can be processed, and then use the preprocessed data in TensorFlow or Keras.
I'm processing continuous data in Spark Streaming (PySpark) that comes from Kafka, and now I want to send the processed data to TensorFlow. How can I use these transformed DStreams in TensorFlow with Python? Thanks.
No processing is applied in Spark Streaming yet, but it will be added later. Here's the Python code:
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == '__main__':
    sc = SparkContext(appName='Kafkas')
    ssc = StreamingContext(sc, 2)  # 2-second batch interval
    brokers, topic = sys.argv[1:]
    # Direct stream from Kafka; each record is a (key, value) tuple
    kvs = KafkaUtils.createDirectStream(ssc, [topic],
                                        {'metadata.broker.list': brokers})
    lines = kvs.map(lambda x: x[1])  # keep only the message value
    lines.pprint()
    ssc.start()
    ssc.awaitTermination()
Also, I use this to start Spark Streaming:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
    spark-kafka.py localhost:9092 topic
You have two ways to solve your problem:
Once you've processed your data, you can save it and then run your model independently (in Keras?). Just create a Parquet file, or append to it if it already exists:
if os.path.isdir(DATA_TREATED_PATH):
    data.write.mode('append').parquet(DATA_TREATED_PATH)
else:
    data.write.parquet(DATA_TREATED_PATH)
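Since your stream is still a DStream (as in your question), each micro-batch has to be turned into a DataFrame before it can be written as Parquet. Here is a minimal sketch of that, assuming the Kafka message value is a comma-separated list of numeric features; DATA_TREATED_PATH, save_batch and the parsing logic are just placeholders, not part of your code:

# Sketch: write each micro-batch of the DStream to a Parquet dataset.
# Assumes each message value is a comma-separated list of floats.
from pyspark.sql import SparkSession, Row

DATA_TREATED_PATH = '/tmp/treated_packets.parquet'  # hypothetical path

def save_batch(rdd):
    if rdd.isEmpty():
        return
    spark = SparkSession.builder.getOrCreate()
    rows = rdd.map(lambda line: Row(features=[float(x) for x in line.split(',')]))
    df = spark.createDataFrame(rows)
    # 'append' also creates the directory on the first batch
    df.write.mode('append').parquet(DATA_TREATED_PATH)

lines.foreachRDD(save_batch)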
Then you just build your model with Keras / TensorFlow and run it, say, every hour, or as often as you want it to be updated. In this approach the model is retrained from scratch every time.
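The training job can then be a separate script launched by cron or any scheduler. The sketch below assumes your (future) Spark preprocessing writes a 'features' array column and a 'label' column into the Parquet dataset; the column names, path, and model architecture are only assumptions:

# Sketch: periodic retraining from the accumulated Parquet data.
import numpy as np
import pandas as pd
from tensorflow import keras

DATA_TREATED_PATH = '/tmp/treated_packets.parquet'  # same hypothetical path

df = pd.read_parquet(DATA_TREATED_PATH)   # needs pyarrow or fastparquet
X = np.stack(df['features'].to_numpy())   # assumed feature column
y = df['label'].to_numpy()                # assumed label column

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=32)
model.save('packet_model.h5')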