
How to integrate Spark Streaming with Tensorflow?

Objective: Continuously feed sniffed network packets into a Kafka producer, connect it to Spark Streaming to process the packet data, and then use the preprocessed data in TensorFlow or Keras.

I'm processing continuous data in Spark Streaming (PySpark) that comes from Kafka, and now I want to send the processed data to TensorFlow. How can I use these transformed DStreams in TensorFlow with Python? Thanks.

No processing is applied in Spark Streaming yet, but it will be added later. Here's the Python code:

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.conf import SparkConf
from datetime import datetime

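if __name__ == '__main__':
    sc = SparkContext(appName='Kafkas')
    ssc = StreamingContext(sc, 2)  # 2-second micro-batches
    brokers, topic = sys.argv[1:]  # e.g. localhost:9092 topic
    # Direct stream of (key, value) pairs read from Kafka
    kvs = KafkaUtils.createDirectStream(ssc, [topic],
                                        {'metadata.broker.list': brokers})
    lines = kvs.map(lambda x: x[1])  # keep only the message value
    lines.pprint()  # print the first few records of each batch
    ssc.start()
    ssc.awaitTermination()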

I also use this command to start Spark Streaming:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
    spark-kafka.py localhost:9092 topic

There are two ways to solve your problem:

  1. Once you have processed your data, you can save it and then run your model (in Keras?) independently. Just create a Parquet file, or append to it if it already exists:

     # needs `import os`; `data` is the processed Spark DataFrame
     if os.path.isdir(DATA_TREATED_PATH):
         data.write.mode('append').parquet(DATA_TREATED_PATH)
     else:
         data.write.parquet(DATA_TREATED_PATH)

Then you create your model with Keras / TensorFlow and run it periodically, say every hour, or as often as you want it updated. With this approach the model is retrained from scratch every time.

  2. You process your data and save it as before, but after that you load your model, train it on the new data / new batch, and then save the model again. This is called online learning because you don't retrain your model from scratch.
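
The load / train / save cycle of the second option could look roughly like the sketch below. It assumes TensorFlow 2.x and that each new batch has already been read into NumPy arrays (for example from the Parquet output). `MODEL_PATH`, `build_model` and `train_on_new_batch` are hypothetical names, and the tiny network is only a placeholder, not something taken from the question.

```python
import os

import numpy as np
import tensorflow as tf

MODEL_PATH = "packet_model.keras"  # hypothetical location of the saved model


def build_model(n_features):
    """Create a small binary classifier; the architecture is a placeholder."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


def train_on_new_batch(features, labels, model_path=MODEL_PATH):
    """Load the saved model (or create one), fit it on one batch, save it."""
    if os.path.exists(model_path):
        # Continue from the previously saved weights and optimizer state
        model = tf.keras.models.load_model(model_path)
    else:
        # First batch ever: start from a freshly initialised model
        model = build_model(features.shape[1])
    # A single pass over the new batch only; older data is not revisited
    model.fit(features, labels, epochs=1, batch_size=32, verbose=0)
    model.save(model_path)
    return model
```

You would call `train_on_new_batch(features, labels)` each time a new batch of processed data is available, for instance from a scheduled job that runs alongside the streaming application. For the first option you would instead call `build_model` and fit on the whole stored dataset on every run.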

