
How do I use infinite Scala streams as source in Spark Streaming?

Suppose I essentially want Stream.from(0) as an InputDStream. How would I go about this? The only way I can see is to use StreamingContext#queueStream, but I'd have to either enqueue elements from another thread or subclass Queue to create a queue that behaves like an infinite stream, both of which feel like a hack.

What's the correct way to do this?
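
For reference, the queueStream workaround described above would look roughly like this: a feeder thread slices the infinite stream into batches and enqueues them as RDDs. This is only a sketch; the 100-element batch size, the one-second throttle, and the assumption of an existing SparkContext sc are all illustrative choices.

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(1)) // assumes an existing SparkContext sc
val rddQueue = mutable.Queue.empty[RDD[Int]]
val ints = ssc.queueStream(rddQueue) // each dequeued RDD becomes one batch

val batches = Stream.from(0).grouped(100) // iterator over 100-element slices
new Thread(new Runnable {
  def run(): Unit = while (batches.hasNext) {
    rddQueue.synchronized { rddQueue.enqueue(sc.parallelize(batches.next())) }
    Thread.sleep(1000) // throttle roughly to the batch interval
  }
}).start()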

I don't think this is available in Spark by default, but it's easy to implement with ReceiverInputDStream.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

class InfiniteStreamInputDStream[T](
       @transient ssc_ : StreamingContext,
       stream: Stream[T],
       storageLevel: StorageLevel
      ) extends ReceiverInputDStream[T](ssc_)  {

  override def getReceiver(): Receiver[T] = {
    new InfiniteStreamReceiver(stream, storageLevel)
  }
}

class InfiniteStreamReceiver[T](stream: Stream[T], storageLevel: StorageLevel) extends Receiver[T](storageLevel) {

  // Stateful iterator
  private val streamIterator = stream.iterator

  private class ReadAndStore extends Runnable {
    def run(): Unit = {
      while (streamIterator.hasNext) {
        val next = streamIterator.next()
        store(next)
      }
    }
  }

  override def onStart(): Unit = {
    // Start the reader on its own thread; calling run() directly would
    // execute the infinite loop on the caller's thread and never return.
    new Thread(new ReadAndStore).start()
  }

  override def onStop(): Unit = { }
}

Slightly modified code that works with Spark 2.0:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import scala.reflect.ClassTag

class InfiniteDStream[T: ClassTag](
                                    @transient ssc_ : StreamingContext,
                                    stream: Stream[T],
                                    storageLevel: StorageLevel
                                  ) extends ReceiverInputDStream[T](ssc_) {

  override def getReceiver(): Receiver[T] = {
    new InfiniteStreamReceiver(stream, storageLevel)
  }
}

class InfiniteStreamReceiver[T](stream: Stream[T], storageLevel: StorageLevel) extends Receiver[T](storageLevel) {

  private class ReadAndStore extends Runnable {
    def run(): Unit = {
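      // Caveat: Stream memoizes evaluated elements, and the receiver holds a
      // reference to the stream's head, so a truly unbounded stream will
      // eventually exhaust memory; an Iterator-based source avoids this.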
      stream.foreach(store)
    }
  }

  override def onStart(): Unit = {
    // start() runs ReadAndStore on the new daemon thread; calling run()
    // would block onStart() by executing the loop on the calling thread.
    val t = new Thread(new ReadAndStore)
    t.setDaemon(true)
    t.start()
  }

  override def onStop(): Unit = {}
}
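
For completeness, a minimal way to wire the receiver into an application might look like this. This is a hypothetical sketch: the object name, the local[2] master, the one-second batch interval, and MEMORY_ONLY storage are all arbitrary choices, not part of the original answer.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object InfiniteStreamDemo {
  def main(args: Array[String]): Unit = {
    // local[2] or more: the receiver permanently occupies one core,
    // so at least one other core must be free to process the batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("infinite-stream-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    val numbers = new InfiniteDStream(ssc, Stream.from(0), StorageLevel.MEMORY_ONLY)
    numbers.print() // show the first few elements of each batch

    ssc.start()
    ssc.awaitTermination()
  }
}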
