
Bucketed Sink in scalaz-stream

I am trying to make a sink that writes a stream to bucketed files: when a particular condition (time, file size, etc.) is reached, the current output stream is closed and a new one is opened to a new bucket file.

I checked how the different sinks are created in the io object, but there aren't many examples, so I tried to follow how resource and chunkW are written. I ended up with the following bit of code, where for simplicity buckets are just represented by an Int for now, but would eventually be some type of output stream.

  val buckets: Channel[Task, String, Int] = {

    //recursion to step through the stream
    def go(step: Task[String => Task[Int]]): Process[Task, String => Task[Int]] = {

      // Emit the value and repeat
      def next(msg: String => Task[Int]) =
        Process.emit(msg) ++
          go(step)


      Process.await[Task, String => Task[Int], String => Task[Int]](step)(
        next
        , Process.halt // TODO ???
        , Process.halt) // TODO ???
    }

    // starting bucket
    val acquire: Task[Int] = Task.delay {
      val startBuck = nextBucket(0)
      println(s"opening bucket $startBuck")
      startBuck
    }

    // the write step
    def step(os: Int): Task[String => Task[Int]] =
      Task.now((msg: String) => Task.delay {
        write(os, msg)
        val newBuck = nextBucket(os)
        if (newBuck != os) {
          println(s"closing bucket $os")
          println(s"opening bucket $newBuck")
        }
        newBuck
      })

    // start the Channel
    Process.await(acquire)(
      buck => go(step(buck))
      , Process.halt, Process.halt)
  }

  def write(bucket: Int, msg: String) { println(s"$bucket\t$msg") }
  def nextBucket(b: Int) = b + 1

There are a number of issues with this:

  1. step is passed the bucket once at the start, and this never changes during the recursion. I am not sure how, in the recursive go, to create a new step task that uses the bucket (Int) produced by the previous task, since I have to provide a String to get to that task.
  2. the fallback and cleanup arguments of the await calls do not receive the result of rcv (if there is one). In the io.resource function this works fine because the resource is fixed; in my case, however, the resource might change at any step. How would I pass a reference to the currently open bucket to these callbacks?
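On point 1, the essence of the fix is to thread the bucket returned by each write back into the next step, rather than capturing a fixed one. Here is a minimal plain-Scala sketch of that state-threading (no scalaz-stream involved; `write` and `nextBucket` are simplified stand-ins for the question's helpers):

```scala
object BucketModel {
  // simplified stand-ins for the question's helpers
  def nextBucket(b: Int): Int = b + 1

  def write(bucket: Int, msg: String): Int = {
    println(s"$bucket\t$msg")
    nextBucket(bucket) // each write returns the bucket to use next
  }

  // the recursion threads the bucket produced by the previous write
  // into the next step, instead of using one fixed at the start
  def run(msgs: List[String], bucket: Int): Int = msgs match {
    case Nil       => bucket
    case m :: rest => run(rest, write(bucket, m))
  }
}
```

In the Process version this corresponds to having go take the current bucket as a parameter and rebuild step(bucket) on every iteration, instead of fixing step once up front.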

Well, one of the options (i.e. time) may be to use a simple go on the sink. This one is time-based, essentially reopening the file every hour:

import scala.concurrent.duration._

// emits `true` once every hour, used to interrupt the current sink
val metronome = Process.awakeEvery(1.hour).map(_ => true)

def writeFileSink(file: String): Sink[Task, ByteVector] = ???

def timeBasedSink(prefix: String) = {
  def go(index: Int): Sink[Task, ByteVector] =
    // run the file sink until the metronome ticks, then roll to the next index
    metronome.wye(writeFileSink(prefix + "_" + index))(wye.interrupt) ++ go(index + 1)

  go(0)
}
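To make the rotation concrete: with an hourly metronome, each message effectively lands in the file whose index is its elapsed time divided by the rotation period. A tiny model of that mapping (minutes and the 60-minute period here are illustrative values, not from the answer):

```scala
// index of the file a message arriving `minutes` after start lands in,
// assuming one rotation every 60 minutes
def fileIndex(minutes: Int): Int = minutes / 60
```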

For the other options (i.e. bytes written) you can use a similar technique: keep a signal of the bytes written so far and combine it with the Sink, rolling to a new file when the count crosses the threshold.
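The size-based bookkeeping can be modelled in plain Scala without scalaz-stream (maxBytes and the fold over chunk sizes are illustrative assumptions): roll to the next file index whenever adding the current chunk would push the open file past the limit.

```scala
// assign each chunk (given by its size in bytes) to a file index,
// rolling to a new file when the current one would exceed maxBytes;
// returns the file index chosen for each chunk, in order
def assignFiles(chunkSizes: List[Int], maxBytes: Int): List[Int] = {
  val (_, _, assigned) =
    chunkSizes.foldLeft((0, 0, List.empty[Int])) {
      case ((index, written, acc), size) =>
        if (written + size > maxBytes && written > 0)
          (index + 1, size, (index + 1) :: acc) // roll to the next file
        else
          (index, written + size, index :: acc) // stay in the current file
    }
  assigned.reverse
}
```

In the streaming version, the running `written` counter would live in the recursion (or a signal) and trigger the same kind of `++ go(index + 1)` roll-over as the metronome does in the time-based sink.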
