
Future of Iterable to run sequentially

Code with explanation:

val partitions = preparePartitioningDataset(dataset, "sdp_id").map { partitions =>
  val resultPartitionedDataset: Iterator[Future[Iterable[String]]] = for {
    partition <- partitions
  } yield {
    val whereStatement = s"SDP_ID = '$partition'"
    val partitionedDataset =
      datasetService.getFullDatasetResultIterable(
        dataset = dataset,
        format = format._1,
        limit = none[Int],
        where = whereStatement.some
      )
    partitionedDataset
  }
  
  resultPartitionedDataset
}

partitions.map { partitionedDataset =>
  for {
    partition <- partitionedDataset
  } notifyPartitionedDataset(
    bearerToken = bearerToken,
    endpoint = endpoint,
    dataset = partition
  )
}

So now:

  • preparePartitioningDataset(dataset, "sdp_id") returns a Future[Iterator[String]]
  • datasetService.getFullDatasetResultIterable itself also returns a Future[Iterable[String]]
  • so resultPartitionedDataset ends up being an Iterator[Future[Iterable[String]]]
  • and finally, notifyPartitionedDataset returns a Future[Unit]

Some explanation of what's happening and what I'm trying to achieve:

I have preparePartitioningDataset, which performs a SELECT DISTINCT on a single column, giving back a Future[ResultSet] (mapped to an Iterator). This is because for each distinct value I want to perform a SELECT * WHERE column = that_value. That happens in getFullDatasetResultIterable, again a Future[ResultSet] mapped to an Iterator as well.

The last step is to forward, via a POST, every single query result I got. It works, but everything happens in parallel (well, I guess that's why I went for a Future in the first place). Now, however, I'm required to make each POST (notifyPartitionedDataset) happen sequentially, sending one POST after another rather than all in parallel.
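The parallelism isn't incidental: a Scala Future starts running the moment it is created, so merely building the Iterator[Future[...]] already fires every request. A minimal stdlib-only sketch of that behavior (all names here are illustrative):

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureEagerness {
  // Returns how many "requests" ran as a result of simply
  // building the list of Futures.
  def demo(): Int = {
    val started = new ConcurrentLinkedQueue[Int]()
    // A Future begins executing as soon as it is constructed, so
    // mapping over the partitions fires every request immediately,
    // with no ordering guarantee between them.
    val posts = (1 to 3).toList.map { i =>
      Future { started.add(i); i }
    }
    Await.result(Future.sequence(posts), 5.seconds)
    started.size
  }
}
```

Mapping or for-comprehending over the already-created Futures afterwards only attaches callbacks; it cannot delay or reorder work that is already in flight.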

I've tried a lot of different approaches but I still get the same outcome.

How could I move forward?

You can take advantage of the laziness of the IO datatype to ensure that some operations are executed in order.
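To see why laziness buys you ordering, here is a minimal hand-rolled stand-in for IO (not cats-effect's actual implementation): a value of this type merely *describes* a computation, and flatMap chains descriptions, so nothing runs until the whole chain is explicitly executed:

```scala
object LazySequencing {
  // A toy stand-in for cats.effect.IO: a *description* of a
  // computation, run only when explicitly executed.
  final case class Task[A](run: () => A) {
    def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
  }

  def demo(): (Boolean, List[String]) = {
    val log = scala.collection.mutable.ListBuffer[String]()
    val first  = Task(() => log += "first")
    val second = Task(() => log += "second")
    val program = first.flatMap(_ => second) // still nothing has run
    val nothingRanEarly = log.isEmpty
    program.run()                            // now both run, strictly in order
    (nothingRanEarly, log.toList)
  }
}
```

Because constructing a Task (or an IO) performs no work, sequencing is decided by how you compose them, not by when you create them.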

import cats.effect.IO
import cats.syntax.all._

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def preparePartitioningDatasetIO(dataset: String, foo: String): IO[List[String]] =
  IO.fromFuture(IO(
    preparePartitioningDataset(dataset, foo)
  )).map(_.toList)

def getFullDatasetResultIterableIO(dataset: String, format: String, limit: Option[Int], where: Option[String]): IO[List[String]] =
  IO.fromFuture(IO(
    datasetService.getFullDatasetResultIterable(
      dataset,
      format,
      limit,
      where
    )
  )).map(_.toList)
  
def notifyPartitionedDatasetIO(bearerToken: String, endpoint: String, dataset: List[String]): IO[Unit] =
  IO.fromFuture(IO(
    notifyPartitionedDataset(
      bearerToken,
      endpoint,
      dataset
    )
  ))
  
def program(dataset: String): IO[Unit] =
  preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
    partitions.traverse_ { partition =>
      val whereStatement = s"SDP_ID = '$partition'"
      getFullDatasetResultIterableIO(
        dataset = dataset,
        format = format._1,
        limit = none,
        where = whereStatement.some
      ).flatMap { dataset =>
        notifyPartitionedDatasetIO(
          bearerToken = bearerToken,
          endpoint = endpoint,
          dataset = dataset
        )
      }
    }
  }


def run(dataset: String): Future[Unit] = {
  import cats.effect.unsafe.implicits.global
  program(dataset).unsafeToFuture()
}

The code needs to be carefully reviewed and fixed, especially the arguments of the functions.
But this should help you get the result you want without having to refactor the whole codebase, yet.
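For intuition, traverse_ over a lazy effect type boils down to folding the list into one long flatMap chain, which is exactly what forces each effect to finish before the next one starts. A hand-rolled sketch with a toy IO stand-in (not cats' real implementation):

```scala
object TraverseOrder {
  // Toy stand-in for a lazy effect type like cats.effect.IO.
  final case class Task[A](run: () => A) {
    def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
  }
  object Task { val unit: Task[Unit] = Task(() => ()) }

  // A hand-rolled traverse_: sequences one Task after another,
  // discarding results — the shape cats' traverse_ gives you for IO.
  def traverse_[A](as: List[A])(f: A => Task[Unit]): Task[Unit] =
    as.foldLeft(Task.unit)((acc, a) => acc.flatMap(_ => f(a)))

  def demo(): List[Int] = {
    val seen = scala.collection.mutable.ListBuffer[Int]()
    traverse_(List(1, 2, 3))(i => Task(() => { seen += i; () })).run()
    seen.toList
  }
}
```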


If you want getFullDatasetResultIterableIO to run in parallel while running notifyPartitionedDatasetIO serially, you can do this:

def program(dataset: String): IO[Unit] =
  preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
    partitions.parTraverse { partition =>
      val whereStatement = s"SDP_ID = '$partition'"
      getFullDatasetResultIterableIO(
        dataset = dataset,
        format = format._1,
        limit = none,
        where = whereStatement.some
      )
    } flatMap { datasets =>
      datasets.traverse_ { dataset =>
        notifyPartitionedDatasetIO(
          bearerToken = bearerToken,
          endpoint = endpoint,
          dataset = dataset
        )
      }
    }
  }

Note, though, that this implies keeping all the data in memory before the notifications start.
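If adding cats-effect is not an option, the same two-phase shape can also be sketched with plain Futures: fetch everything in parallel, then chain the notifications with foldLeft so each one starts only after the previous has completed (all names here are illustrative):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TwoPhase {
  def demo(): List[String] = {
    val order = scala.collection.mutable.ListBuffer[String]()
    // fetch may run in parallel; notifyOne must happen one at a time.
    def fetch(p: String): Future[String] = Future(s"data-$p")
    def notifyOne(d: String): Future[Unit] =
      Future { order.synchronized { order += d; () } }

    val partitions = List("a", "b", "c")
    val done = Future.sequence(partitions.map(fetch)).flatMap { datasets =>
      // Chain the notifications: each Future is only *created*
      // (and hence only starts) after the previous one completes.
      datasets.foldLeft(Future.unit) { (acc, d) =>
        acc.flatMap(_ => notifyOne(d))
      }
    }
    Await.result(done, 5.seconds)
    order.toList
  }
}
```

The key is that each notification Future is constructed inside the flatMap of the previous one, so Future's eagerness no longer matters for the ordering.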
