Code with explanation:
val partitions = preparePartitioningDataset(dataset, "sdp_id").map { partitions =>
val resultPartitionedDataset: Iterator[Future[Iterable[String]]] = for {
partition <- partitions
} yield {
val whereStatement = s"SDP_ID = '$partition'"
val partitionedDataset =
datasetService.getFullDatasetResultIterable(
dataset = dataset,
format = format._1,
limit = none[Int],
where = whereStatement.some
)
partitionedDataset
}
resultPartitionedDataset
}
partitions.map { partitionedDataset =>
for {
partition <- partitionedDataset
} notifyPartitionedDataset(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = partition
)
}
So now:
preparePartitioningDataset(dataset, "sdp_id") returns a Future[Iterator[String]]
datasetService.getFullDatasetResultIterable also returns a Future[Iterable[String]]
resultPartitionedDataset is an Iterator[Future[Iterable[String]]]
notifyPartitionedDataset returns a Future[Unit]
Some explanation of what's happening and what I'm trying to achieve:
preparePartitioningDataset performs a SELECT DISTINCT on a single column, giving back a Future[ResultSet] (mapped to an Iterator). I do this because for each distinct value I want to perform a SELECT * WHERE column = that_value. This happens in getFullDatasetResultIterable, which again returns a Future[ResultSet] mapped to an Iterator.
The last step is to forward the result of every query via a POST. It works, but everything happens in parallel (I guess that's why I went for a Future in the first place). Now I have a requirement that each POST (notifyPartitionedDataset) happens sequentially, so one POST is sent after another and not in parallel.
I've tried a lot of different approaches but I still get the same outcome.
How could I move forward?
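You can take advantage of the laziness of the cats-effect IO datatype to ensure that some operations are executed in order. A Future starts running as soon as it is created, whereas an IO is only a description that runs when it is sequenced.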
import cats.effect.IO
import cats.syntax.all._ // traverse_, parTraverse, none, .some
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def preparePartitioningDatasetIO(dataset: String, foo: String): IO[List[String]] =
IO.fromFuture(IO(
preparePartitioningDataset(dataset, foo)
)).map(_.toList)
def getFullDatasetResultIterableIO(dataset: String, format: String, limit: Option[Int], where: Option[String]): IO[List[String]] =
IO.fromFuture(IO(
datasetService.getFullDatasetResultIterable(
dataset,
format,
limit,
where
)
)).map(_.toList) // the Future yields an Iterable[String]
def notifyPartitionedDatasetIO(bearerToken: String, endpoint: String, dataset: List[String]): IO[Unit] =
IO.fromFuture(IO(
notifyPartitionedDataset(
bearerToken,
endpoint,
dataset
)
))
def program(dataset: String): IO[Unit] =
preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
partitions.traverse_ { partition =>
val whereStatement = s"SDP_ID = '$partition'"
getFullDatasetResultIterableIO(
dataset = dataset,
format = format._1,
limit = none,
where = whereStatement.some
).flatMap { dataset =>
notifyPartitionedDatasetIO(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = dataset
)
}
}
}
def run(dataset: String): Future[Unit] = {
import cats.effect.unsafe.implicits.global
program(dataset).unsafeToFuture()
}
The code needs to be carefully reviewed and fixed, especially the arguments of the functions (format, bearerToken and endpoint are assumed to be in scope).
But this should help you get the result you want without needing to refactor the whole codebase yet.
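If you cannot add cats-effect right away, the same ordering idea can be sketched with plain Futures. This is not the approach above, just an illustration of the underlying point: a Future starts when it is created, so the next call must only be made inside flatMap of the previous one. notifyStub is a hypothetical stand-in for notifyPartitionedDataset, and the sleep only makes overlap visible if there were any.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._

object SequentialFutures {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Thread-safe event log so the execution order can be inspected afterwards.
  private val log = new ConcurrentLinkedQueue[String]()
  def events: List[String] = log.asScala.toList

  // Hypothetical stand-in for notifyPartitionedDataset (assumption, not the real API).
  def notifyStub(partition: String): Future[Unit] = Future {
    log.add(s"start $partition")
    Thread.sleep(50)
    log.add(s"end $partition")
    ()
  }

  // f(item) is only called once the previous Future has completed,
  // so at most one notification is in flight at any time.
  def traverseSequentially[A](items: List[A])(f: A => Future[Unit]): Future[Unit] =
    items.foldLeft(Future.unit)((acc, item) => acc.flatMap(_ => f(item)))

  def main(args: Array[String]): Unit = {
    Await.result(traverseSequentially(List("a", "b", "c"))(notifyStub), 5.seconds)
    println(events.mkString(", "))
    // prints: start a, end a, start b, end b, start c, end c
  }
}
```

Calling notifyStub on every element up front (for example with partitions.map(notifyStub)) would start all of them immediately, which is the parallel behaviour described in the question.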
If you want getFullDatasetResultIterableIO to run in parallel while notifyPartitionedDatasetIO runs serially, you can do this:
def program(dataset: String): IO[Unit] =
preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
partitions.parTraverse { partition =>
val whereStatement = s"SDP_ID = '$partition'"
getFullDatasetResultIterableIO(
dataset = dataset,
format = format._1,
limit = none,
where = whereStatement.some
)
} flatMap { datasets =>
datasets.traverse_ { dataset =>
notifyPartitionedDatasetIO(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = dataset
)
}
}
}
Note, however, that this keeps all of the data in memory before the notifications start.
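One way to limit how much data is held at once is to work in batches: fetch a few partitions in parallel, notify them one by one, then move on to the next batch. Below is a rough sketch of that idea using only standard-library Futures. fetchStub and notifyStub are hypothetical stand-ins for getFullDatasetResultIterable and notifyPartitionedDataset, and batchSize is an assumed tuning parameter.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BatchedPipeline {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Records what was "sent", in order, so the behaviour can be checked.
  private val sent = ListBuffer.empty[String]
  def sentRows: List[String] = sent.synchronized(sent.toList)

  // Hypothetical stand-in for getFullDatasetResultIterable (assumption).
  def fetchStub(partition: String): Future[List[String]] =
    Future(List(s"$partition-row1", s"$partition-row2"))

  // Hypothetical stand-in for notifyPartitionedDataset (assumption).
  def notifyStub(rows: List[String]): Future[Unit] =
    Future(sent.synchronized { sent ++= rows; () })

  // Starts f(item) only after the previous Future has completed.
  def sequentially[A](items: List[A])(f: A => Future[Unit]): Future[Unit] =
    items.foldLeft(Future.unit)((acc, item) => acc.flatMap(_ => f(item)))

  // Fetches at most batchSize partitions at a time (in parallel),
  // then notifies that batch one dataset after another.
  def run(partitions: List[String], batchSize: Int): Future[Unit] =
    sequentially(partitions.grouped(batchSize).toList) { batch =>
      Future.traverse(batch)(fetchStub).flatMap { datasets =>
        sequentially(datasets)(notifyStub)
      }
    }

  def main(args: Array[String]): Unit = {
    Await.result(run(List("a", "b", "c"), batchSize = 2), 5.seconds)
    println(sentRows.mkString(", "))
  }
}
```

With batchSize = 1 this degenerates to the fully sequential version, and with batchSize equal to the number of partitions it behaves like the parTraverse variant, so the value trades memory against fetch parallelism.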