How to implement Functor[Dataset]

Question

I am struggling to create an instance of Functor[Dataset]. The problem is that when you map from A to B, an Encoder[B] must be in implicit scope, and I am not sure how to provide it.

implicit val datasetFunctor: Functor[Dataset] = new Functor[Dataset] {
  override def map[A, B](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
}

Of course, this code fails to compile because no Encoder[B] is available. I can't add Encoder[B] as an implicit parameter, because that would change the signature of map. How can I solve this?

Answer 1

You cannot apply f right away, because you are missing the Encoder. The only obvious direct solution would be to take cats and re-implement all the interfaces, adding an implicit Encoder argument. I don't see any way to implement a Functor for Dataset directly.
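To make the "re-implement the interfaces" option concrete, here is a minimal sketch of what such a type class might look like. The names EncodedFunctor and ListDataset are hypothetical and exist only for this illustration, and Encoder and Dataset are dummy stand-ins rather than the real Spark types. Note that this is not a cats Functor, so nothing in cats that expects a Functor would accept it.

```scala
import scala.language.higherKinds

/** Dummy stand-in for Spark's Encoder (assumption for this sketch) */
trait Encoder[X]

/** Dummy stand-in for Spark's Dataset (assumption for this sketch) */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** Hypothetical in-memory Dataset, only here so the sketch can be run */
final case class ListDataset[X](items: List[X]) extends Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y] =
    ListDataset(items.map(f))
}

/** Hypothetical type class: like Functor, except that `map`
  * additionally demands an Encoder for the result type.
  */
trait EncodedFunctor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B)(implicit enc: Encoder[B]): F[B]
}

object EncodedFunctor {
  /** Instance for Dataset: the Encoder is now available, so we can delegate */
  implicit val datasetInstance: EncodedFunctor[Dataset] =
    new EncodedFunctor[Dataset] {
      def map[A, B](fa: Dataset[A])(f: A => B)(implicit enc: Encoder[B]): Dataset[B] =
        fa.map(f)
    }
}
```

The cost of this route is that every abstraction built on Functor would have to be duplicated with the extra Encoder parameter, which is why the wrapper approach described next may be more practical.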

However, the following substitute solution may be good enough. You could create a wrapper for the dataset that has a map method without the implicit Encoder, plus a method toDataset that needs the Encoder only at the very end.

For this wrapper, you could apply a construction very similar to the so-called Coyoneda construction (or Coyo? What do they call it today? I don't know...). It is essentially a way to implement a "free functor" for an arbitrary type constructor.

Here is a sketch (it compiles with cats 1.0.1; the Spark traits are replaced by dummies):

import scala.language.higherKinds
import cats.Functor

/** Dummy for spark-Encoder */
trait Encoder[X]

/** Dummy for spark-Dataset */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** Coyoneda-esque wrapper for `Dataset` 
  * that simply stashes all arguments to `map` away
  * until a concrete `Encoder` is supplied during the
  * application of `toDataset`.
  *
  * Essentially: the wrapped original dataset + concatenated
  * list of functions which have been passed to `map`.
  */
abstract class MappedDataset[X] private () { self =>
  type B
  val base: Dataset[B]
  val path: B => X
  def toDataset(implicit enc: Encoder[X]): Dataset[X] = base map path

  def map[Y](f: X => Y): MappedDataset[Y] = new MappedDataset[Y] {
    type B = self.B
    val base = self.base
    val path: B => Y = f compose self.path
  }
}

object MappedDataset {
  /** Constructor for MappedDatasets.
    * 
    * Wraps a `Dataset` into a `MappedDataset` 
    */
  def apply[X](ds: Dataset[X]): MappedDataset[X] = new MappedDataset[X] {
    type B = X
    val base = ds
    val path = identity
  }

}        

object MappedDatasetFunctor extends Functor[MappedDataset] {
  /** Functorial `map` */
  def map[A, B](da: MappedDataset[A])(f: A => B): MappedDataset[B] = da map f
}

Now you can wrap a dataset ds into a MappedDataset(ds), map over it with MappedDatasetFunctor (once it is in implicit scope) as many times as you want, and call toDataset at the very end, where you supply a concrete Encoder for the final result.

Note that this combines all the functions passed to map into a single Spark stage: it cannot save intermediate results, because the Encoders for all intermediate steps are missing.
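The following standalone sketch illustrates both the usage and the fusion of the mapped functions. It restates the wrapper compactly without the cats dependency so that it runs on its own. ListDataset, its mapCalls counter, and the Demo object are hypothetical helpers added purely for this demonstration; the real Spark Dataset is not involved.

```scala
/** Dummy stand-in for Spark's Encoder */
trait Encoder[X]

/** Dummy stand-in for Spark's Dataset */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** Hypothetical in-memory Dataset that counts how often `map` is called */
final case class ListDataset[X](items: List[X]) extends Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y] = {
    ListDataset.mapCalls += 1
    ListDataset(items.map(f))
  }
}
object ListDataset { var mapCalls = 0 }

/** Compact restatement of the wrapper from the answer */
abstract class MappedDataset[X] private () { self =>
  type B
  val base: Dataset[B]
  val path: B => X
  def toDataset(implicit enc: Encoder[X]): Dataset[X] = base.map(path)
  def map[Y](f: X => Y): MappedDataset[Y] = new MappedDataset[Y] {
    type B = self.B
    val base: Dataset[B] = self.base
    val path: B => Y = f compose self.path // only composes, touches no data
  }
}
object MappedDataset {
  def apply[X](ds: Dataset[X]): MappedDataset[X] = new MappedDataset[X] {
    type B = X
    val base: Dataset[X] = ds
    val path: X => X = identity
  }
}

object Demo {
  // Only the final result type needs an Encoder.
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {}

  val ds: Dataset[Int] = ListDataset(List(1, 2, 3))

  // Three logical maps (Int => Int => Double => String); no Encoder[Int]
  // or Encoder[Double] is required for the intermediate steps.
  val result: Dataset[String] =
    MappedDataset(ds).map(_ + 1).map(_ * 2.5).map(_.toString).toDataset
}
```

In this sketch the underlying map is invoked exactly once, inside toDataset, with the composition of all three functions. That is the in-memory analogue of everything ending up in a single stage.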

I'm not quite there yet with studying cats, so I cannot guarantee that this is the most idiomatic solution. There is probably something Coyoneda-esque already in the library.

EDIT: There is Coyoneda in the cats library, but it requires a natural transformation F ~> G to a functor G. Unfortunately, we don't have a Functor for Dataset (that was the problem in the first place). What my implementation above does instead: rather than a Functor[G], it requires a single morphism (one component) of the (non-existent) natural transformation at a fixed X, which is what the Encoder[X] provides.
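For comparison, here is a small sketch of the library's Coyoneda applied to List, which does have a Functor. It assumes the cats-free module is on the classpath and uses only lift, map and run. The CoyonedaDemo object is a hypothetical name for this illustration.

```scala
import cats.free.Coyoneda
import cats.implicits._

object CoyonedaDemo {
  // `lift` and `map` need no Functor[List]; `map` only composes functions.
  val lifted: Coyoneda[List, String] =
    Coyoneda.lift(List(1, 2, 3)).map(_ + 1).map(_.toString)

  // `run` requires an implicit Functor[List] to apply the composed function.
  // The analogous call for Dataset would need a Functor[Dataset],
  // which is exactly the instance that cannot be written.
  val result: List[String] = lifted.run
}
```

The hand-written MappedDataset plays the same role, except that its final step asks for an Encoder[X] instead of a Functor instance.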

Answer 1: score 7, accepted, 2018-02-10 23:54:22
