Treat Spark RDD like plain Seq
I have a CLI application for transforming JSONs. Most of its code is `map`ping, `flatMap`ping and traversing Lists of JValues with `for`. Now I want to port this application to Spark, but it seems I would have to rewrite every function 1:1, only writing `RDD[JValue]` instead of `List[JValue]`.
Is there any way (like a type class) for a function to accept both Lists and RDDs?
If you want to share your processing logic between local and Spark code, you can move the lambdas/anonymous functions that you pass in to `map`/`flatMap` into named functions and re-use them.
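For example, a minimal sketch in plain Scala (the `extractName` function here is made up to stand in for your real JValue logic, and the commented-out line assumes a Spark `RDD` is in scope):

```scala
// A named function instead of an inline lambda.
// extractName is a hypothetical example; substitute your JSON transformation.
def extractName(record: String): String =
  record.trim.toLowerCase

val local: List[String] = List("Alice ", " BOB")
val localResult = local.map(extractName) // reused on a List

// The very same named function can be passed to an RDD unchanged,
// assuming a SparkContext and an rdd: RDD[String] exist:
// val sparkResult = rdd.map(extractName)
```

Because `List[A].map` and `RDD[A].map` both accept an `A => B`, any logic factored into named functions carries over without modification; only the surrounding pipeline code differs.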
If you also want to re-use the logic for how the maps/flatMaps/etc. are chained, you could create implicit conversions from both `RDD` and `Seq` to a custom trait that exposes only the shared functions. However, implicit conversions can become quite confusing, and I don't really think this is a good idea (but you could do it if you disagree with me :)).
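A rough sketch of what that shared trait could look like (all names here, such as `JsonOps` and `SeqOps`, are made up for illustration; only the `Seq` side is shown runnable, and the `RDD` side is commented out since it assumes spark-core on the classpath):

```scala
import scala.language.implicitConversions

// Shared interface: only the operations both backends support.
// Methods are named mapped/flatMapped (not map/flatMap) so the implicit
// conversion from Seq actually triggers instead of Seq's own methods.
trait JsonOps[A] {
  def mapped[B](f: A => B): JsonOps[B]
  def flatMapped[B](f: A => Iterable[B]): JsonOps[B]
  def results: Seq[A] // materialize for inspection
}

class SeqOps[A](xs: Seq[A]) extends JsonOps[A] {
  def mapped[B](f: A => B): JsonOps[B] = new SeqOps(xs.map(f))
  def flatMapped[B](f: A => Iterable[B]): JsonOps[B] = new SeqOps(xs.flatMap(f))
  def results: Seq[A] = xs
}

// The RDD backend would mirror SeqOps, with results calling rdd.collect():
// class RddOps[A](rdd: RDD[A]) extends JsonOps[A] { ... }

implicit def seqToOps[A](xs: Seq[A]): JsonOps[A] = new SeqOps(xs)
// implicit def rddToOps[A](rdd: RDD[A]): JsonOps[A] = new RddOps(rdd)

// A pipeline written once against the trait runs on either backend:
def pipeline(in: JsonOps[Int]): JsonOps[Int] =
  in.mapped(_ + 1).flatMapped(n => Seq(n, n * 10))

val out = pipeline(List(1, 2)).results
```

Here `pipeline(List(1, 2))` goes through the implicit conversion and yields `Seq(2, 20, 3, 30)`; this also shows the downside, since readers must know about the conversions to see why a `List` is accepted where a `JsonOps` is expected.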