
Treat Spark RDD like plain Seq

I have a CLI application for transforming JSONs. Most of its code is mapping, flatMapping, and traversing Lists of JValues with for comprehensions. Now I want to port this application to Spark, but it seems I would need to rewrite every function 1:1, just with RDD[JValue] instead of List[JValue].

Is there any way (such as a type class) for a function to accept both Lists and RDDs?

If you want to share your processing code between the local and the Spark version, you can move the lambdas/anonymous functions that you pass in to map / flatMap into named functions and re-use them.
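For example, the per-element logic can live in one named function that both pipelines call. This is a minimal sketch; the JSON shape (a "records" field) and all names here are invented for illustration:

```scala
import org.apache.spark.rdd.RDD
import org.json4s._

object SharedLogic {
  // Hypothetical per-element logic, extracted from a lambda into a named
  // function so both the local and the Spark pipeline can reuse it.
  def extractRecords(json: JValue): List[JValue] =
    json \ "records" match {
      case JArray(items) => items       // unwrap an array of records
      case JNothing      => Nil         // field absent: contribute nothing
      case other         => List(other) // single record
    }

  // Local version over a plain List.
  def transformLocal(input: List[JValue]): List[JValue] =
    input.flatMap(extractRecords)

  // Spark version over an RDD; only the container type differs.
  def transformSpark(input: RDD[JValue]): RDD[JValue] =
    input.flatMap(extractRecords)
}
```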

If you want to re-use the logic for how the maps/flatMaps/etc. are chained, you could also create implicit conversions from both RDD and Seq to a custom trait which has only the shared functions, but implicit conversions can become quite confusing, and I don't really think this is a good idea (but you could do it if you disagree with me :)).
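A rough sketch of what that could look like, assuming you only need map and flatMap; the Pipe trait and all names are invented for illustration, and the ClassTag bounds are there because RDD.map and RDD.flatMap require them:

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Shared abstraction exposing only the operations both backends support.
trait Pipe[A] {
  def mapPipe[B: ClassTag](f: A => B): Pipe[B]
  def flatMapPipe[B: ClassTag](f: A => TraversableOnce[B]): Pipe[B]
}

final class SeqPipe[A](val underlying: Seq[A]) extends Pipe[A] {
  def mapPipe[B: ClassTag](f: A => B): Pipe[B] =
    new SeqPipe(underlying.map(f))
  def flatMapPipe[B: ClassTag](f: A => TraversableOnce[B]): Pipe[B] =
    new SeqPipe(underlying.flatMap(f))
}

final class RddPipe[A](val underlying: RDD[A]) extends Pipe[A] {
  def mapPipe[B: ClassTag](f: A => B): Pipe[B] =
    new RddPipe(underlying.map(f))
  def flatMapPipe[B: ClassTag](f: A => TraversableOnce[B]): Pipe[B] =
    new RddPipe(underlying.flatMap(f))
}

// The implicit conversions the answer mentions: either collection type
// can be passed where a Pipe is expected.
object PipeConversions {
  implicit def seqToPipe[A](s: Seq[A]): Pipe[A] = new SeqPipe(s)
  implicit def rddToPipe[A](r: RDD[A]): Pipe[A] = new RddPipe(r)
}
```

Shared pipeline code can then be written once against Pipe[JValue] and run on either backend, though as noted above, the implicit conversions can make it hard to see which backend a given call site actually uses.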
