
In Spark, how do I use groupBy with spark-submit?

I have a Spark Python script that contains a groupBy. In particular, the structure is

import operator
result = sc.textFile(...).map(...).groupBy(...).map(...).reduce(operator.add)

When I run this in an IPython PySpark shell, it works just fine. However, when I try to put it in a script and run it through spark-submit, I get a pickle.PicklingError: Can't pickle builtin <type 'method_descriptor'> error that points at the groupBy. Is there a known workaround for this?

It turns out there is a lot that pickle can't serialize, including lambdas and direct references to built-in methods. I was doing some of that and needed to be more careful.
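
Below is a minimal sketch of what the workaround can look like as a standalone spark-submit script: every callable passed to map/groupBy is a plain module-level function rather than a lambda or a reference to a built-in method such as str.split. The input path, helper names, and grouping logic are placeholders for illustration, not the original job.

import operator
from pyspark import SparkContext

# Plain module-level functions pickle cleanly; the names and logic here
# are hypothetical stand-ins for whatever the real script does.
def parse_line(line):
    return line.split(",")

def key_of(fields):
    return fields[0]

def group_size(group):
    # groupBy yields (key, iterable-of-values) pairs
    key, values = group
    return len(list(values))

if __name__ == "__main__":
    sc = SparkContext(appName="groupByExample")
    result = (sc.textFile("hdfs:///path/to/input")  # placeholder path
                .map(parse_line)
                .groupBy(key_of)
                .map(group_size)
                .reduce(operator.add))
    print(result)
    sc.stop()

Run it with spark-submit my_script.py. Note that, unlike the shell, a script has no pre-created sc, so the SparkContext has to be constructed explicitly.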
