
In Spark, how do I use groupBy with spark-submit?

I have a Spark Python script that contains a groupBy. In particular, the structure is

import operator
result = sc.textFile(...).map(...).groupBy(...).map(...).reduce(operator.add)

When I run this in an IPython PySpark shell, it works just fine. However, when I try to put it in a script and run it through spark-submit, I get a pickle.PicklingError: Can't pickle builtin <type 'method_descriptor'> error that points at the groupBy. Is there a known workaround for this?

It turns out there is a lot that pickle can't serialize, including lambdas and direct references to built-in methods. I was doing some of that and needed to be more careful.
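
Below is a minimal sketch of what the workaround can look like as a standalone spark-submit script: every callable passed to map/groupBy is a plain module-level function rather than a lambda or a reference to a built-in method such as str.split. The input path, helper names, and grouping logic are placeholders for illustration, not the original job.

import operator
from pyspark import SparkContext

# Plain module-level functions pickle cleanly; the names and logic here
# are hypothetical stand-ins for whatever the real script does.
def parse_line(line):
    return line.split(",")

def key_of(fields):
    return fields[0]

def group_size(group):
    # groupBy yields (key, iterable-of-values) pairs
    key, values = group
    return len(list(values))

if __name__ == "__main__":
    sc = SparkContext(appName="groupByExample")
    result = (sc.textFile("hdfs:///path/to/input")  # placeholder path
                .map(parse_line)
                .groupBy(key_of)
                .map(group_size)
                .reduce(operator.add))
    print(result)
    sc.stop()

Run it with spark-submit my_script.py. Note that, unlike the shell, a script has no pre-created sc, so the SparkContext has to be constructed explicitly.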
