I have a pipeline implemented as a class with several methods, each of which processes some data. Example:
class Pipeline:
    def load_users(self):
        pass

    def load_sessions(self):
        pass
Should I initialize a new Spark session with a custom config in every method, or is it better to initialize it once in the __init__ method?
You can create the session once up front and change Spark properties as you go through your various actions and pipelines, using spark.conf.set("prop", "val"). That is how most people do it, and there are few examples to be found to the contrary.
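To make that concrete, here is a minimal sketch of the "one session, adjust as you go" approach. It assumes the session is built once and handed to the pipeline; build_session is a hypothetical helper name, and the property values (8 and 64 for spark.sql.shuffle.partitions) are purely illustrative, not recommendations.

```python
class Pipeline:
    """Sketch: one shared SparkSession, per-step runtime settings."""

    def __init__(self, spark):
        # The session is created once by the caller and reused by every step.
        self.spark = spark

    def load_users(self):
        # Runtime SQL properties can be changed between steps.
        # The value "8" is an illustrative assumption.
        self.spark.conf.set("spark.sql.shuffle.partitions", "8")
        # ... read and process user data with self.spark here ...

    def load_sessions(self):
        # A different illustrative value for a heavier step.
        self.spark.conf.set("spark.sql.shuffle.partitions", "64")
        # ... read and process session data with self.spark here ...


def build_session(app_name="pipeline"):
    # Hypothetical helper: imported lazily so the class above can be
    # exercised without Spark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()
```

Usage would look like Pipeline(build_session()).load_users(). Keep in mind that spark.conf.set only affects properties that are configurable at runtime (mostly the spark.sql.* family); settings such as executor memory have to be supplied before the session is created.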
If you want better insight, see the answer from the master himself: How many SparkSessions can a single application have? It adds some points worth considering in relation to your question. The real question is whether you need to consider this at all.