
Is Spark a good fit for automatically running a statistical analysis script on many nodes for a speedup?

I have a Python script that runs statistical analysis and trained deep learning models on input data. The data is fairly small (~5 MB), but the script is slow because of the complexity of the analysis. I wonder whether I could use Spark to run my script on different nodes of a cluster to gain a speedup. Basically, I want to divide the input data into many subsets and run the analysis script on them in parallel. Is Spark a good tool for this purpose? Thank you in advance!

As long as you integrate your deep learning model into your PySpark pipeline and use partitioning, you can expect a speedup in the runtime. Without code, it's hard to make specific recommendations, but this article is a good place to start.
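As a rough illustration of the divide-and-run-in-parallel idea: since the data is small, you can split it into subsets on the driver, distribute the subsets across the cluster, and apply the analysis to each one as an independent task. This is only a minimal sketch; `run_analysis` and the input file name are hypothetical stand-ins for your actual analysis script and data.

```python
# Minimal sketch of running one analysis function over many data subsets
# in parallel with Spark. Assumes each subset can be analyzed independently.
from pyspark.sql import SparkSession
import pandas as pd

def run_analysis(subset: pd.DataFrame) -> dict:
    # Hypothetical placeholder for your statistical analysis / model inference.
    return {"n_rows": len(subset)}

spark = SparkSession.builder.appName("parallel-analysis").getOrCreate()

# Load the (small) input data on the driver and split it into N subsets.
df = pd.read_csv("input.csv")  # hypothetical input file
n_subsets = 8
subsets = [chunk for _, chunk in df.groupby(df.index % n_subsets)]

# Distribute the subsets; each Spark task runs the analysis on one subset,
# and the per-subset results are collected back on the driver.
rdd = spark.sparkContext.parallelize(subsets, numSlices=n_subsets)
results = rdd.map(run_analysis).collect()
print(results)
```

Note that the speedup is bounded by the number of subsets and by how evenly the work divides among them, so choose the split to match your cluster's available cores.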

