I used pip install pyspark to install PySpark. I didn't set any paths; however, I found that everything was downloaded and copied into C:/Users/Admin/anaconda3/scripts. I opened Jupyter Notebook with a Python 3 kernel and tried to run a SystemML script, but it gave me an error. I realized that I needed to place winutils.exe in C:/Users/Admin/anaconda3/scripts as well, so I did that and the script ran as expected.
Now, my program includes GridSearch, and when I run it on my personal laptop it is markedly slower than on a cloud data platform where I can start a kernel with Spark (such as IBM Watson Studio).
So my questions are:
(i) How do I add PySpark to the Python3 kernel? Or is it already working in the background when I import pyspark?
(ii) When I run the same code on the same dataset using pandas and scikit-learn, there is not much difference in performance. When is PySpark preferred/beneficial over pandas and scikit-learn?
Another thing is, even though PySpark seems to be working fine and I'm able to import its libraries, when I try to run
import findspark
findspark.init()
it throws an error (on line 2) saying the list index is out of range. I googled a bit and found advice saying that I had to explicitly set SPARK_HOME='C:/Users/Admin/anaconda3/Scripts'; but when I do that, pyspark stops working (and findspark.init() still does not work).
If anyone can explain what is going on, I'd be very grateful. Thank you.
How do I add PySpark to the Python3 kernel?
With pip install, like you've said you have done.
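On the "working in the background" part of the question: importing pyspark only loads the Python package, and Spark itself starts when a session is created. A minimal sketch of that step follows, assuming the pip-installed pyspark and a Java runtime are available; the function name start_local_session and the app name are hypothetical, chosen only for illustration.

```python
def start_local_session(app_name="local-check"):
    """Create (or reuse) a SparkSession that runs on this machine only."""
    # Imported lazily so that defining this helper does not require Spark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")   # local mode: use all cores of this one machine
        .appName(app_name)    # hypothetical name, shown in the Spark UI
        .getOrCreate()
    )


# Example use in a notebook cell (needs pyspark and Java installed):
#   spark = start_local_session()
#   print(spark.version, spark.sparkContext.master)
#   spark.stop()
```

If the commented cell prints a version and a master of local[*], the kernel is already using PySpark and no separate kernel setup should be needed.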
there is not much difference in performance
You're only using one machine, so there wouldn't be. In local mode Spark can use only that machine's own cores and memory.
When is PySpark preferred/beneficial over pandas and scikit-learn?
When you want to deploy the same code onto an actual Spark cluster and your dataset is stored in distributed storage.
You don't necessarily need findspark if your environment variables are already set up.
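One way to see what the environment currently provides is to check whether pyspark is importable and whether SPARK_HOME points at a real directory. The sketch below uses only the standard library; the helper name describe_spark_setup is hypothetical. As far as I understand it, findspark's job is to locate a Spark installation and put its Python bindings on sys.path, which a pip install has already done.

```python
import importlib.util
import os


def describe_spark_setup(env=None):
    """Report whether pyspark is importable and what SPARK_HOME points to."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    # find_spec returns None when the package is not on sys.path.
    spec = importlib.util.find_spec("pyspark")
    return {
        "pyspark_importable": spec is not None,
        "pyspark_location": spec.origin if spec else None,
        "spark_home": spark_home,
        "spark_home_is_dir": bool(spark_home) and os.path.isdir(spark_home),
    }


print(describe_spark_setup())
```

If pyspark_importable is True while spark_home is None, plain import pyspark is likely all that is needed and findspark.init() can be skipped.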