Set findspark.init() Permanently
I have Apache Spark installed on Ubuntu at /home/mymachine/spark-2.1.0-bin-hadoop2.7, so I have to go to the python directory under that path to be able to use Spark. Alternatively, I can use it from outside that directory with the help of a library called findspark, but it seems I always have to initialize the library like this:
import findspark
findspark.init("/home/mymachine/spark-2.1.0-bin-hadoop2.7")
every time I want to use findspark, which is not very efficient. Is there any way to initialize this library permanently?
Here it was mentioned that you need to set a variable SPARK_HOME in .bash_profile, and I did, but no luck.
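A quick way to see whether that export ever reached Python is to check the environment from the interpreter itself. A minimal diagnostic sketch, assuming only the standard library; if it prints None, the process that launched Python never sourced .bash_profile (a common cause when Jupyter is started from a desktop launcher rather than a login shell):

import os

# Prints None if the SPARK_HOME export never reached this Python process,
# in which case findspark has nothing to fall back on.
print(os.environ.get("SPARK_HOME"))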
Add the following variables to your .bashrc file:
export SPARK_HOME=/path/2/spark/folder
export PATH=$SPARK_HOME/bin:$PATH
then source .bashrc
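With SPARK_HOME exported this way, findspark can locate Spark on its own, so the hard-coded path becomes unnecessary. A minimal sketch, assuming the variable is visible to the Python process:

import findspark

# No path argument needed: findspark falls back to the SPARK_HOME
# environment variable when none is given.
findspark.init()

import pyspark  # now importable from any directory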
If you wish to run pyspark with jupyter notebook, add these variables to .bashrc:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
again source .bashrc
Now if you run pyspark from the shell, it will launch a jupyter notebook server, and pyspark will be available in the python kernels.
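Alternatively, if you prefer starting the notebook server the usual way (plain jupyter notebook rather than the pyspark launcher), a short findspark cell at the top of the notebook achieves the same thing. A sketch, assuming SPARK_HOME is set as above; the master and appName values here are arbitrary illustrative choices:

import findspark
findspark.init()  # resolves Spark via SPARK_HOME

from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="findspark-demo")
print(sc.parallelize(range(100)).sum())  # 4950
sc.stop()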