
Set findspark.init() Permanently

I have Apache Spark installed on Ubuntu at /home/mymachine/spark-2.1.0-bin-hadoop2.7, so I have to go to the python directory located under that directory to be able to use Spark. Alternatively, I can use it outside the python directory with the help of a library called findspark, but it seems I always have to initialize that library like this:

import findspark
findspark.init("/home/mymachine/spark-2.1.0-bin-hadoop2.7")

every time I want to use findspark, which is not very convenient. Is there any way to initialize this library permanently?

Here it was mentioned that I need to set a SPARK_HOME variable in .bash_profile, and I did, but no luck.
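One way to check whether SPARK_HOME is actually visible to the Python process (it will not be if the profile was never sourced for that session) is a minimal sketch like this:

import os
import findspark

# If this prints None, the shell profile was not sourced for this session
# and findspark cannot find Spark without an explicit path.
print(os.environ.get("SPARK_HOME"))

# With SPARK_HOME set in the environment, no path argument is needed.
findspark.init()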

Add the following variables to your .bashrc file:

export SPARK_HOME=/path/2/spark/folder
export PATH=$SPARK_HOME/bin:$PATH

then source .bashrc
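Once SPARK_HOME is exported this way, findspark no longer needs the explicit path. A minimal sketch of what a new shell session then allows (the app name is just an illustration):

import findspark
findspark.init()  # picks up SPARK_HOME from the environment

from pyspark.sql import SparkSession

# Build a throwaway session to confirm pyspark is importable
spark = SparkSession.builder.appName("findspark-check").getOrCreate()
print(spark.version)
spark.stop()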
If you wish to run pyspark with a jupyter notebook, add these variables to .bashrc as well:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

again source .bashrc
Now if you run pyspark from the shell, it will launch a jupyter notebook server and pyspark will be available in the python kernels.
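In a notebook started this way, the pyspark driver normally pre-creates a SparkContext (sc) and, on Spark 2.x, a SparkSession (spark), so a first cell could look like this sketch:

# sc and spark are expected to be pre-created by the pyspark driver;
# this is only a sanity check, not code you need to add yourself.
print(sc.version)
spark.range(10).show()  # small DataFrame to confirm the session works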
