Spark + Python - how to set the system environment variables?
I'm on spark-1.4.1. How can I set the system environment variables for Python?
For instance, in R:
Sys.setenv(SPARK_HOME = "C:/Apache/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
What about in Python?
import os
import sys

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="PythonSQL")
sqlContext = SQLContext(sc)

# Build the input path from the SPARK_HOME environment variable
# when no argument is given.
# ref: https://github.com/apache/spark/blob/master/examples/src/main/python/sql.py
if len(sys.argv) < 2:
    path = "file://" + \
        os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
else:
    path = sys.argv[1]
# Create the DataFrame
df = sqlContext.jsonFile(path)
# Show the content of the DataFrame
df.show()
I get this error:
df is not defined.
Any ideas?
Just try it like this: https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes

By providing

path = "examples/src/main/resources/people.json"

as a parameter to df = sqlContext.jsonFile(path)

If you don't provide any arguments when you run your Python script, execution goes into the if len(sys.argv) < 2: branch, which requires you to have defined SPARK_HOME as a system variable. If you haven't, it won't find your specified .json file, which seems to be your problem.
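As for the environment-variable part of the question: the Python counterpart of R's Sys.setenv is os.environ, which sets a variable for the current process and its children; it must be set before Spark is initialized. A minimal sketch, where the install path is just a placeholder to adapt to your machine:

```python
import os
import sys

# Set SPARK_HOME for this process, analogous to R's Sys.setenv().
# The path is a placeholder - point it at your own installation.
os.environ["SPARK_HOME"] = "C:/Apache/spark-1.4.1"

# Make the bundled PySpark package importable, roughly what
# .libPaths(...) does for SparkR in the R snippet above.
sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python"))

# The variable reads back like any other environment variable.
print(os.environ["SPARK_HOME"])
```

With SPARK_HOME set this way, the os.environ['SPARK_HOME'] lookup in the script above succeeds even when no command-line argument is passed.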