
Spark + Python - how to set the system environment variables?

I'm on spark-1.4.1. How can I set the system environment variables for Python?

For instance, in R,

Sys.setenv(SPARK_HOME = "C:/Apache/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

What about in Python?

import os
import sys

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="PythonSQL")
sqlContext = SQLContext(sc)

# Set the system environment variables.
# ref: https://github.com/apache/spark/blob/master/examples/src/main/python/sql.py
if len(sys.argv) < 2:
    path = "file://" + \
        os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
else:
    path = sys.argv[1]

# Create the DataFrame
df = sqlContext.jsonFile(path)

# Show the content of the DataFrame
df.show()

I get this error,

df is not defined.


Any ideas?

Just try it like this: https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes

Provide path = "examples/src/main/resources/people.json" as the argument to df = sqlContext.jsonFile(path).

If you don't provide any arguments when you run your Python script, execution falls into the if len(sys.argv) < 2: branch, which requires SPARK_HOME to be defined as a system environment variable. If it isn't, the script can't build the path to the specified .json file, which seems to be your problem.
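To answer the original question directly: the Python counterpart of R's Sys.setenv is the os.environ mapping. A minimal sketch (the install path here is just an example; substitute your own Spark directory):

```python
import os

# Python equivalent of R's Sys.setenv(SPARK_HOME = ...):
# set SPARK_HOME before creating the SparkContext.
os.environ["SPARK_HOME"] = "C:/Apache/spark-1.4.1"

# The script above then builds the sample-file path from it:
path = "file://" + os.path.join(os.environ["SPARK_HOME"],
                                "examples/src/main/resources/people.json")
print(path)
```

Note that os.environ only affects the current process and its children; to set the variable permanently, use the system environment settings on Windows or your shell profile on Linux/macOS.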

