
SparkSession initialization error - Unable to use spark.read

I tried to create a standalone PySpark program that reads a csv and stores it in a Hive table. I have trouble configuring the Spark session, conf and context objects. Here is my code:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession
from pyspark.sql.types import *

conf = SparkConf().setAppName("test_import")
sc = SparkContext(conf=conf)
sqlContext  = SQLContext(sc)

spark = SparkSession.builder.config(conf=conf)
dfRaw = spark.read.csv("hdfs:/user/..../test.csv",header=False)

dfRaw.createOrReplaceTempView('tempTable')
sqlContext.sql("create table customer.temp as select * from tempTable")

And I get the error:

dfRaw = spark.read.csv("hdfs:/user/../test.csv",header=False)
AttributeError: 'Builder' object has no attribute 'read'

What is the right way to configure the Spark session object in order to use the read.csv command? Also, can someone explain the difference between the Session, Context and conf objects?

There is no need to use both SparkContext and SparkSession to initialize Spark; SparkSession is the newer, recommended entry point.

To initialize your environment, simply do:

spark = SparkSession\
  .builder\
  .appName("test_import")\
  .getOrCreate()

You can run SQL commands by doing:

spark.sql(...)

Prior to Spark 2.0.0, three separate objects were used: SparkContext, SQLContext and HiveContext. These were used separately depending on what you wanted to do and the data types involved.

With the introduction of the Dataset/DataFrame abstractions, the SparkSession object became the main entry point to the Spark environment. It is still possible to access the other objects by first initializing a SparkSession (say, in a variable named spark) and then using spark.sparkContext / spark.sqlContext.
