
How to create an empty dataframe in Spark 1.6.2, given an example for Spark 2.0.0?

Is there a way to rewrite this code so that it runs under PySpark 1.6.2 instead of 2.0.0? The problem is that SparkSession does not exist in Spark 1.6.2.

from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

cfg = SparkConf().setAppName('s')
spark = SparkSession.builder.enableHiveSupport().config(conf=cfg).getOrCreate()
df = spark.createDataFrame([], schema=StructType([StructField('id', StringType()),
                                                  StructField('pk', StringType()),
                                                  StructField('le', StringType()),
                                                  StructField('or', StringType())]))

For older versions of Spark (earlier than 2.0), you can use HiveContext instead of SparkSession; see the relevant documentation. Small example of setting up the environment:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext  # note: HiveContext lives in pyspark.sql, not pyspark

conf = SparkConf().setAppName('s')
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

After this you can create a dataframe in the same way as before, by using the sqlContext variable.
