
PySpark RDD to DataFrame conversion

I would like to create a temp view in Spark with the following code, but it raises the error below. I read some threads about a similar problem that was solved by adding a SparkSession, but that did not fix the issue for me. Any help would be appreciated.

The error:

    df = rdd.toDf(name_col1, name_col2)
AttributeError: 'RDD' object has no attribute 'toDf'

The code:

    from pyspark import SparkConf, SparkContext, sql
    from pyspark.sql import SparkSession

    conf = SparkConf().setAppName('rates').setMaster("local")
    sc = SparkContext(conf=conf)
    sqlContext = sql.SQLContext(sc)
    spark = SparkSession(sc)

    input1 = ('BTC', 'USD')
    rdd = sc.parallelize([input1])
    name_col1 = "fsym"
    name_col2 = "tsyms"

    df = rdd.toDf(name_col1, name_col2)  # fails: there is no RDD method named toDf
    json_df = df.createTempView('payload')

The answer: use spark.createDataFrame, passing the column names as the schema:

df = spark.createDataFrame(data=rdd, schema=[name_col1, name_col2])

Alternatively you can use toDF (note the capital F):

df = rdd.toDF([name_col1, name_col2])

粤ICP备18138465号  © 2020-2024 STACKOOM.COM