
How to add an entire list's contents into a PySpark DataFrame row?

I am creating a new PySpark DataFrame from a list of strings. What should my code look like?

This is my list: ['there', 'is', 'one', 'that', 'commands'] and this is what I want ideally:

words(header)

Row 1: ['there', 'is', 'one', 'that', 'commands'] Row 2: ['test', 'try'

I have tried the following two snippets, but neither gave me exactly what I wanted.

test_list=['hi','bye','thanks']
test_list=sc.parallelize(test_list)

schema = StructType([StructField("name", StringType(), True)])
df3 = sqlContext.createDataFrame(test_list, schema)

AND

test_list=['hi','bye','thanks']
test_list=sc.parallelize(test_list)
df3 = sqlContext.createDataFrame(row(test_list), schema)

I cannot get the dataframes to show using df.show().

You just need to import the Row object and wrap each element of the RDD in a Row; the rest of your code was fine. Note that this produces one row per string.

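from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

# sc and sqlContext are the SparkContext and SQLContext provided by the PySpark shell
test_list = ['hi', 'bye', 'thanks']
rdd = sc.parallelize(test_list)

# Wrap each string in a Row so it matches the one-column schema
rows = rdd.map(lambda t: Row(name=t))
schema = StructType([StructField("name", StringType(), True)])
df = sqlContext.createDataFrame(rows, schema)
df.show()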
+------+
|  name|
+------+
|    hi|
|   bye|
|thanks|
+------+
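
The output above has one row per string. If the goal is what the question's desired output suggests, with each whole list stored in a single row, one option is an array column. The following is a minimal sketch, not a definitive implementation: the helper names rows_from_lists and lists_to_dataframe are hypothetical, the column name "words" is taken from the question's header, and it assumes a SparkSession is available rather than the sc/sqlContext pair used above.

```python
def rows_from_lists(word_lists):
    """Wrap each list in a 1-tuple so it becomes ONE row with ONE array cell."""
    return [(list(words),) for words in word_lists]


def lists_to_dataframe(spark, word_lists, column="words"):
    """Build a DataFrame with a single array<string> column (hypothetical helper)."""
    # Imported here so rows_from_lists stays usable without Spark installed.
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    schema = StructType([StructField(column, ArrayType(StringType()), True)])
    return spark.createDataFrame(rows_from_lists(word_lists), schema)


# Example usage (assumes PySpark is installed and a session can be created):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = lists_to_dataframe(spark, [['there', 'is', 'one', 'that', 'commands'],
#                                   ['test', 'try']])
#   df.show(truncate=False)
```

The key point is the 1-tuple: passing the bare list of strings makes Spark treat every string as its own row, whereas a tuple containing the list makes the list a single cell of type array<string>.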
