
How to add an entire list contents into a Pyspark Dataframe row?

I am creating a new PySpark dataframe from a list of strings. What should my code look like?

This is my list: ['there', 'is', 'one', 'that', 'commands'], and this is what I want ideally:

words (header)

Row 1: ['there', 'is', 'one', 'that', 'commands']
Row 2: ['test', 'try']

I have tried the following code, but none of it gave me exactly what I wanted.

test_list=['hi','bye','thanks']
test_list=sc.parallelize(test_list)

schema = StructType([StructField("name", StringType(), True)])
df3 = sqlContext.createDataFrame(test_list, schema)

AND

test_list=['hi','bye','thanks']
test_list=sc.parallelize(test_list)
df3 = sqlContext.createDataFrame(row(test_list), schema)

I cannot get the dataframes to show using df.show().
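A side note on why both attempts fail: createDataFrame with a StructType schema expects each RDD element to be a struct-like value (a tuple or a Row), not a bare string. A minimal sketch of the data shaping, with the Spark calls from the question left as hedged comments:

```python
# Each element handed to createDataFrame must match the schema's
# structure. A bare string is not a struct, so wrap each word in a
# one-element tuple (or a Row) first.
test_list = ['hi', 'bye', 'thanks']
rows = [(w,) for w in test_list]  # [('hi',), ('bye',), ('thanks',)]

# With sc, sqlContext, and schema as in the question, this would then work:
# df3 = sqlContext.createDataFrame(sc.parallelize(rows), schema)
```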

You just need to import the Row object; everything else was fine.

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

test_list = ['hi', 'bye', 'thanks']
test_list = sc.parallelize(test_list)

# Wrap each bare string in a Row so it matches the one-column schema
rdd = test_list.map(lambda t: Row(name=t))
schema = StructType([StructField("name", StringType(), True)])
df = sqlContext.createDataFrame(rdd, schema)
df.show()
+------+
|  name|
+------+
|    hi|
|   bye|
|thanks|
+------+
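The answer above puts one word per row. If you instead want each *entire* list in a single row, as the question originally asked, wrap every list in a one-element tuple so Spark infers a single column of array type. A minimal sketch, with the Spark call hedged as a comment (it assumes a SparkSession named `spark` is available):

```python
# One tuple per desired row; each tuple's single element is a whole list.
rows = [(['there', 'is', 'one', 'that', 'commands'],),
        (['test', 'try'],)]

# With a SparkSession (assumed available as `spark`), this would be:
# df = spark.createDataFrame(rows, ["words"])
# df.show(truncate=False)
```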
