Pyspark: how to create a dataframe with only one row?

What I am trying to do seems quite simple: I need to create a dataframe with a single column and a single value.

I have tried a few approaches, namely:

Creating an empty dataframe and appending the data afterwards:

from pyspark.sql import types as T  # assumed import behind the T alias

project_id = 'PC0000000042'
schema = T.StructType([T.StructField("ProjectId", T.StringType(), True)])
empty_df = sqlContext.createDataFrame(sc.emptyRDD(), schema)

rdd = sc.parallelize([(project_id)])
df_temp = spark.createDataFrame(rdd, schema)
df = empty_df.union(df_temp)

Creating the dataframe directly from this one value:

rdd = sc.parallelize([(project_id)])
df = spark.createDataFrame(rdd, schema)

However, what I get in both cases is:

TypeError: StructType can not accept object 'PC0000000042' in type <class 'str'>

I don't quite understand this, since the type seems to be correct. Thank you for any advice!

One small change. If you have project_id = 'PC0000000042', then:
rdd = sc.parallelize([[project_id]])

You should pass the data as a list of lists, [['PC0000000042']] instead of ['PC0000000042'], so that each row is itself a list.
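
The reason the original attempt fails is that parentheses alone do not create a tuple in Python: (project_id) is just the string itself, so every element handed to sc.parallelize is a bare str rather than a row. A minimal sketch in plain Python (no Spark required; the variable names are only illustrative) of the difference:

```python
project_id = 'PC0000000042'

not_a_row = (project_id)    # parentheses only group; this is still a str
tuple_row = (project_id,)   # the trailing comma makes a one-element tuple
list_row = [project_id]     # a one-element list works as a row too

print(type(not_a_row).__name__)  # str
print(type(tuple_row).__name__)  # tuple
print(type(list_row).__name__)   # list

# What sc.parallelize would receive in each case:
bad_data = [(project_id)]    # same as ['PC0000000042']: elements are strings
good_data = [(project_id,)]  # [('PC0000000042',)]: elements are rows
```

Either a one-element tuple or a one-element list per row matches a one-field StructType.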

If you have 2 rows, then:

project_id = [['PC0000000042'], ['PC0000000043']]
rdd = sc.parallelize(project_id)
spark.createDataFrame(rdd, schema).show()

+------------+
|   ProjectId|
+------------+
|PC0000000042|
|PC0000000043|
+------------+

You can also skip the RDD and pass the list directly:

project_id = [['PC0000000042']]
spark.createDataFrame(project_id, schema=schema).show()
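
If the values start out as a flat list of strings, each one has to be wrapped into its own row before it is passed to createDataFrame. A small sketch, assuming an active SparkSession is passed in as spark; the helper names to_rows and one_column_df are hypothetical and not part of PySpark:

```python
def to_rows(values):
    """Wrap each scalar in a one-element tuple so it counts as a row."""
    return [(value,) for value in values]


def one_column_df(spark, values, column="ProjectId"):
    """Build a single-column string DataFrame from a flat list of values.

    Assumes `spark` is an active SparkSession. pyspark is imported lazily
    so that to_rows can be used without Spark installed.
    """
    from pyspark.sql import types as T

    schema = T.StructType([T.StructField(column, T.StringType(), True)])
    return spark.createDataFrame(to_rows(values), schema=schema)


print(to_rows(['PC0000000042']))  # [('PC0000000042',)]
```

With a running session, one_column_df(spark, ['PC0000000042']) should give the same one-row dataframe as the snippet above.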


