PySpark: getting the default column name "value" in a DataFrame

Question

I have a DataFrame, df2, which looks like:

I had to convert the values to the Python float type because of errors:

from pyspark.sql.types import FloatType
df2 = spark.createDataFrame([float(x) for x in data], FloatType())

Perhaps because of this, the column gets the default name "value", whereas I want it to be "Result". I tried renaming the column with the withColumnRenamed() method, but it has no effect: the output is the same. How can I change the default column name?

Answer 1

I think you are calling withColumnRenamed() without assigning the result back to df2. DataFrames are immutable, so the method returns a new DataFrame instead of changing df2 in place:

df2 = df2.withColumnRenamed("value", "Result")

Or you could pass the column name you want when creating the DataFrame:

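from pyspark.sql.types import StructType, StructField, FloatType

schema = StructType([StructField("Result", FloatType(), True)])
# With a StructType schema each record must be row-shaped, so wrap each float in a 1-tuple.
df2 = spark.createDataFrame([(float(x),) for x in data], schema)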

Answer 2

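You can try this, passing the schema as a DDL string (write 'Result float' instead of 'value float' to get the column name you want):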

d1 = [(0.0,), (0.0,), (0.0,), (5.0,), (57.0,), (142.0,)]
df1 = spark.createDataFrame(d1, 'value float')
df1.printSchema()

# root
#  |-- value: float (nullable = true)

df1.show()
# +-----+
# |value|
# +-----+
# |  0.0|
# |  0.0|
# |  0.0|
# |  5.0|
# | 57.0|
# |142.0|
# +-----+

Answer 1: 1, accepted, 2022-07-21 10:09:42
Answer 2: 0, 2022-07-21 10:02:04
