How to pass a column name as a variable when using StringIndexer in PySpark

Question

 simpleDF.columns
 # output: ['color', 'lab', 'value1', 'value2']
 indexer = simpleDF.select('lab')

 from pyspark.ml.feature import StringIndexer
 # Let us create an object of the class StringIndexer
 lblindexer = StringIndexer().setInputCol(indexer).setOutputCol("LabelIndexed")
 idxRes = lblindexer.fit(simpleDF).transform(simpleDF)

 idxRes.show(5)

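The following line, with the column name hard-coded, works fine, but I would like it to be more general:

 #lblindexer=StringIndexer().setInputCol('lab').setOutputCol("LabelIndexed")

With the code above I get this error: TypeError: Invalid param value given for param "inputCol". Could not convert <class 'pyspark.sql.dataframe.DataFrame'> to string type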

Answer 1

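setInputCol expects the name of the column as a string, not a DataFrame. simpleDF.select('lab') returns a DataFrame, which is why the call fails. Pass the column name instead: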

lblindexer=StringIndexer().setInputCol('lab').setOutputCol("LabelIndexed")

If you want to use a variable, assign the column name (a string) to it:

indexer = 'lab'
lblindexer=StringIndexer().setInputCol(indexer).setOutputCol("LabelIndexed")
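
To make this more general, the variable approach above could be wrapped in a small helper. The following is only a sketch under some assumptions: the function names check_column_name and index_column, and the "Indexed" suffix used to build the output column name, are hypothetical and not part of the original answer or of PySpark. The check mirrors the mistake in the question by rejecting anything that is not a string before it reaches StringIndexer.

```python
def check_column_name(df_columns, column):
    """Return `column` unchanged if it is a string naming one of `df_columns`."""
    if not isinstance(column, str):
        # The mistake from the question: a DataFrame was passed instead of a name.
        raise TypeError(f"expected a column name (str), got {type(column).__name__}")
    if column not in df_columns:
        raise ValueError(f"unknown column {column!r}; available: {list(df_columns)}")
    return column


def index_column(df, column, suffix="Indexed"):
    """Fit a StringIndexer on `column` and return df with an added '<column><suffix>' column."""
    # Imported here so check_column_name can be used without Spark installed.
    from pyspark.ml.feature import StringIndexer

    name = check_column_name(df.columns, column)
    # The output column name is derived from the input name (hypothetical convention).
    indexer = StringIndexer(inputCol=name, outputCol=name + suffix)
    return indexer.fit(df).transform(df)
```

With the DataFrame from the question, this would be called as index_column(simpleDF, 'lab').show(5), or with the column name held in a variable such as indexer = 'lab'. Note that the added column would then be named labIndexed rather than LabelIndexed.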
