
How to add a new nullable String column in a DataFrame using Scala

There are probably at least 10 questions very similar to this, but I still have not found a clear answer.

How can I add a nullable string column to a DataFrame using Scala? I was able to add a column with null values, but its data type shows as null:

val testDF = myDF.withColumn("newcolumn", when(col("UID") =!= "not", null).otherwise(null))

However, the schema shows:

root
 |-- UID: string (nullable = true)
 |-- IsPartnerInd: string (nullable = true)
 |-- newcolumn: null (nullable = true)

I want the new column to be a string:

 |-- newcolumn: string (nullable = true)

Please don't mark this as a duplicate unless it's really the same question and in Scala.

Just explicitly cast the null literal to StringType.

scala> import org.apache.spark.sql.types.StringType

scala> val testDF = myDF.withColumn("newcolumn", when(col("UID") =!= "not", lit(null).cast(StringType)).otherwise(lit(null).cast(StringType)))

scala> testDF.printSchema

root
 |-- UID: string (nullable = true)
 |-- newcolumn: string (nullable = true)
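
Since both branches of the when/otherwise are the same null literal, the conditional does not change the result. Here is a minimal sketch of that point, assuming a local SparkSession and a small made-up stand-in for myDF (the rows are hypothetical; only the column names come from the question). It builds the column both ways and compares the schemas:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nullable-string-column")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for myDF, using the column names from the question.
val myDF = Seq(("u1", "Y"), ("not", "N")).toDF("UID", "IsPartnerInd")

// The form from the answer: both branches are a null cast to StringType.
val withWhen = myDF.withColumn(
  "newcolumn",
  when(col("UID") =!= "not", lit(null).cast(StringType))
    .otherwise(lit(null).cast(StringType)))

// The same column without the conditional.
val testDF = myDF.withColumn("newcolumn", lit(null).cast(StringType))

testDF.printSchema()

// Look the new field up by name to check its type and nullability.
val newField = testDF.schema("newcolumn")
println(s"type = ${newField.dataType}, nullable = ${newField.nullable}")
println(s"same schema as the when/otherwise form: ${withWhen.schema == testDF.schema}")
```

The when/otherwise form is only worth keeping if one of the branches will later hold a real value.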

Why do you want a column which is always null? There are several ways; I would prefer the solution with typedLit:

myDF.withColumn("newcolumn", typedLit[String](null))

or, for older Spark versions (typedLit was added in Spark 2.2):

myDF.withColumn("newcolumn", lit(null).cast(StringType))
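
To see the difference between the untyped and typed null literals side by side, here is a rough sketch assuming Spark 2.2 or later, a local SparkSession, and a made-up stand-in for myDF (the rows are hypothetical). It prints the data type of the new column for each variant and checks that typedLit and the cast give the same schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, typedLit}
import org.apache.spark.sql.types.{NullType, StringType}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("typedlit-vs-cast")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for myDF, using the column names from the question.
val myDF = Seq(("u1", "Y"), ("not", "N")).toDF("UID", "IsPartnerInd")

// Untyped null literal: Spark has no type information, so the column is NullType.
val untyped = myDF.withColumn("newcolumn", lit(null))

// Typed null literal: the type parameter tells Spark the column is a string.
val viaTypedLit = myDF.withColumn("newcolumn", typedLit[String](null))

// Untyped null literal cast to StringType afterwards.
val viaCast = myDF.withColumn("newcolumn", lit(null).cast(StringType))

println(untyped.schema("newcolumn").dataType)
println(viaTypedLit.schema("newcolumn").dataType)
println(viaTypedLit.schema == viaCast.schema)
```

Both typed variants should end up with a nullable string column, so the choice between them is mostly a matter of Spark version and taste.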


