spark write: CSV data source does not support null data type

Question

I have an error in my code. The code dumps some data into a Redshift database.

After some investigation, I found an easy way to reproduce it in the Spark console.

This works fine:

scala> Seq("France", "Germany").toDF.agg(avg(lit(null))).write.csv("1.csv")
scala>

But if I replace avg with max, I get the error "CSV data source does not support null data type."

scala> Seq("France", "Germany").toDF.agg(max(lit(null))).write.csv("2.csv")
java.lang.UnsupportedOperationException: CSV data source does not support null data type.

What's wrong with max?

Answer 1

The error is expected. avg returns the DOUBLE data type, because its null input is implicitly cast to a numeric type:

Seq("France", "Germany").toDF.agg(avg(lit(null)).alias("col1")).printSchema

whereas max returns the same type as its input, which for lit(null) is the null type (NullType):

Seq("France", "Germany").toDF.agg(max(lit(null)).alias("col1")).printSchema
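
The two result types can also be checked programmatically instead of reading the printSchema output. This is a minimal sketch, assuming a local SparkSession (in spark-shell you would pass the existing `spark`); the names `AggResultTypes` and `resultTypes` are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, lit, max}
import org.apache.spark.sql.types.{DataType, DoubleType, NullType}

object AggResultTypes {
  // Illustrative helper: returns the result data types of avg(lit(null)) and
  // max(lit(null)) computed over the DataFrame from the question.
  def resultTypes(spark: SparkSession): (DataType, DataType) = {
    import spark.implicits._
    val df = Seq("France", "Germany").toDF

    // avg coerces its NullType input to a numeric type, so the column is DoubleType.
    val avgType = df.agg(avg(lit(null)).alias("col1")).schema("col1").dataType
    // max keeps the type of its input, so the column stays NullType.
    val maxType = df.agg(max(lit(null)).alias("col1")).schema("col1").dataType
    (avgType, maxType)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("agg-result-types").getOrCreate()
    val (avgType, maxType) = resultTypes(spark)
    println(s"avg -> $avgType, max -> $maxType")
    assert(avgType == DoubleType)
    assert(maxType == NullType)
    spark.stop()
  }
}
```

The CSV writer rejects the second column because NullType is not a type it can write, which is exactly the difference between the two aggregates.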

So writing a DataFrame that contains the max column throws the error. If you want to save it, explicitly cast the column to another type:

import org.apache.spark.sql.types.DoubleType
Seq("France", "Germany").toDF.agg(max(lit(null)).cast(DoubleType).alias("col1")).write.csv("path")
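
When a DataFrame may contain several such columns, the same fix can be applied generically by casting every NullType column before writing. This is a sketch under stated assumptions, not a Spark API: `NullColumnFix.castNullColumns` is a hypothetical helper, StringType is an arbitrary default target (any type works, since the column only holds nulls), and only top-level columns are handled:

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, max}
import org.apache.spark.sql.types.{DataType, NullType, StringType}

object NullColumnFix {
  // Hypothetical helper: casts every top-level NullType column to `target`,
  // keeping column names and order. Nested NullType fields are not handled, and
  // column names containing dots would need backquoting for col() to resolve them.
  def castNullColumns(df: DataFrame, target: DataType = StringType): DataFrame = {
    val columns = df.schema.map { field =>
      if (field.dataType == NullType) col(field.name).cast(target).alias(field.name)
      else col(field.name)
    }
    df.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("null-column-fix").getOrCreate()
    import spark.implicits._

    // Same aggregate as in the question: its only column has NullType.
    val df = Seq("France", "Germany").toDF.agg(max(lit(null)).alias("col1"))

    // Writing `df` directly would fail; after the cast the write goes through.
    // The output path is a fresh subdirectory of a temp dir, so the default save mode works.
    val out = Files.createTempDirectory("null-column-fix").resolve("out").toString
    castNullColumns(df).write.csv(out)

    spark.stop()
  }
}
```

Casting inside the aggregate, e.g. `max(lit(null).cast(DoubleType))`, avoids the problem as well, since max then returns DoubleType.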

Answer 1: score 3, accepted, 2018-09-14 18:57:05