Flatten nested schema in DataFrame, getting AnalysisException: cannot resolve column name
I have a DataFrame with the following schema:
|-- str1: struct (nullable = true)
|    |-- a1: string (nullable = true)
|    |-- a2: string (nullable = true)
|    |-- a3: string (nullable = true)
|-- str2: string (nullable = true)
|-- str3: string (nullable = true)
|-- str4: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- b1: string (nullable = true)
|    |    |-- b2: string (nullable = true)
|    |    |-- b3: boolean (nullable = true)
|    |    |-- b4: struct (nullable = true)
|    |    |    |-- c1: integer (nullable = true)
|    |    |    |-- c2: string (nullable = true)
|    |    |    |-- c3: integer (nullable = true)
I am trying to flatten it, and for that I am using the following code:
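import org.apache.spark.sql.Column
import org.apache.spark.sql.types.{ArrayType, StructType}

// Recursively walk the schema and emit one Column per leaf field,
// aliased with its full dotted path (e.g. "str1.a1").
def flattenSchema(schema: StructType, prefix: String = null): Array[Column] = {
  schema.fields.flatMap(f => {
    val colName = if (prefix == null) f.name else (prefix + "." + f.name)
    f.dataType match {
      case st: StructType => flattenSchema(st, colName)
      case at: ArrayType =>
        // Assumes every array holds structs; the cast fails otherwise.
        val st = at.elementType.asInstanceOf[StructType]
        flattenSchema(st, colName)
      case _ => Array(new Column(colName).as(colName))
    }
  })
}

val d1 = df.select(flattenSchema(df.schema): _*)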
It gives me the following output:
|-- str1.a1: string (nullable = true)
|-- str1.a2: string (nullable = true)
|-- str1.a3: string (nullable = true)
|-- str2: string (nullable = true)
|-- str3: string (nullable = true)
|-- str4.b1: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- str4.b2: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- str4.b3: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- str4.b4.c1: array (nullable = true)
|    |-- element: integer (containsNull = true)
|-- str4.b4.c2: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- str4.b4.c3: array (nullable = true)
|    |-- element: integer (containsNull = true)
The problem occurs when I try to query it:
d1.select("str2").show
-- this works without any problem
But when I query any of the flattened nested columns:
d1.select("str1.a1")
Error:
org.apache.spark.sql.AnalysisException: cannot resolve '`str1.a1`' given input columns: ....
What am I doing wrong here? Or is there any other way to achieve the desired result?
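Spark does not let you reference a string-type column by a plain name that contains a dot (.), because the dot is used to access the sub-fields of a struct-type column. If you select the same column from the original DataFrame df, it should work, because in df str1 is a struct and a1 is one of its fields.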
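
Two workarounds follow from this, sketched below under some assumptions. The first keeps the dotted names and wraps them in backticks when selecting, so Spark reads the whole string as one column name. The second changes the alias so that the flattened names contain no dot at all. The helper name flattenWithUnderscores, the "_" separator, and the small sample DataFrame are illustrative choices, not part of the original question. The sketch also assumes that no field name already contains a dot and that every array column holds structs or plain values.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, StructType}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("flatten-sketch")
  .getOrCreate()

// Small stand-in for the DataFrame in the question (fewer fields, same shape).
val df = spark.sql(
  """SELECT named_struct('a1', 'x', 'a2', 'y') AS str1,
    |       's2' AS str2,
    |       array(named_struct('b1', 'p', 'b4', named_struct('c1', 1))) AS str4
    |""".stripMargin)

// Workaround 1: keep the dotted alias, quote it with backticks when selecting.
val d1 = df.select(col("str1.a1").as("str1.a1"))
val quoted = d1.select("`str1.a1`")

// Workaround 2 (hypothetical helper): same traversal as flattenSchema, but the
// alias joins the path segments with "_" so no output name contains a dot.
def flattenWithUnderscores(schema: StructType,
                           prefix: Option[String] = None): Array[Column] =
  schema.fields.flatMap { f =>
    // Dotted path, still needed to read the nested field from the source df.
    val path = prefix.map(_ + "." + f.name).getOrElse(f.name)
    f.dataType match {
      case st: StructType               => flattenWithUnderscores(st, Some(path))
      case ArrayType(st: StructType, _) => flattenWithUnderscores(st, Some(path))
      case _                            => Array(col(path).as(path.replace(".", "_")))
    }
  }

val d2 = df.select(flattenWithUnderscores(df.schema): _*)
val picked = d2.select("str1_a1")
```

With underscore aliases the flattened columns can be selected by plain name, at the cost of possible collisions if an existing top-level column is already called, say, str1_a1. The backtick form avoids renaming but has to be used everywhere the column is referenced.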