Update Schema for DataFrame in Apache Spark

I have a DataFrame with the following schema:

root
 |-- col_a: string (nullable = false)
 |-- col_b: string (nullable = false)
 |-- col_c_a: string (nullable = false)
 |-- col_c_b: string (nullable = false)
 |-- col_d: string (nullable = false)
 |-- col_e: string (nullable = false)
 |-- col_f: string (nullable = false)

Now I want to convert the schema of this DataFrame to something like this:

root
 |-- col_a: string (nullable = false)
 |-- col_b: string (nullable = false)
 |-- col_c: struct (nullable = false)
 |    |-- col_c_a: string (nullable = false)
 |    |-- col_c_b: string (nullable = false)
 |-- col_d: string (nullable = false)
 |-- col_e: string (nullable = false)
 |-- col_f: string (nullable = false)

I can do this with a map transformation by explicitly fetching the value of each column from the Row, but that is a very complex process and does not look good.

So, is there any way I can achieve this?

Thanks

There is a built-in struct function (in org.apache.spark.sql.functions) with the definition:

def struct(cols: Column*): Column

You can use it like this:

df.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  2|  3|
+---+---+

df.withColumn("struct_col", struct($"a", $"b")).show
+---+---+----------+
|  a|  b|struct_col|
+---+---+----------+
|  1|  2|     [1,2]|
|  2|  3|     [2,3]|
+---+---+----------+

The schema of the new DataFrame is:

root
 |-- a: integer (nullable = false)
 |-- b: integer (nullable = false)
 |-- struct_col: struct (nullable = false)
 |    |-- a: integer (nullable = false)
 |    |-- b: integer (nullable = false)
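The snippet above appears to come from an interactive session such as spark-shell, where the needed imports are already in scope. A minimal standalone sketch of the same example, assuming a local SparkSession; the object name StructExample and the helper addStruct are illustrative and not part of the original answer:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object StructExample {
  // Illustrative helper: wraps columns a and b into a new struct column,
  // mirroring the withColumn call shown above.
  def addStruct(df: DataFrame): DataFrame =
    df.withColumn("struct_col", struct(col("a"), col("b")))

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; master and appName are arbitrary choices
    val spark = SparkSession.builder().master("local[*]").appName("struct-example").getOrCreate()
    import spark.implicits._ // needed for Seq(...).toDF

    val df = Seq((1, 2), (2, 3)).toDF("a", "b")
    val withStruct = addStruct(df)
    withStruct.printSchema()
    withStruct.show()
    spark.stop()
  }
}
```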

In your case, you can do something like:

import org.apache.spark.sql.functions.struct  // the $"..." syntax also needs: import spark.implicits._
df.withColumn("col_c", struct($"col_c_a", $"col_c_b")).drop("col_c_a", "col_c_b")
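Note that withColumn appends the new column at the end, so the one-liner above places col_c after col_f rather than between col_b and col_d as in the desired schema. If the column position matters, a select can rebuild the column list with the struct in place. The following is a minimal sketch under that assumption; the helper name nestInPlace and the sample row are hypothetical, and the helper assumes the fields being nested are adjacent in the input schema:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object NestColumns {
  // Hypothetical helper: replaces the flat columns in `fields` with a single
  // struct column `name`, placed where the first of those fields used to be.
  // Assumes the fields are adjacent in the input schema.
  def nestInPlace(df: DataFrame, name: String, fields: Seq[String]): DataFrame = {
    val toNest = fields.toSet
    val cols = df.columns.toSeq
    val firstIdx = cols.indexWhere(toNest)              // position of the first nested field
    val kept = cols.filterNot(toNest).map(c => col(c))  // columns that stay top-level
    val nested = struct(fields.map(f => col(f)): _*).as(name)
    val (before, after) = kept.splitAt(firstIdx)
    df.select((before :+ nested) ++ after: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nest-columns").getOrCreate()
    import spark.implicits._ // needed for Seq(...).toDF

    // Hypothetical sample row with the question's flat layout. Built with toDF,
    // the string columns are nullable = true, unlike the schema in the question.
    val df = Seq(("a", "b", "ca", "cb", "d", "e", "f"))
      .toDF("col_a", "col_b", "col_c_a", "col_c_b", "col_d", "col_e", "col_f")

    nestInPlace(df, "col_c", Seq("col_c_a", "col_c_b")).printSchema()
    spark.stop()
  }
}
```

Once nested, the original values can still be read with dotted paths such as col_c.col_c_a.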
