
Spark: retain all the columns of the original data frame after pivot

I have a data frame which has many columns, 50-plus (as shown below):

+---+---+---+---+---+---+---+---+----+---+---+...
| c1| c2| c3| c4| c5| c6| c7| c8|type|clm|val|...
+---+---+---+---+---+---+---+---+----+---+---+...
| 11|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t1|  a|  5|...
| 31|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t2|  b|  6|...
| 11|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t1|  a|  9|...
+---+---+---+---+---+---+---+---+----+---+---+...

I want to convert one of the columns' values into many columns, so I am thinking of using the code below:

df.groupBy("type").pivot("clm").agg(first("val")).show() 

This converts the row values into columns, but the other columns (c1 to c8) do not appear in the resulting data frame.
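
For illustration, here is a minimal, self-contained sketch of that behaviour, assuming a local SparkSession and using only c1 as a stand-in for c1..c8 (the values are taken from the sample rows above; the session setup is hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder().appName("pivot-example").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  (11, "t1", "a", 5),
  (31, "t2", "b", 6),
  (11, "t1", "a", 9)
).toDF("c1", "type", "clm", "val")

// Only the grouping key ("type") and the pivoted values of "clm" survive;
// "c1" is dropped because it is neither a grouping key nor an aggregate.
// first("val") is non-deterministic for type t1 (it may return 5 or 9).
df.groupBy("type").pivot("clm").agg(first("val")).show()
// result (roughly):
// +----+----+----+
// |type|   a|   b|
// +----+----+----+
// |  t1|   5|null|
// |  t2|null|   6|
// +----+----+----+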

So is it okay to use the method below to get all the columns after the pivot?

df.groupBy("c1","c2","c3","c4","c5","c6","c7","c8","type").pivot("clm").agg(first("val")).show()

pivot has to be chained onto the grouped data frame, between groupBy and agg, and the columns you want to keep can be carried through as additional aggregates, just like any other:

df
  .groupBy("type")
  .pivot("clm")
  .agg(
    first("val"),
    first("c1"),
    first("c2"),
    first("c3"),
    first("c4"),
    first("c5"),
    first("c6"),
    first("c7"),
    first("c8")
  ).show()
// note: with several aggregates, each pivoted value of clm gets its own
// copy of the columns, named e.g. a_first(val), a_first(c1), ...

Writing it like that assumes that the values of c1..c8 are duplicated (i.e. constant) within the same type. If not, then the .groupBy(...) needs to be tuned to exactly how your data is organized.
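
For completeness, a minimal self-contained sketch of the "tuned groupBy" variant from the question, again assuming a local SparkSession and with c1 standing in for c1..c8 (the setup is hypothetical, not from the original post):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder().appName("pivot-keep-columns").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  (11, "t1", "a", 5),
  (31, "t2", "b", 6),
  (11, "t1", "a", 9)
).toDF("c1", "type", "clm", "val")

// Grouping by every column that must be retained keeps c1 in the output;
// this only collapses rows cleanly if c1..c8 are constant within a type.
df.groupBy("c1", "type").pivot("clm").agg(first("val")).show()
// result (roughly):
// +---+----+----+----+
// | c1|type|   a|   b|
// +---+----+----+----+
// | 11|  t1|   5|null|
// | 31|  t2|null|   6|
// +---+----+----+----+

If the extra first(...) aggregates in the answer produce more pivoted columns than you want, another option is to pivot only val and join the pivoted frame back to the per-type first(c1)..first(c8) aggregates on type.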
