How to get values in the pivoted columns of a Spark Scala DataFrame

Question

I have a DataFrame with the following values:

+---+-----------------------+----------+---------+------------+---------+
|id |database_name          |users     |groups   |type        |isAllowed|
+---+-----------------------+----------+---------+------------+---------+
|73 |[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |update      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |create      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |drop        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |alter       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |index       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |lock        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |all         |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |read        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |write       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |repladmin   |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |serviceadmin|true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |select      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |update      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |create      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |drop        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |alter       |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |all         |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |read        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |write       |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|select      |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|read        |true     |
+---+-----------------------+----------+---------+------------+---------+

I want to pivot my DataFrame on the type column, so the resulting DataFrame I need looks like this:

id db_name                 users    group select update create  drop  alter
73 ww_hr_dl_highsecure     hive     hrhs   true    null    null  null  null
73 ww_hr_dl_highsecure     svchrdat null   true    true    true  true  true
84 ww_core_dim_dl_tables   svc02001 true   true    true    true  true  true

I don't know how to fill the new pivoted columns with the values from the isAllowed column of the original DataFrame.

This is what I have done so far:

val dfs3 = dfs2.groupBy("database_name","users").pivot("type").expr("isAllowed")

Answer 1

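Would it be acceptable to add groups to the groupBy? An aggregation such as first("isAllowed") then supplies the value for each pivoted column: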

import org.apache.spark.sql.functions.first

df.groupBy("database_name","users","groups").pivot("type").agg(first("isAllowed")).show(false)

Output:

+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|database_name          |users     |groups   |all |alter|create|drop|index|lock|read|repladmin|select|serviceadmin|update|write|
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|[ww_core_dim_dl_tables]|[svc02001]|[]       |true|true |true  |true|null |null|true|null     |true  |null        |true  |true |
|[ww_hr_dl_highsecure]  |[svchrdat]|[]       |true|true |true  |true|true |true|true|true     |true  |true        |true  |true |
|[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |null|null |null  |null|null |null|null|null     |true  |null        |null  |null |
|[ww_core_dim_dl_tables]|[]        |[walmart]|null|null |null  |null|null |null|true|null     |true  |null        |null  |null |
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
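
If the id column and plain (non-array) values from the desired output are also wanted, the same approach could be extended roughly as follows. This is only a minimal sketch under stated assumptions: the object name PivotSketch, the helper firstOrNull, the local SparkSession, and the four sample rows are made up for illustration and are not from the question. It lists the pivot values explicitly so that only the five columns from the desired output are produced, and it unwraps each single-element array, leaving null where the array is empty.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first, size, when}

// Hypothetical object name, used only for this illustration.
object PivotSketch {

  // Hypothetical helper: first element of an array column, or null if the array is empty.
  private def firstOrNull(name: String): Column =
    when(size(col(name)) > 0, col(name).getItem(0))

  // Pivot `type` into columns and fill each cell with the matching isAllowed value.
  def pivotPermissions(df: DataFrame): DataFrame =
    df.groupBy("id", "database_name", "users", "groups")
      // Naming the pivot values fixes the output columns up front.
      .pivot("type", Seq("select", "update", "create", "drop", "alter"))
      .agg(first("isAllowed"))
      // Replace the array columns with their first element.
      .withColumn("database_name", firstOrNull("database_name"))
      .withColumn("users", firstOrNull("users"))
      .withColumn("groups", firstOrNull("groups"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    // A few sample rows shaped like the DataFrame in the question.
    val df = Seq(
      (73, Seq("ww_hr_dl_highsecure"), Seq("hive"), Seq("hrhs"), "select", true),
      (73, Seq("ww_hr_dl_highsecure"), Seq("svchrdat"), Seq.empty[String], "select", true),
      (73, Seq("ww_hr_dl_highsecure"), Seq("svchrdat"), Seq.empty[String], "update", true),
      (84, Seq("ww_core_dim_dl_tables"), Seq.empty[String], Seq("walmart"), "read", true)
    ).toDF("id", "database_name", "users", "groups", "type", "isAllowed")

    pivotPermissions(df).show(false)
    spark.stop()
  }
}
```

Any type value that is not in the explicit list (for example read or write) is simply dropped from the pivoted columns; omit the list to get one column per distinct type, as in the output above.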

Solution 1
0 2019-11-27 22:05:40
