
How to have values in pivoted columns of a Spark Scala DataFrame

I have a DataFrame with the following values:

+---+-----------------------+----------+---------+------------+---------+
|id |database_name          |users     |groups   |type        |isAllowed|
+---+-----------------------+----------+---------+------------+---------+
|73 |[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |update      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |create      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |drop        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |alter       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |index       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |lock        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |all         |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |read        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |write       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |repladmin   |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |serviceadmin|true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |select      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |update      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |create      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |drop        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |alter       |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |all         |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |read        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |write       |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|select      |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|read        |true     |
+---+-----------------------+----------+---------+------------+---------+

I want to pivot my DataFrame on the type column. The resulting DataFrame I need looks like this:

id db_name               users    group select update create drop alter
73 ww_hr_dl_highsecure   hive     hrhs  true   null   null   null null
73 ww_hr_dl_highsecure   svchrdat null  true   true   true   true true
84 ww_core_dim_dl_tables svc02001 null  true   true   true   true true

I don't know how to pivot so that the new columns take their values from the isAllowed column of the original DataFrame.

What I have done so far is:

val dfs3 = dfs2.groupBy("database_name","users").pivot("type").expr("isAllowed")

Would it be acceptable to add groups to the groupBy?

import org.apache.spark.sql.functions.first
df.groupBy("database_name","users","groups").pivot("type").agg(first("isAllowed")).show(false)

Output:

+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|database_name          |users     |groups   |all |alter|create|drop|index|lock|read|repladmin|select|serviceadmin|update|write|
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|[ww_core_dim_dl_tables]|[svc02001]|[]       |true|true |true  |true|null |null|true|null     |true  |null        |true  |true |
|[ww_hr_dl_highsecure]  |[svchrdat]|[]       |true|true |true  |true|true |true|true|true     |true  |true        |true  |true |
|[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |null|null |null  |null|null |null|null|null     |true  |null        |null  |null |
|[ww_core_dim_dl_tables]|[]        |[walmart]|null|null |null  |null|null |null|true|null     |true  |null        |null  |null |
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
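
To get closer to the layout requested in the question, one possible sketch is shown below. It assumes that the bracketed columns (database_name, users, groups) are array<string> columns, which the question does not state explicitly. The object name PivotPermissions, the helper firstOrNull, and the small sample DataFrame are hypothetical and only for illustration. The sketch keeps id in the groupBy, unwraps each array to its first element (null when the array is empty), and passes an explicit value list to pivot so that only the five requested permission columns appear, in that order.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first, size, when}

object PivotPermissions {
  // Pivot columns to keep, in display order (taken from the desired output in the question).
  val wantedTypes: Seq[String] = Seq("select", "update", "create", "drop", "alter")

  // Hypothetical helper: first element of an array column, or null when the array is empty.
  // Assumes the column is array<string>; `when` without `otherwise` yields null.
  def firstOrNull(name: String): Column =
    when(size(col(name)) > 0, col(name).getItem(0))

  def pivotPermissions(df: DataFrame): DataFrame =
    df.select(
        col("id"),
        firstOrNull("database_name").as("db_name"),
        firstOrNull("users").as("users"),
        firstOrNull("groups").as("group"),
        col("type"),
        col("isAllowed"))
      .groupBy("id", "db_name", "users", "group")
      // Listing the pivot values fixes the output columns and skips the extra
      // job Spark would otherwise run to discover the distinct values of `type`.
      .pivot("type", wantedTypes)
      // Each (group, type) cell takes its value from isAllowed.
      .agg(first("isAllowed"))

  // Small hypothetical sample shaped like the question's data.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (73, Seq("ww_hr_dl_highsecure"), Seq("hive"), Seq("hrhs"), "select", true),
      (73, Seq("ww_hr_dl_highsecure"), Seq("svchrdat"), Seq.empty[String], "select", true),
      (73, Seq("ww_hr_dl_highsecure"), Seq("svchrdat"), Seq.empty[String], "update", true),
      (84, Seq("ww_core_dim_dl_tables"), Seq("svc02001"), Seq.empty[String], "select", true),
      (84, Seq("ww_core_dim_dl_tables"), Seq("svc02001"), Seq.empty[String], "alter", true),
      (84, Seq("ww_core_dim_dl_tables"), Seq.empty[String], Seq("walmart"), "select", true)
    ).toDF("id", "database_name", "users", "groups", "type", "isAllowed")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its existing session.
    val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
    pivotPermissions(sample(spark)).show(false)
    spark.stop()
  }
}
```

Permission types with no matching row for a given group come out as null, as in the output above. If the array columns can hold more than one element, taking only the first element loses information, and something like explode or array_join may be a better fit.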

