
How to flatten a data frame in Apache Spark | Scala

I have the following data frame:

df1

uid  text  frequency
1    a     1
1    b     0
1    c     2
2    a     0
2    b     0
2    c     1

I need to flatten it on the basis of uid to:

df2

uid  a  b  c
1    1  0  2
2    0  0  1

I've done something similar in R but haven't been able to translate it into SQL or Scala.

Any suggestions on how to approach this?

You can group by uid, use text as the pivot column, and sum the frequencies:

df1
  .groupBy("uid")
  .pivot("text")
  .sum("frequency")
  .show()
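To see what groupBy/pivot/sum actually computes, here is a minimal sketch of the same flattening using plain Scala collections (no Spark session required). The tuple layout `(uid, text, frequency)` mirrors the question's data; all names are illustrative:

```scala
// Rows as (uid, text, frequency), matching df1 in the question.
val rows = Seq(
  (1, "a", 1), (1, "b", 0), (1, "c", 2),
  (2, "a", 0), (2, "b", 0), (2, "c", 1)
)

// The distinct values of the pivot column become the new columns.
val cols = rows.map(_._2).distinct.sorted // Seq("a", "b", "c")

// Group by uid, then sum frequencies per text value within each group --
// this is what groupBy("uid").pivot("text").sum("frequency") does.
val flattened: Map[Int, Map[String, Int]] =
  rows.groupBy(_._1).map { case (uid, rs) =>
    uid -> rs.groupBy(_._2).map { case (t, ts) => t -> ts.map(_._3).sum }
  }

// Lay the result out as rows of df2: uid followed by one cell per column.
val table = flattened.toSeq.sortBy(_._1).map { case (uid, m) =>
  uid +: cols.map(m.getOrElse(_, 0))
}
// table == Seq(Seq(1, 1, 0, 2), Seq(2, 0, 0, 1))
```

Note that Spark must first scan the data to discover the distinct pivot values; if you know them in advance, the overload `pivot("text", Seq("a", "b", "c"))` skips that extra pass and also fixes the column order.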
