How to add a remarks column using Scala

Question

I have a DataFrame as below and want to add a remarks column using Scala.

id    val  visits 
111   2        1
111   2        1
112   4        2
112   5        4
113   6        1

The expected output is:

id    val    visits   remarks
111   2        1      Random
111   2        1      Random
112   4        2      Less visit
112   5        4      More visit
113   6        1      One visit

Remarks should be:
Random: the id has two records with the same val and visits.
One visit: the id has only one record, with any number of visits.
Less visit: the id has two records, and this record has fewer visits than the other one.
More visit: the id has more than one record with different val and visits, and this record has more visits.

Answer 1

This may not be the best solution, but it works:

First group your DataFrame by id and compute the maximum and minimum of val and visits, plus the number of records per id:

val grouped = df.groupBy("id").agg(
  max($"val").as("maxVal"),
  max($"visits").as("maxVisits"),
  min($"val").as("minVal"),
  min($"visits").as("minVisits"),
  count($"id").as("count")
)

Then define a UDF that implements your logic:

// count is declared as Long because Spark's count() aggregate returns a bigint
val remarks = functions.udf((value: Int, visits: Int, maxValue: Int, maxVisits: Int, minValue: Int, minVisits: Int, count: Long) =>
  if (count == 1) {
    // the id has a single record
    "One Visit"
  } else if (value == maxValue && value == minValue && visits == maxVisits && visits == minVisits) {
    // every record of the id has the same val and visits
    "Random"
  } else if (visits < maxVisits) {
    "Less Visits"
  } else {
    "More Visits"
  }
)
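The decision rule inside the UDF can be checked without a Spark session. The following is only a sketch in plain Scala collections; `RemarkRule`, `Record`, `remark` and `annotate` are hypothetical names introduced for illustration, not part of the original answer.

```scala
// Sketch only: mirrors the UDF's rule with plain Scala collections (no Spark).
// All names in this object are hypothetical helpers for illustration.
object RemarkRule {
  final case class Record(id: Int, value: Int, visits: Int)

  // Same branching as the UDF: single record, all-identical records, then less/more.
  def remark(value: Int, visits: Int,
             maxValue: Int, maxVisits: Int,
             minValue: Int, minVisits: Int,
             count: Int): String =
    if (count == 1) "One Visit"
    else if (value == maxValue && value == minValue &&
             visits == maxVisits && visits == minVisits) "Random"
    else if (visits < maxVisits) "Less Visits"
    else "More Visits"

  // Computes the per-id max/min/count and labels every record, keeping input order.
  def annotate(records: Seq[Record]): Seq[(Record, String)] = {
    val byId = records.groupBy(_.id)
    records.map { r =>
      val group  = byId(r.id)
      val values = group.map(_.value)
      val visits = group.map(_.visits)
      r -> remark(r.value, r.visits, values.max, visits.max, values.min, visits.min, group.size)
    }
  }

  // The sample rows from the question.
  val sample: Seq[Record] = Seq(
    Record(111, 2, 1), Record(111, 2, 1),
    Record(112, 4, 2), Record(112, 5, 4),
    Record(113, 6, 1)
  )
}
```

Calling `RemarkRule.annotate(RemarkRule.sample)` labels the five sample rows in input order. Note that with three or more records per id, every record below the maximum number of visits is labelled "Less Visits" under this rule.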

Then join the original DataFrame with the grouped one on id and add the desired column with the UDF. Finally, drop the helper columns from the output:

 df.join(grouped, Seq("id"))
   .withColumn("remarks", remarks($"val", $"visits", $"maxVal", $"maxVisits", $"minVal", $"minVisits", $"count"))
   .drop("maxVal","maxVisits", "minVal", "minVisits", "count")

Output:

+---+----+-------+-----------+
| id| val| visits|    remarks|
+---+----+-------+-----------+
|112|   4|      2|Less Visits|
|112|   5|      4|More Visits|
|113|   6|      1|  One Visit|
|111|   2|      1|     Random|
|111|   2|      1|     Random|
+---+----+-------+-----------+

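P.S. Remember to import functions. The snippets above also call max, min and count unqualified and use the $ column syntax, so the following imports are needed as well (spark is assumed here to be the name of your SparkSession):

import org.apache.spark.sql.functions
import org.apache.spark.sql.functions._
import spark.implicits._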
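As a possible alternative, the same rule can be written with window functions and when/otherwise, which avoids both the join and the UDF. This is a sketch rather than the answer's method. It assumes Spark is on the classpath and a local session is acceptable, and the names `RemarksWithWindow`, `sample` and `addRemarks` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, lit, max, min, when}

// Sketch only: hypothetical object and method names, same rule as the UDF above.
object RemarksWithWindow {

  // Builds the sample data from the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((111, 2, 1), (111, 2, 1), (112, 4, 2), (112, 5, 4), (113, 6, 1))
      .toDF("id", "val", "visits")
  }

  // Adds the remarks column using per-id window aggregates instead of a join.
  def addRemarks(df: DataFrame): DataFrame = {
    val byId = Window.partitionBy("id")

    val maxVisits = max(col("visits")).over(byId)
    // True when every record of the id has the same val and the same visits.
    val allSame =
      (max(col("val")).over(byId) === min(col("val")).over(byId)) &&
        (maxVisits === min(col("visits")).over(byId))

    df.withColumn(
      "remarks",
      when(count(lit(1)).over(byId) === 1, "One Visit")
        .when(allSame, "Random")
        .when(col("visits") < maxVisits, "Less Visits")
        .otherwise("More Visits")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("remarks").getOrCreate()
    addRemarks(sample(spark)).show()
    spark.stop()
  }
}
```

Because the aggregates are computed over a window, each input row is kept as-is and no helper columns need to be dropped afterwards.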

1 answer

Answer 1 (2, accepted, 2018-02-07 12:37:28)
