How to add a remarks column using Scala
I have a DataFrame as below and want to add a remarks column using Scala:
id val visits
111 2 1
111 2 1
112 4 2
112 5 4
113 6 1
The expected output is:
id val visits remarks
111 2 1 Random
111 2 1 Random
112 4 2 Less visit
112 5 4 More visit
113 6 1 One visit
The remarks should be:
Random: the id has two records with the same val and visits.
One Visit: the id has only one record, with any number of visits.
Less Visit: the id has two records, and this record has fewer visits than the other one.
More Visit: the id has more than one record with different val and visits, and this record has the higher visit count.
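To pin the rules down before involving Spark, here is a plain-Scala sketch (no Spark) that applies them to in-memory records. The `Rec` case class and the `remarks` function are hypothetical names, and reading "Less"/"More" as "this record's visits compared with the highest visits for its id" is an assumption based on the expected output above.

```scala
// Plain-Scala sketch of the remark rules; no Spark involved.
object RemarkRules {
  // One input row: (id, val, visits). `val` is a Scala keyword, so it is named `value` here.
  final case class Rec(id: Int, value: Int, visits: Int)

  // Pairs every record with its remark, keeping the input order.
  def remarks(recs: Seq[Rec]): Seq[(Rec, String)] = {
    val byId = recs.groupBy(_.id) // all records that share an id
    recs.map { r =>
      val group = byId(r.id)
      val remark =
        if (group.size == 1) "One Visit" // only one record for this id
        else if (group.map(_.value).distinct.size == 1 &&
                 group.map(_.visits).distinct.size == 1) "Random" // identical records
        else if (r.visits < group.map(_.visits).max) "Less Visits"
        else "More Visits"
      (r, remark)
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Rec(111, 2, 1), Rec(111, 2, 1), Rec(112, 4, 2), Rec(112, 5, 4), Rec(113, 6, 1))
    remarks(data).foreach { case (r, m) => println(s"${r.id} ${r.value} ${r.visits} $m") }
  }
}
```

Running `main` on the sample data prints the same five remarks as the expected output (with the answer's spelling "Less Visits"/"More Visits").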
This may not be the best solution, but it's a working one:
First group your DataFrame by id and compute the maximum and minimum of val and visits, plus the number of records per id:
val grouped = df.groupBy("id").agg(
  max($"val").as("maxVal"),
  max($"visits").as("maxVisits"),
  min($"val").as("minVal"),
  min($"visits").as("minVisits"),
  count($"id").as("count"))
Then define a UDF that implements your logic:
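// Picks the remark for one row from its own values and its id's aggregates.
// count(...) produces a LongType column, so the last parameter is a Long.
val remarks = functions.udf((value: Int, visits: Int, maxValue: Int, maxVisits: Int,
                             minValue: Int, minVisits: Int, count: Long) =>
  if (count == 1) {
    "One Visit" // only one record for this id
  } else if (value == maxValue && value == minValue && visits == maxVisits && visits == minVisits) {
    "Random" // every record for this id has the same val and visits
  } else if (visits < maxVisits) {
    "Less Visits"
  } else {
    "More Visits"
  }
)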
Then join the original DataFrame with the grouped one on id and add the desired column with the UDF. Finally, drop the helper columns from the output:
df.join(grouped, Seq("id"))
.withColumn("remarks", remarks($"val", $"visits", $"maxVal", $"maxVisits", $"minVal", $"minVisits", $"count"))
.drop("maxVal","maxVisits", "minVal", "minVisits", "count")
Output:
+---+----+-------+-----------+
| id| val| visits| remarks|
+---+----+-------+-----------+
|112| 4| 2|Less Visits|
|112| 5| 4|More Visits|
|113| 6| 1| One Visit|
|111| 2| 1| Random|
|111| 2| 1| Random|
+---+----+-------+-----------+
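PS: remember to import the functions object (for functions.udf), its members (for max, min, count) and your SparkSession's implicits (for the $"col" syntax):
import org.apache.spark.sql.functions
import org.apache.spark.sql.functions._
import spark.implicits._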
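A sketch of an alternative that avoids both the join and the UDF, assuming Spark 2.x or later on the classpath: the per-id aggregates are computed with a window partitioned by `id`, and the remark is chosen with `when`/`otherwise`. The object name `RemarksWithWindow`, the helper `addRemarks`, and the temporary column names are illustrative, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object RemarksWithWindow {
  // Adds a "remarks" column using per-id window aggregates instead of groupBy + join + UDF.
  def addRemarks(df: DataFrame): DataFrame = {
    // No orderBy, so every aggregate below sees the whole partition for that id.
    val byId = Window.partitionBy("id")
    df
      .withColumn("cnt", count(lit(1)).over(byId))
      .withColumn("maxVal", max(col("val")).over(byId))
      .withColumn("minVal", min(col("val")).over(byId))
      .withColumn("maxVisits", max(col("visits")).over(byId))
      .withColumn("minVisits", min(col("visits")).over(byId))
      .withColumn(
        "remarks",
        when(col("cnt") === 1, "One Visit")
          .when(col("maxVal") === col("minVal") && col("maxVisits") === col("minVisits"), "Random")
          .when(col("visits") < col("maxVisits"), "Less Visits")
          .otherwise("More Visits"))
      .drop("cnt", "maxVal", "minVal", "maxVisits", "minVisits") // keep only the original columns + remarks
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("remarks").getOrCreate()
    import spark.implicits._
    val df = Seq((111, 2, 1), (111, 2, 1), (112, 4, 2), (112, 5, 4), (113, 6, 1)).toDF("id", "val", "visits")
    addRemarks(df).show()
    spark.stop()
  }
}
```

Checking `maxVal === minVal && maxVisits === minVisits` is equivalent to the UDF's four equality checks, because a row's own values always lie between its id's minimum and maximum. Using built-in column expressions instead of a UDF also lets Spark's optimizer see the logic.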