Spark custom sort column in Java
I have the below result in a Dataset.
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| abc | jun2016 | 25 |
| aac | jun2017 | 28 |
| aac | dec2017 | 30 |
| aac | apr2018 | 45 |
+------+---------+--------+
When sorting is applied, I get the below result.
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| aac | apr2018 | 45 |
| aac | dec2017 | 30 |
| aac | jun2017 | 28 |
| abc | jun2016 | 25 |
+------+---------+--------+
But instead it should have been:
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| aac | jun2017 | 28 |
| aac | dec2017 | 30 |
| aac | apr2018 | 45 |
| abc | jun2016 | 25 |
+------+---------+--------+
That is, in chronological order. How can I achieve this?
Separately, when I have a Week column as below:
+------+-----------------------+--------+
| Col1 | Week | NumCol |
+------+-----------------------+--------+
| aac | 02/04/2018-02/10/2018 | 45 |
| aac | 02/11/2018-02/17/2018 | 25 |
| aac | 01/28/2018-02/03/2018 | 30 |
+------+-----------------------+--------+
I want it sorted as below:
+------+-----------------------+--------+
| Col1 | Week | NumCol |
+------+-----------------------+--------+
| aac | 01/28/2018-02/03/2018 | 30 |
| aac | 02/04/2018-02/10/2018 | 45 |
| aac | 02/11/2018-02/17/2018 | 25 |
+------+-----------------------+--------+
In the table above, I want to parse the date out of the Week column into a new column dateweek, sort on that column, and then drop it before returning the dataset.
This is kind of challenging for me.
For #1 I followed this, but the issue is that given, say, Jan2016, Feb2016, Jan2017, it gets sorted as Jan2016, Jan2017, Feb2016 (the strings sort lexicographically, not chronologically).
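One way to fix that lexicographic ordering is to parse the month-year strings into real dates and sort on those. Below is a minimal plain-Java sketch of the comparator logic (outside Spark), assuming the values follow a MMMyyyy pattern; the MonthSort class name and the chronological helper are illustrative, not part of any API:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class MonthSort {
    // Case-insensitive parser for values like "jun2016" (MMMyyyy).
    static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("MMMyyyy")
            .toFormatter(Locale.ENGLISH);

    // Sort by the parsed YearMonth instead of the raw string.
    static List<String> chronological(List<String> months) {
        return months.stream()
                .sorted(Comparator.comparing((String m) -> YearMonth.parse(m, FMT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> months = Arrays.asList("Jan2016", "Feb2016", "Jan2017");
        System.out.println(chronological(months)); // [Jan2016, Feb2016, Jan2017]
    }
}
```

The same parse-then-sort idea carries over to Spark: derive a temporary sort column from the parsed date, sort on it, and drop it before returning the dataset.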
I need help with #2.
Split the Week column and sort based on the date:
import org.apache.spark.sql.functions.split

// Split "MM/dd/yyyy-MM/dd/yyyy" on "-" and sort by the week-start string.
df.withColumn("_tmp", split($"Week", "-"))
  .select($"Col1", $"Week", $"NumCol", $"_tmp".getItem(0).as("_sort"))
  .sort("_sort")
  .drop("_sort")
  .show()

Output:

+----+---------------------+------+
|Col1|Week                 |NumCol|
+----+---------------------+------+
|aac |01/28/2018-02/03/2018|30    |
|aac |02/04/2018-02/10/2018|45    |
|aac |02/11/2018-02/17/2018|25    |
+----+---------------------+------+
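Note that sorting on the raw week-start string only works here because all the MM/dd/yyyy values fall in the same year; across a year boundary, "12/31/2017" would sort after "01/28/2018". A safer variant parses the start date before comparing. The plain-Java sketch below shows the idea (the WeekSort class and its helpers are illustrative names, not an existing API):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class WeekSort {
    static final DateTimeFormatter MDY = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Extract and parse the start date of a "MM/dd/yyyy-MM/dd/yyyy" week range.
    static LocalDate weekStart(String week) {
        return LocalDate.parse(week.split("-", 2)[0], MDY);
    }

    // Sort week ranges by their parsed start date, not by raw string.
    static List<String> chronological(List<String> weeks) {
        return weeks.stream()
                .sorted(Comparator.comparing(WeekSort::weekStart))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> weeks = Arrays.asList(
                "02/04/2018-02/10/2018",
                "02/11/2018-02/17/2018",
                "12/31/2017-01/06/2018", // a string sort would wrongly put this last
                "01/28/2018-02/03/2018");
        System.out.println(chronological(weeks));
        // [12/31/2017-01/06/2018, 01/28/2018-02/03/2018, 02/04/2018-02/10/2018, 02/11/2018-02/17/2018]
    }
}
```

In Spark, the equivalent would be to build the `_sort` column from the parsed date (e.g. via `to_date` or `unix_timestamp` with the `MM/dd/yyyy` pattern) rather than from the raw substring.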