
Spark custom sort column in Java

I have the below result in a Dataset.

1.

+------+---------+--------+
| Col1 |  Col2   | NumCol |
+------+---------+--------+
| abc  | jun2016 |     25 |
| aac  | jun2017 |     28 |
| aac  | dec2017 |     30 |
| aac  | apr2018 |     45 |
+------+---------+--------+

When sorting is applied I get the below result.

+------+---------+--------+
| Col1 |  Col2   | NumCol |
+------+---------+--------+
| aac  | apr2018 |     45 |
| aac  | dec2017 |     30 |
| aac  | jun2017 |     28 |
| abc  | jun2016 |     25 |
+------+---------+--------+

But instead it should have been:

+------+---------+--------+
| Col1 |  Col2   | NumCol |
+------+---------+--------+
| aac  | jun2017 |     28 |
| aac  | dec2017 |     30 |
| aac  | apr2018 |     45 |
| abc  | jun2016 |     25 |
+------+---------+--------+

That is, in chronological order. How can I achieve this?
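For the parsing step behind this, month-year strings like jun2016 can be mapped to a java.time.YearMonth, which sorts chronologically. This is a plain-Java sketch of just that step (the class and method names here are my own, not from Spark); in Spark the same parsing could back a derived sort column.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class MonthYearSort {
    // Case-insensitive parser for values like "jun2016" or "Jan2017".
    private static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("MMMyyyy")
            .toFormatter(Locale.ENGLISH);

    // Sort month-year strings by their actual calendar position.
    public static List<String> sortChronologically(List<String> values) {
        return values.stream()
                .sorted(Comparator.comparing(v -> YearMonth.parse(v, FMT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortChronologically(
                Arrays.asList("jun2017", "dec2017", "apr2018", "jun2016")));
        // Chronological, not alphabetical: [jun2016, jun2017, dec2017, apr2018]
    }
}
```

Sorting on the parsed YearMonth avoids the alphabetical-order trap entirely, since the comparison happens on year first, then month.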

2. When Week is present

When I have a Week column as below:

+------+-----------------------+--------+
| Col1 |         Week          | NumCol |
+------+-----------------------+--------+
| aac  | 02/04/2018-02/10/2018 |     45 |
| aac  | 02/11/2018-02/17/2018 |     25 |
| aac  | 01/28/2018-02/03/2018 |     30 |
+------+-----------------------+--------+

I want it to be sorted as below.

+------+-----------------------+--------+
| Col1 |         Week          | NumCol |
+------+-----------------------+--------+
| aac  | 01/28/2018-02/03/2018 |     30 |
| aac  | 02/04/2018-02/10/2018 |     45 |
| aac  | 02/11/2018-02/17/2018 |     25 |
+------+-----------------------+--------+

Here I want to parse the date in the Week column into a new column dateweek, sort on that column, and drop it before returning the dataset.
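The parsing step this asks for can be sketched in plain Java: split the Week value on "-" and parse the first half with MM/dd/yyyy into a real date, then sort on that. The helper names below (weekStart, sortByWeekStart) are my own for illustration:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class WeekSort {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // "02/04/2018-02/10/2018" -> LocalDate 2018-02-04 (start of the week).
    public static LocalDate weekStart(String week) {
        return LocalDate.parse(week.split("-")[0], FMT);
    }

    // Sort week ranges by the parsed start date of each range.
    public static List<String> sortByWeekStart(List<String> weeks) {
        return weeks.stream()
                .sorted(Comparator.comparing(WeekSort::weekStart))
                .collect(Collectors.toList());
    }
}
```

In Spark, the same logic would produce the temporary dateweek column to sort on and then drop.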

This is kind of challenging for me.

For #1 I followed this, but the issue is that given, say, Jan2016, Feb2016, Jan2017, they get sorted as Jan2016, Jan2017, Feb2016.

I need help with #2.

Split the week and sort based on the date:

import org.apache.spark.sql.functions.split

df.withColumn("_tmp", split($"Week", "-"))                          // split "start-end"
  .select($"Col1", $"Week", $"NumCol",
          $"_tmp".getItem(0).as("_sort"))                           // keep the start date
  .sort("_sort")                                                    // sort on it
  .drop("_sort")                                                    // then drop it
  .show()

Output:

+----+---------------------+------+
|Col1|Week                 |NumCol|
+----+---------------------+------+
|aac |01/28/2018-02/03/2018|30    |
|aac |02/04/2018-02/10/2018|45    |
|aac |02/11/2018-02/17/2018|25    |
+----+---------------------+------+
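One caveat worth noting about this answer (my observation, not from the answer itself): sorting the MM/dd/yyyy start string lexicographically only works while all weeks fall in the same year. Across a year boundary the string order and the date order diverge, so parsing to a real date before sorting is safer. A small plain-Java check of the failure mode:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LexVsChrono {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy");
        String lastWeek2018 = "12/31/2018";
        String firstWeek2019 = "01/07/2019";

        // As strings, the 2019 week sorts BEFORE the 2018 week ("01" < "12")...
        System.out.println(firstWeek2019.compareTo(lastWeek2018) < 0);   // true

        // ...but as dates it is of course the later one.
        System.out.println(LocalDate.parse(firstWeek2019, fmt)
                .isAfter(LocalDate.parse(lastWeek2018, fmt)));           // true
    }
}
```

In Spark terms, that means converting the extracted start string to a date (rather than sorting the raw string) before using it as the sort key.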
