
How to pair rows in a Spark DataFrame based on timestamp range and row type

I have a dataframe similar to this:

+------------------+---------+------------+
|    Timestamp     | RowType |   Value    |
+------------------+---------+------------+
| 2020. 6. 5. 8:12 | X       | Null       |
| 2020. 6. 5. 8:13 | Y       | Null       |
| 2020. 6. 5. 8:14 | Y       | Null       |
| 2020. 6. 5. 8:15 | A       | SomeValue  |
| 2020. 6. 5. 8:16 | Y       | Null       |
| 2020. 6. 5. 8:17 | Y       | Null       |
| 2020. 6. 5. 8:18 | X       | Null       |
| 2020. 6. 5. 8:19 | Y       | Null       |
| 2020. 6. 5. 8:20 | Y       | Null       |
| 2020. 6. 6. 8:21 | A       | SomeValue2 |
| 2020. 6. 7. 8:22 | Y       | Null       |
| 2020. 6. 8. 8:23 | Y       | Null       |
| 2020. 6. 9. 8:24 | X       | Null       |
+------------------+---------+------------+

For each row of type X, I want to take the value from the next row of type A. If there is no A row before the next X row, the value of the X row should remain null. The expected result:

+------------------+---------+------------+
|    Timestamp     | RowType |   Value    |
+------------------+---------+------------+
| 2020. 6. 5. 8:12 | X       | SomeValue  |
| 2020. 6. 5. 8:18 | X       | SomeValue2 |
| 2020. 6. 9. 8:24 | X       | Null       |
+------------------+---------+------------+

Is this possible using window functions?

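If RowType only ever takes the values X, Y and A, a window function is enough. Filter out the Y rows first; the row that follows each X is then either the A row you want or the next X (whose Value is null), so taking the next row's Value gives the result: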

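 import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions.lead
 import spark.implicits._  // 'col symbol syntax; already in scope in spark-shell
 // Drop the Y rows, then read Value from the next remaining row (ordered by Timestamp).
 df.filter('RowType =!= "Y")
   .select('Timestamp, 'RowType, lead('Value, 1).over(Window.orderBy('Timestamp)).as("lag"))
   .filter('RowType === "X")
   .show()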

Output:

+----------------+-------+-----------+
|       Timestamp|RowType|        lag|
+----------------+-------+-----------+
|2020. 6. 5. 8:12|      X|SomeValue  |
|2020. 6. 5. 8:18|      X|SomeValue2 |
|2020. 6. 9. 8:24|      X|       null|
+----------------+-------+-----------+
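If RowType can hold other values as well, the same idea can probably be adapted by keeping only the X and A rows and checking that the row after each X really is an A. The following is a minimal, self-contained sketch under a few assumptions that are not in the question: the object and method names (PairXWithNextA, pairXWithNextA) are made up for illustration, the extra row type "B" is hypothetical, the sample data is a condensed version of the table above, and Timestamp is assumed to be a real timestamp column rather than a string (strings such as "2020. 6. 5. 8:12" do not sort chronologically in general).

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead, when}

object PairXWithNextA {

  // Hypothetical helper: for each X row, take Value from the next A row,
  // or null if another X comes first (or there is no later A at all).
  def pairXWithNextA(df: DataFrame): DataFrame = {
    // No partitionBy: Spark moves all rows to one partition for this window,
    // so this only suits data that fits on a single executor.
    val w = Window.orderBy("Timestamp")

    df.filter(col("RowType").isin("X", "A"))            // ignore every other row type
      .withColumn("NextType", lead(col("RowType"), 1).over(w))
      .withColumn("NextValue", lead(col("Value"), 1).over(w))
      .filter(col("RowType") === "X")                   // applied after the window is computed
      .select(
        col("Timestamp"),
        col("RowType"),
        // when(...) without otherwise(...) yields null if the next row is not an A
        when(col("NextType") === "A", col("NextValue")).as("Value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pair-x-with-next-a")
      .getOrCreate()
    import spark.implicits._

    def ts(s: String): Timestamp = Timestamp.valueOf(s)

    // Condensed sample; the "B" row is a hypothetical extra type.
    val rows: Seq[(Timestamp, String, Option[String])] = Seq(
      (ts("2020-06-05 08:12:00"), "X", None),
      (ts("2020-06-05 08:13:00"), "Y", None),
      (ts("2020-06-05 08:15:00"), "A", Some("SomeValue")),
      (ts("2020-06-05 08:16:00"), "B", Some("Other")),
      (ts("2020-06-05 08:18:00"), "X", None),
      (ts("2020-06-05 08:19:00"), "Y", None),
      (ts("2020-06-06 08:21:00"), "A", Some("SomeValue2")),
      (ts("2020-06-07 08:22:00"), "Y", None),
      (ts("2020-06-09 08:24:00"), "X", None)
    )

    pairXWithNextA(rows.toDF("Timestamp", "RowType", "Value")).show(false)

    spark.stop()
  }
}
```

On this sample the three X rows should come back with SomeValue, SomeValue2 and null, matching the expected result in the question. If the data has a natural grouping key (a device or session id, for example), adding it with Window.partitionBy would avoid pulling everything into one partition.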
