
Find max value from different columns in a single row in scala DataFrame

I am trying to find the maximum value across different columns in a single row of a Scala DataFrame.

The data available in the DataFrame is as below.

+-------+---------------------------------------+---------------------------------------+---------------------------------------+
|    NUM|                                   SIG1|                                   SIG2|                                   SIG3|
+-------+---------------------------------------+---------------------------------------+---------------------------------------+
|XXXXX01|[{"TIME":1569560531000,"VALUE":3.7825}]|[{"TIME":1569560531001,"VALUE":4.7825}]|[{"TIME":1569560531002,"VALUE":2.7825}]|
|XXXXX01|[{"TIME":1569560541001,"VALUE":1.7825}]|[{"TIME":1569560541000,"VALUE":8.7825}]|[{"TIME":1569560541003,"VALUE":5.7825}]|
|XXXXX01|[{"TIME":1569560531000,"VALUE":3.7825}]|[{"TIME":1569560531009,"VALUE":3.7825}]|        null                           |
|XXXXX02|[{"TIME":1569560531000,"VALUE":5.7825}]|[{"TIME":1569560531007,"VALUE":8.7825}]|[{"TIME":1569560531006,"VALUE":3.7825}]|
|XXXXX02|[{"TIME":1569560531000,"VALUE":9.7825}]|[{"TIME":1569560531009,"VALUE":1.7825}]|[{"TIME":1569560531010,"VALUE":3.7825}]|
+-------+---------------------------------------+---------------------------------------+---------------------------------------+

and the schema is:

scala> DF.printSchema
root
 |-- NUM: string (nullable = true)
 |-- SIG1: string (nullable = true)
 |-- SIG2: string (nullable = true)
 |-- SIG3: string (nullable = true)

The expected output is as below.


+-------+--------------+------+------+------+
|    NUM|          TIME|  SIG1|  SIG2|  SIG3|
+-------+--------------+------+------+------+
|XXXXX01| 1569560531002|3.7825|4.7825|2.7825|
|XXXXX01| 1569560541003|1.7825|8.7825|5.7825|
|XXXXX01| 1569560531009|3.7825|3.7825|  null|
|XXXXX02| 1569560531007|5.7825|8.7825|3.7825|
|XXXXX02| 1569560531010|9.7825|1.7825|3.7825|
+-------+--------------+------+------+------+

I need to add a new TIME column holding the highest TIME in that row, and keep only the VALUE in each SIG column.

Basically, the TIME in each column is replaced by the highest TIME value available in that row, and the TIME and VALUE fields are extracted out of the JSON strings.

Is there any UDF or built-in function to achieve this? Thanks in advance.

Use the `get_json_object` function to extract values from JSON stored as a string.

Then it's quite straightforward:

import org.apache.spark.sql.functions.{col, get_json_object, greatest}

// Extract each TIME with a JSONPath, take the row-wise greatest,
// then reduce each SIG column to its VALUE field only.
DF.withColumn("TIME", greatest(get_json_object(col("SIG1"), "$[0].TIME"),
                               get_json_object(col("SIG2"), "$[0].TIME"),
                               get_json_object(col("SIG3"), "$[0].TIME")))
  .withColumn("SIG1", get_json_object(col("SIG1"), "$[0].VALUE"))
  .withColumn("SIG2", get_json_object(col("SIG2"), "$[0].VALUE"))
  .withColumn("SIG3", get_json_object(col("SIG3"), "$[0].VALUE"))
  .show()
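One caveat: `get_json_object` returns strings, so `greatest` compares the extracted TIME values lexicographically. That happens to work here because all the timestamps have the same number of digits, but casting with `.cast("long")` is safer in general. Also note that `greatest` skips nulls (as in the third row, where SIG3 is null). The per-row logic can be sketched in plain Scala, without Spark, using hypothetical helper names for illustration:

```scala
// Plain-Scala sketch of the per-row logic (hypothetical types, no Spark).
// Each SIG cell is an optional (TIME, VALUE) pair; a null cell becomes None.
object RowMaxTime {
  case class Sig(time: Long, value: Double)

  // Highest TIME across the cells that are present, mirroring how
  // Spark's `greatest` ignores null inputs.
  def maxTime(sigs: Seq[Option[Sig]]): Option[Long] =
    sigs.flatten.map(_.time).reduceOption(_ max _)

  def main(args: Array[String]): Unit = {
    // Third row of the example data: SIG3 is null.
    val row = Seq(
      Some(Sig(1569560531000L, 3.7825)),
      Some(Sig(1569560531009L, 3.7825)),
      None
    )
    println(maxTime(row)) // the greatest TIME present in the row
  }
}
```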
