Scala: Converting a hexadecimal substring of a column to decimal in a DataFrame — org.apache.spark.sql.catalyst.parser.ParseException
val DF = Seq("310:120:fe5ab02").toDF("id")
+-----------------+
| id |
+-----------------+
| 310:120:fe5ab02 |
+-----------------+
The expected output:
+-----------------+-----+---------+
| id              | id1 | id2     |
+-----------------+-----+---------+
| 310:120:fe5ab02 | 2   | 1041835 |
+-----------------+-----+---------+
I need to convert two substrings of a string column from hexadecimal to decimal and create two new columns in the DataFrame.
id1 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(5)    -> 02    -> Integer.parseInt(x, 16) -> 2
id2 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(0, 5) -> fe5ab -> Integer.parseInt(x, 16) -> 1041835
From "310:120:fe5ab02" I need "fe5ab02", which I get with x.split(":")(2). From that I need the two substrings "fe5ab" and "02", which I get with x.substring(0, 5) and x.substring(5). Finally I convert each of them to decimal with Integer.parseInt(x, 16).
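The three steps above can be sketched in plain Scala, outside Spark:

```scala
// Plain-Scala version of the transformation: split on ':', slice the
// third field into its last-two and first-five hex digits, and parse
// each slice as base-16.
val id = "310:120:fe5ab02"

val hex = id.split(":")(2)                          // "fe5ab02"
val id1 = Integer.parseInt(hex.substring(5), 16)    // "02"    -> 2
val id2 = Integer.parseInt(hex.substring(0, 5), 16) // "fe5ab" -> 1041835

println(s"id1 = $id1, id2 = $id2")
```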
These work well individually, but I need them in a single withColumn statement, like below:
val DF1 = DF
.withColumn("id1", expr("""Integer.parseInt((id.split(":")(2)).substring(5), 16)"""))
.withColumn("id2", expr("""Integer.parseInt((id.split(":")(2)).substring(0, 5), 16)"""))
display(DF1)
I am getting a parsing exception.
One option is a UDF that returns both parts in a case class:
case class SplitId(part1: Int, part2: Int)

def splitHex: String => SplitId = { s =>
  val str = s.split(":")(2)
  SplitId(Integer.parseInt(str.substring(5), 16), Integer.parseInt(str.substring(0, 5), 16))
}
import org.apache.spark.sql.functions.udf
val splitHexUDF = udf(splitHex)
df.withColumn("splitId", splitHexUDF(df("id")))
  .withColumn("id1", $"splitId.part1")
  .withColumn("id2", $"splitId.part2")
  .drop($"splitId")
  .show()
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
Alternatively, you can use the snippet below without a UDF:
import org.apache.spark.sql.functions._
// conv(str, 16, 10) parses hex digits a-f as well; a plain cast("int")
// would only work for substrings that happen to contain decimal digits.
val df2 = df.withColumn("splitId", split($"id", ":")(2))
  .withColumn("id1", conv($"splitId".substr(lit(6), length($"splitId") - 1), 16, 10).cast("int"))
  .withColumn("id2", conv(substring($"splitId", 1, 5), 16, 10).cast("int"))
  .drop($"splitId")
df2.printSchema
root
|-- id: string (nullable = true)
|-- id1: integer (nullable = true)
|-- id2: integer (nullable = true)
df2.show()
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
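As for the original error: expr() parses its argument as Spark SQL, so Scala/Java calls such as Integer.parseInt and .split(":")(2) are not valid inside it, which is what triggers the ParseException. If you want to stay close to the original single-withColumn shape, the same logic can be written entirely in SQL, since split, substring, and conv are all built-in SQL functions. A sketch (assuming the input always has a third colon-separated field of at least six hex digits):

```scala
// Same conversion expressed as Spark SQL inside expr():
// SQL's conv(str, 16, 10) does the hex-to-decimal step, and
// split(id, ':')[2] indexes the array produced by split.
import org.apache.spark.sql.functions.expr

val DF1 = DF
  .withColumn("id1", expr("cast(conv(substring(split(id, ':')[2], 6, 2), 16, 10) as int)"))
  .withColumn("id2", expr("cast(conv(substring(split(id, ':')[2], 1, 5), 16, 10) as int)"))
```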