Scala: Converting a hexadecimal substring of a column to decimal in a DataFrame — org.apache.spark.sql.catalyst.parser.ParseException
val DF = Seq("310:120:fe5ab02").toDF("id")
+-----------------+
| id |
+-----------------+
| 310:120:fe5ab02 |
+-----------------+
The expected output:
+-----------------+-----+---------+
| id              | id1 | id2     |
+-----------------+-----+---------+
| 310:120:fe5ab02 | 2   | 1041835 |
+-----------------+-----+---------+
I need to convert two substrings of a string column from hexadecimal to decimal and create two new columns in the DataFrame.
id1 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(5)    -> 02    -> Integer.parseInt(x, 16) -> 2
id2 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(0, 5) -> fe5ab -> Integer.parseInt(x, 16) -> 1041835
From "310:120:fe5ab02" I need "fe5ab02", which I get with x.split(":")(2). From that I need the two substrings "fe5ab" and "02", which I get with x.substring(0, 5) and x.substring(5). Finally I convert each of them to decimal with Integer.parseInt(x, 16).
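The three steps above can be sketched in plain Scala, outside Spark:

```scala
// Plain-Scala version of the transformation: split on ':', slice the
// third field into its last-two and first-five hex digits, and parse
// each slice as base-16.
val id = "310:120:fe5ab02"

val hex = id.split(":")(2)                          // "fe5ab02"
val id1 = Integer.parseInt(hex.substring(5), 16)    // "02"    -> 2
val id2 = Integer.parseInt(hex.substring(0, 5), 16) // "fe5ab" -> 1041835

println(s"id1 = $id1, id2 = $id2")
```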
These work well individually, but I need them in a single withColumn statement, like below:
val DF1 = DF
.withColumn("id1", expr("""Integer.parseInt((id.split(":")(2)).substring(5), 16)"""))
.withColumn("id2", expr("""Integer.parseInt((id.split(":")(2)).substring(0, 5), 16)"""))
display(DF1)
I am getting a parsing exception.
One option is a UDF that returns both parts in a case class:
case class SplitId(part1: Int, part2: Int)

def splitHex: String => SplitId = { s =>
  val str = s.split(":")(2)
  SplitId(Integer.parseInt(str.substring(5), 16), Integer.parseInt(str.substring(0, 5), 16))
}
import org.apache.spark.sql.functions.udf
val splitHexUDF = udf(splitHex)
df.withColumn("splitId", splitHexUDF(df("id")))
  .withColumn("id1", $"splitId.part1")
  .withColumn("id2", $"splitId.part2")
  .drop($"splitId")
  .show()
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
Alternatively, you can use the snippet below without a UDF:
import org.apache.spark.sql.functions._
// conv(str, 16, 10) parses hex digits a-f as well; a plain cast("int")
// would only work for substrings that happen to contain decimal digits.
val df2 = df.withColumn("splitId", split($"id", ":")(2))
  .withColumn("id1", conv($"splitId".substr(lit(6), length($"splitId") - 1), 16, 10).cast("int"))
  .withColumn("id2", conv(substring($"splitId", 1, 5), 16, 10).cast("int"))
  .drop($"splitId")
df2.printSchema
root
|-- id: string (nullable = true)
|-- id1: integer (nullable = true)
|-- id2: integer (nullable = true)
df2.show()
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
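As for the original error: expr() parses its argument as Spark SQL, so Scala/Java calls such as Integer.parseInt and .split(":")(2) are not valid inside it, which is what triggers the ParseException. If you want to stay close to the original single-withColumn shape, the same logic can be written entirely in SQL, since split, substring, and conv are all built-in SQL functions. A sketch (assuming the input always has a third colon-separated field of at least six hex digits):

```scala
// Same conversion expressed as Spark SQL inside expr():
// SQL's conv(str, 16, 10) does the hex-to-decimal step, and
// split(id, ':')[2] indexes the array produced by split.
import org.apache.spark.sql.functions.expr

val DF1 = DF
  .withColumn("id1", expr("cast(conv(substring(split(id, ':')[2], 6, 2), 16, 10) as int)"))
  .withColumn("id2", expr("cast(conv(substring(split(id, ':')[2], 1, 5), 16, 10) as int)"))
```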