Scala Converting hexadecimal substring of column to decimal - Dataframe org.apache.spark.sql.catalyst.parser.ParseException:

   val DF = Seq("310:120:fe5ab02").toDF("id")

+-----------------+
|       id        |
+-----------------+
| 310:120:fe5ab02 |
+-----------------+


Expected output:

+-----------------+-------------+--------+
|       id        |      id1    |   id2  |
+-----------------+-------------+--------+
| 310:120:fe5ab02 |      2      | 1041835| 
+-----------------+-------------+--------+

I need to convert two substrings of a string column from hexadecimal to decimal and create two new columns in the DataFrame.

id1 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(5)    -> 02    -> Integer.parseInt(x, 16) -> 2
id2 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> substring(0, 5) -> fe5ab -> Integer.parseInt(x, 16) -> 1041835

From "310:120:fe5ab02" i need "fe5ab02" which i get by doing x.split(":")(2) and then i need two substrings "fe5ab" and "02" which i get by x.substring(0,5),x.substring(5) Then i need to convert them into Decimal which i get by Integer.parseInt(x,16)从“310:120:fe5ab02”我需要“fe5ab02”,我通过 x.split(“:”)(2) 得到,然后我需要两个子串“fe5ab”和“02”,我通过 x.substring( 0,5),x.substring(5) 然后我需要将它们转换成十进制,我通过 Integer.parseInt(x,16) 得到

These work fine individually, but I need them in a single withColumn statement like below:

val DF1 = DF
.withColumn("id1", expr("""Integer.parseInt((id.split(":")(2)).substring(5), 16)"""))
.withColumn("id2", expr("""Integer.parseInt((id.split(":")(2)).substring(0, 5), 16)"""))

display(DF1)

I am getting a parsing exception.
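The cause of the ParseException is that expr() takes a Spark SQL expression, not Scala/Java code, so Integer.parseInt and .split(":")(2) cannot be parsed by the SQL parser. As a hedged sketch (not the original attempt), the same logic can be expressed with Spark SQL's built-in split, substring and conv (base-conversion) functions inside expr:

val DF1 = DF
  .withColumn("id1", expr("cast(conv(substring(split(id, ':')[2], 6), 16, 10) as int)"))
  .withColumn("id2", expr("cast(conv(substring(split(id, ':')[2], 1, 5), 16, 10) as int)"))

Note that SQL substring is 1-based, and conv(x, 16, 10) performs the hex-to-decimal conversion that Integer.parseInt(x, 16) does in Scala.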

case class SplitId(part1: Int, part2: Int)

// Take the third ':'-separated token and parse it as hex:
// the last two chars become part1, the first five become part2.
def splitHex: (String => SplitId) = { s => {
    val str: String = s.split(":")(2)
    SplitId(Integer.parseInt(str.substring(5), 16), Integer.parseInt(str.substring(0, 5), 16))
  }
}

import org.apache.spark.sql.functions.udf

val splitHexUDF = udf(splitHex)

df.withColumn("splitId", splitHexUDF(df("id"))).withColumn("id1", $"splitId.part1").withColumn("id2",  $"splitId.part2").drop($"splitId").show()
+---------------+---+-------+
|             id|id1|    id2|
+---------------+---+-------+
|310:120:fe5ab02|  2|1041835|
+---------------+---+-------+

Alternatively, you can use the snippet below without a UDF:

import org.apache.spark.sql.functions._

val df2 = df.withColumn("splitId", split($"id", ":")(2))                            // "fe5ab02"
  .withColumn("id1", $"splitId".substr(lit(6), length($"splitId")-1).cast("int"))   // "02" -> 2 (plain decimal cast)
  .withColumn("id2", conv(substring($"splitId", 0, 5), 16, 10).cast("int"))         // "fe5ab" -> 1041835 (hex to decimal)
  .drop($"splitId")

df2.printSchema
root
 |-- id: string (nullable = true)
 |-- id1: integer (nullable = true)
 |-- id2: integer (nullable = true)

df2.show()
+---------------+---+-------+
|             id|id1|    id2|
+---------------+---+-------+
|310:120:fe5ab02|  2|1041835|
+---------------+---+-------+
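One caveat with the snippet above: id1 is obtained by a plain cast, which happens to work because the last two characters "02" contain no hex letters; a value such as "0a" would cast to null. If the last two characters may be arbitrary hex digits (an assumption about the data, not stated in the question), a variant of the same snippet can parse both pieces through conv:

val df3 = df.withColumn("splitId", split($"id", ":")(2))
  .withColumn("id1", conv(substring($"splitId", 6, 2), 16, 10).cast("int"))  // last two hex chars
  .withColumn("id2", conv(substring($"splitId", 1, 5), 16, 10).cast("int"))  // first five hex chars
  .drop("splitId")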
