
Update Column in Spark Scala

Input:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|  0  |
|  2| L_L|  1  |
|  3|null|  1  |
+---+----+-----+

Expected output:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  GK|  0  |
|  2|  LL|  1  |
|  3|null|  1  |
+---+----+-----+

I just want to update dept with a new value that has the underscores and spaces removed. Is that possible?

scala> val inputDf = Seq((1,"G K","0 "), (2,"L_L","1"), (3,null,"  1")).toDF("sno","dept","color")
inputDf: org.apache.spark.sql.DataFrame = [sno: int, dept: string ... 1 more field]

scala> inputDf.show
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|   0 |
|  2| L_L|    1|
|  3|null|    1|
+---+----+-----+

Q: I just want to update dept with a new value that has the underscores and spaces removed. Is that possible?

Yes...

inputDf.withColumn("dept", regexp_replace('dept, "_", "")) // replace underscores with an empty string
  .withColumn("dept", regexp_replace('dept, " ", "")) // replace spaces with an empty string
  .withColumn("color", trim('color)) // optionally trim extra whitespace from color
  .show

Result:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  GK|    0|
|  2|  LL|    1|
|  3|null|    1|
+---+----+-----+

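Or

A smarter way

1) \s|_ matches only whitespace and underscores.

2) To remove underscores together with any other non-alphanumeric character, use the regular expression \W|_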
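To make the difference between the two patterns concrete, here is a minimal plain-Scala sketch that needs no Spark. It relies on String.replaceAll, which uses the same java.util.regex syntax that Spark's regexp_replace is built on. The object name RegexComparison, the helper strip, and the extra sample "A-B" are hypothetical and not part of the original answer.

```scala
// Plain-Scala sketch: compares the two patterns without needing Spark.
object RegexComparison {
  // Pattern 1: whitespace or underscore only
  val spaceOrUnderscore: String = """\s|_"""
  // Pattern 2: any non-word character (\W), plus underscore,
  // because \W alone treats "_" as a word character and keeps it
  val nonAlphanumeric: String = """\W|_"""

  // Removes every match of `pattern` from `value`
  def strip(value: String, pattern: String): String =
    value.replaceAll(pattern, "")

  def main(args: Array[String]): Unit = {
    // "A-B" is an extra, hypothetical sample to show where the patterns differ
    val samples = Seq("G K", "L_L", "A-B")
    samples.foreach { s =>
      println(s"$s -> ${strip(s, spaceOrUnderscore)} / ${strip(s, nonAlphanumeric)}")
    }
  }
}
```

Both patterns turn "G K" into "GK" and "L_L" into "LL"; only the second also removes the hyphen from "A-B".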

val inputDf = Seq((1, "G K", "0 "), (2, "L_L", "1"), (3, null, "1")).toDF("sno", "dept", "color")
inputDf.show

inputDf.withColumn("dept", regexp_replace('dept, """\s|_""", "")).show

Result:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|   0 |
|  2| L_L|    1|
|  3|null|    1|
+---+----+-----+

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  GK|   0 |
|  2|  LL|    1|
|  3|null|    1|
+---+----+-----+
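The code above only shows option 1. A rough sketch of option 2 (\W|_) as a standalone program could look like the following. The object name DeptCleanup, the helper cleanDept, and the local[*] session are assumptions made for illustration; in spark-shell the spark session already exists and the builder line is unnecessary.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object DeptCleanup {
  // Hypothetical helper: strips every non-alphanumeric character
  // (including underscores) from the "dept" column.
  def cleanDept(df: DataFrame): DataFrame =
    df.withColumn("dept", regexp_replace(col("dept"), """\W|_""", ""))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created only for this example
    val spark = SparkSession.builder()
      .appName("DeptCleanup")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val inputDf = Seq((1, "G K", "0 "), (2, "L_L", "1"), (3, null, "1"))
      .toDF("sno", "dept", "color")

    // Null dept values stay null; regexp_replace does not throw on them
    cleanDept(inputDf).show()

    spark.stop()
  }
}
```

With the sample data, dept should come out as GK, LL and null, the same as with \s|_; the two patterns only diverge when dept contains other punctuation.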

I hope this is what you were looking for.

You can use the built-in regexp_replace and trim functions for this, as shown below:

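import org.apache.spark.sql.functions._

object SampleDF {

  def main(args: Array[String]): Unit = {

    // Constant.getSparkSess is the answerer's own helper (not shown here);
    // it is expected to return a SparkSession.
    val spark = Constant.getSparkSess
    import spark.implicits._
    // Note: this sample uses a hyphen ("L-L"); for the underscore in the
    // question, use "L_L" and the pattern " |_" instead.
    val inputDf = Seq(
      (1, "G K", "0 "),
      (2, "L-L", "1"),
      (3, null, "  1")
    ).toDF("sno", "dept", "color")

    inputDf
      .withColumn("dept", regexp_replace($"dept", " |-", "")) // remove spaces and hyphens
      .withColumn("color", trim($"color"))                    // trim surrounding whitespace
      .show()
  }

}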



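Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.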

 