Remove letters from alphanumeric values in a Spark DataFrame column

Question

The DataFrame has two columns that look like this:

SKU   | COMPSKU

PT25M | PT10M
PT3H  | PT20M
TH    | QR12
S18M  | JH

Spark with Scala

How can I remove all the letters and keep only the digits?

Expected output:

Answer 1

You can also do it this way.

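import org.apache.spark.sql.functions.{col, regexp_replace, when}

// Strip ASCII letters from each column; if nothing is left, use 0 instead.
df.withColumn(
    "SKU",
    when(regexp_replace(col("SKU"), "[a-zA-Z]", "") === "", 0)
        .otherwise(regexp_replace(col("SKU"), "[a-zA-Z]", ""))
).withColumn(
    "COMPSKU",
    when(regexp_replace(col("COMPSKU"), "[a-zA-Z]", "") === "", 0)
        .otherwise(regexp_replace(col("COMPSKU"), "[a-zA-Z]", ""))
).show()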
/*
        +-----+-------+
        |  SKU|COMPSKU|
        +-----+-------+
        |  25 |  10   |
        |   3 |  20   |
        |   0 |  12   |
        |  18 |   0   |
        +-----+-------+
*/
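
The answer above repeats the same regexp_replace expression four times. A minimal sketch of how it could be factored into a helper follows. The names `DigitsOrZeroExample` and `digitsOrZero` are hypothetical (not from the original answer), the sample data is copied from the question, and a local SparkSession is assumed. The fallback is written as the string "0" so that both branches have the same type.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, regexp_replace, when}

object DigitsOrZeroExample {
  // Hypothetical helper: strip ASCII letters from a column and
  // fall back to "0" when nothing remains.
  def digitsOrZero(c: Column): Column = {
    val stripped = regexp_replace(c, "[a-zA-Z]", "")
    when(stripped === "", lit("0")).otherwise(stripped)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("digits-or-zero")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question.
    val df = Seq(
      ("PT25M", "PT10M"),
      ("PT3H", "PT20M"),
      ("TH", "QR12"),
      ("S18M", "JH")
    ).toDF("SKU", "COMPSKU")

    df.withColumn("SKU", digitsOrZero(col("SKU")))
      .withColumn("COMPSKU", digitsOrZero(col("COMPSKU")))
      .show()

    spark.stop()
  }
}
```

Both result columns are still strings; if numeric columns are wanted, a cast such as `.cast("int")` on the helper's result is one option.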

Answer 2

Try using the regexp_replace function, then use a when/otherwise statement to replace the empty values with 0.

Example:

df.show()
/*
+-----+-------+
|  SKU|COMPSKU|
+-----+-------+
|PT25M|  PT10M|
| PT3H|  PT20M|
|   TH|   QR12|
| S18M|     JH|
+-----+-------+
*/

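import org.apache.spark.sql.functions.{col, length, lit, regexp_replace, trim, when}

// First strip the letters from both columns, then replace empty results with 0.
df.withColumn("SKU", regexp_replace(col("SKU"), "[a-zA-Z]", "")).
  withColumn("COMPSKU", regexp_replace(col("COMPSKU"), "[a-zA-Z]", "")).
  withColumn("SKU", when(length(trim(col("SKU"))) === 0, lit(0)).otherwise(col("SKU"))).
  withColumn("COMPSKU", when(length(trim(col("COMPSKU"))) === 0, lit(0)).otherwise(col("COMPSKU"))).
  show()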

/*
+---+-------+
|SKU|COMPSKU|
+---+-------+
| 25|     10|
|  3|     20|
|  0|     12|
| 18|      0|
+---+-------+
*/
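
Spark's regexp_replace uses Java regular expressions, so the pattern can be checked on plain strings before running it on a DataFrame. The sketch below mirrors the strip-then-default logic with `String.replaceAll`. The object `RegexCheck` and the function `digitsOrZero` are hypothetical names used only for illustration.

```scala
object RegexCheck {
  // Hypothetical helper mirroring the Spark expression on a plain String:
  // remove ASCII letters, then use "0" if only whitespace or nothing is left.
  def digitsOrZero(s: String): String = {
    val stripped = s.replaceAll("[a-zA-Z]", "")
    if (stripped.trim.isEmpty) "0" else stripped
  }

  def main(args: Array[String]): Unit = {
    // Sample values taken from the question.
    val samples = Seq("PT25M", "PT3H", "TH", "S18M", "QR12", "JH")
    samples.foreach(s => println(s"$s -> ${digitsOrZero(s)}"))
  }
}
```

Note that `[a-zA-Z]` removes only ASCII letters, so a value such as "AB-12" would become "-12". If the values may contain other characters, a pattern like `[^0-9]` (remove everything that is not a digit) may be closer to "keep only the digits".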

Solution 1
1 2020-08-18 21:15:28

Solution 2
0 Accepted 2020-08-18 20:40:16
